[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] [RFC PATCH v2] virtio-blk: add zoned block device specification
On Sun, Jun 12, 2022 at 01:58:36AM +0000, Dmitry Fomichev wrote: > On Sat, 2022-06-11 at 21:09 +0100, Stefan Hajnoczi wrote: > > On Thu, Jun 09, 2022 at 08:43:21PM -0400, Dmitry Fomichev wrote: > > > +Host-managed zoned block devices have their LBA range divided to Sequential > > > +Write Required (SWR) zones that require some additional handling from the > > > host > > > +for sustainable operation. All write requests to SWR zones must be > > > sequential > > > +and the zones with some data need to be reset before that data can be > > > rewritten. > > > +Host-managed devices support a set of ZBD-specific I/O requests that can be > > > used > > > +by the host to manage device zones. Host-managed devices report > > > VIRTIO_BLK_Z_HM > > > +value in the \field{model} field in \field{zoned}. > > > > One of > > "report the VIRTIO_BLK_Z_HM value in ..." > > "report a VIRTIO_BLK_Z_HM value in ..." > > "report VIRTIO_BLK_Z_HM in ..." > > reads more naturally. > > OK. > > > > > > + > > > +Host-aware zoned block devices have their LBA range divided to Sequential > > > +Write Preferred (SWP) zones that support the random write access, similar to > > > +regular non-zoned devices. However, the device I/O performance might not be > > > +optimal if SWP zones are used in a random I/O pattern. SWP zones also > > > support > > > +the same set of ZBD-specific I/O requests as host-managed devices that allow > > > +host-aware devices to be managed by any host that supports zoned block > > > devices > > > +to achieve its optimum performance. Host-aware devices report > > > VIRTIO_BLK_Z_HA > > > +value in the \field{model} field in \field{zoned}. > > > > Same as above. > > Sure. > > > > > > + > > > +During device operation, SWR and SWP zones can be in one of the following > > > states: > > > +empty, implicitly-open, explicitly-open, closed and full. The state machine > > > that > > > +governs the transitions between these states is described later in this > > > document. > > > + > > > +SWR and SWP zones consume volatile device resources while being in certain > > > +states and the device may set limits on the number of zones that can be in > > > these > > > +states simultaneously. > > > + > > > +Zoned block devices use two internal counters to account for the device > > > +resources in use, the number of currently open zones and the number of > > > currently > > > +active zones. > > > + > > > +Any zone state transition from a state that doesn't consume a zone resource > > > to a > > > +state that consumes the same resource increments the internal device counter > > > for > > > +that resource. Any zone transition out of a state that consumes a zone > > > resource > > > +to a state that doesn't consume the same resource decrements the counter. > > > Any > > > +request that causes the device to exceed the reported zone resource limits > > > is > > > +terminated by the device with a "zone resources exceeded" error as defined > > > for > > > +specific commands later. > > > + > > > \begin{lstlisting} > > > struct virtio_blk_config { > > > le64 capacity; > > > @@ -4623,6 +4694,15 @@ \subsection{Device configuration > > > layout}\label{sec:Device Types / Block Device / > > > le32 max_secure_erase_sectors; > > > le32 max_secure_erase_seg; > > > le32 secure_erase_sector_alignment; > > > + struct virtio_blk_zoned_characteristics { > > > + le32 zone_sectors; > > > + le32 max_open_zones; > > > + le32 max_active_zones; > > > + le32 max_append_sectors; > > > + le32 write_granularity; > > > + u8 model; > > > + u8 unused2[3]; > > > + } zoned; > > > }; > > > \end{lstlisting} > > > > > > @@ -4686,6 +4766,10 @@ \subsection{Device Initialization}\label{sec:Device > > > Types / Block Device / Devic > > > \field{secure_erase_sector_alignment} can be used by OS when splitting a > > > request based on alignment. > > > > > > +\item If the VIRTIO_BLK_F_ZONED feature is negotiated, the fields in > > > + \field{zoned} can be read by the driver to determine the zone > > > + characteristics of the device. All \field{zoned} fields are read-only. > > > + > > > \end{enumerate} > > > > > > \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block > > > Device / Device Initialization} > > > @@ -4701,6 +4785,33 @@ \subsection{Device Initialization}\label{sec:Device > > > Types / Block Device / Devic > > > The driver MUST NOT read \field{writeback} before setting > > > the FEATURES_OK \field{device status} bit. > > > > > > +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_ZONED feature if they are > > > incapable > > > +of supporting devices with the VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA zoned > > > model. > > > + > > > +Drivers MAY operate with VIRTIO_BLK_F_ZONED feature negotiated when the > > > device > > > +reports VIRTIO_BLK_Z_NONE zoned model for testing and development. > > > > Specifying a particular use case ("testing and development") could be > > interpreted to imply that other use cases MAY NOT operate with > > VIRTIO_BLK_F_ZONED. > > > > I suggest dropping the use case from the sentence or moving that sub-point > > outside the normative section to where the text describes VIRTIO_BLK_Z_NONE. > > Something like "Devices that offer VIRTIO_BLK_F_ZONED with VIRTIO_BLK_Z_NONE > > commonly do so for testing and development purposes". > > Good point, will do. > > > > > > +Each request is of form: > > > + > > > +\begin{lstlisting} > > > +struct virtio_blk_zoned_req { > > > + le32 type; > > > + le32 reserved; > > > + le64 sector; > > > + struct { > > > + /* ALL zone operation flag */ > > > + le32 mgmt_send_all:1; > > > > Bit-fields cannot be used in external interfaces (like hardware > > interfaces) because the C standard says (N1570 6.7.2.1 11): > > > > The order of allocation of bit-fields within a unit (high-order to > > low-order or low-order to high-order) is implementation-defined. > > > > So this C syntax does not define a specific binary representation, it's > > up to the compiler or system ABI. > > > > Right... I was confused by the fact that virtio_blk_discard_write_zeroes struct > in the same spec happily uses bit-fields for the unmap flag. Perhaps, that > flag could be also defined using a bit mask... :) Ah, good point. I was wrong. See 1.4 Structure Specifications in the VIRTIO spec: Bit-fields within integer fields are always listed in order, from the least significant to the most significant bit. The bit-fields are considered unsigned integers of the specified width with the next in significance relationship of the bits preserved. So it is okay to use bit-fields in the spec. It pushes the problem onto implementors of the spec. They cannot bit-fields in implementations where the C compiler uses a representation that differs from the VIRTIO spec :). > > > Please use a le32 field and define a bit number: > > > > #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0) > > le32 flags; > > > > > + le32 reserved:31; > > > + } zone; > > > + u8 data[]; > > > + le64 append_sector; > > > + u8 reserved1[3]; > > > + u8 status; > > > +}; > > > +\end{lstlisting} > > > > Yep, the ends up to be pretty simple. Using a le32 should be sufficient for > future command flag additions like the partial flag, etc. > > > Now that the status field is at the end of the struct the zoned commands > > can be treated like any other struct virtio_blk_req. > > > > The contents of the data[] field for ZONE_OPEN, ZONE_CLOSE, ZONE_FINISH, > > and ZONE_RESET requests is: > > > > #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0) > > struct virtio_blk_zoned_mgmt_send { > > le32 flags; > > }; > > > > The contents of the data[] field for ZONE_APPEND is: > > > > u8 payload[]; /* driver -> device */ > > > > followed by: > > > > le64 append_sector; /* device -> driver */ > > > > The contents of the data[] field for ZONE_REPORT is struct > > virtio_blk_zone_report. > > > > The reason I'm pushing for getting rid of struct virtio_blk_zoned_req is > > that the zbd-specific fields are not common to all ZONE_* requests. I > > don't see a reason to define a common request layout anymore. > > > > Also, it would make the zoned command design similar to struct > > virtio_blk_discard_write_zeroes where request metadata is a separate > > struct that is located in struct virtio_blk_req's data[] field instead > > of a new top-level struct virtio_blk_discard_write_zeroes_request that > > duplicates struct virtio_blk_req fields. > > My concern is that this approach requires quite a few of runtime checks to be > added in the hot path - if (alba), if (flags), request opcode checks... My PoC > code that uses virtio_blk_zoned_req adds zero checks, both for zoned and non- > zoned operation. > > With discard, there is no other way to handle it except to introduce > virtio_blk_discard_write_zeroes because the number of incoming discard segments > is variable, but for zoned operation it should be sufficient to use a simpler > solution with the virtio_blk_zoned_req which seems to be less fragile in terms of > implementation. Can you describe where new runtime checks are required? The content of data[] is specific to each request type. Driver and device implementations already have conditional branches for handling different request types. The data[] layout code goes into those branches and it should therefore not require new runtime checks. In other words, the ZONE_APPEND code knows statically at compile-time that the append_sectors field is needed. So I think any runtime checks that may exist in driver code right now are accidental and can be avoided. Stefan
Attachment:
signature.asc
Description: PGP signature
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]