OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-dev] [RFC PATCH v2] virtio-blk: add zoned block device specification


On Thu, Jun 09, 2022 at 08:43:21PM -0400, Dmitry Fomichev wrote:
> +Host-managed zoned block devices have their LBA range divided to Sequential
> +Write Required (SWR) zones that require some additional handling from the host
> +for sustainable operation. All write requests to SWR zones must be sequential
> +and the zones with some data need to be reset before that data can be rewritten.
> +Host-managed devices support a set of ZBD-specific I/O requests that can be used
> +by the host to manage device zones. Host-managed devices report VIRTIO_BLK_Z_HM
> +value in the \field{model} field in \field{zoned}.

One of
"report the VIRTIO_BLK_Z_HM value in ..."
"report a VIRTIO_BLK_Z_HM value in ..."
"report VIRTIO_BLK_Z_HM in ..."
reads more naturally.

> +
> +Host-aware zoned block devices have their LBA range divided to Sequential
> +Write Preferred (SWP) zones that support the random write access, similar to
> +regular non-zoned devices. However, the device I/O performance might not be
> +optimal if SWP zones are used in a random I/O pattern. SWP zones also support
> +the same set of ZBD-specific I/O requests as host-managed devices that allow
> +host-aware devices to be managed by any host that supports zoned block devices
> +to achieve its optimum performance. Host-aware devices report VIRTIO_BLK_Z_HA
> +value in the \field{model} field in \field{zoned}.

Same as above.

> +
> +During device operation, SWR and SWP zones can be in one of the following states:
> +empty, implicitly-open, explicitly-open, closed and full. The state machine that
> +governs the transitions between these states is described later in this document.
> +
> +SWR and SWP zones consume volatile device resources while being in certain
> +states and the device may set limits on the number of zones that can be in these
> +states simultaneously.
> +
> +Zoned block devices use two internal counters to account for the device
> +resources in use, the number of currently open zones and the number of currently
> +active zones.
> +
> +Any zone state transition from a state that doesn't consume a zone resource to a
> +state that consumes the same resource increments the internal device counter for
> +that resource. Any zone transition out of a state that consumes a zone resource
> +to a state that doesn't consume the same resource decrements the counter. Any
> +request that causes the device to exceed the reported zone resource limits is
> +terminated by the device with a "zone resources exceeded" error as defined for
> +specific commands later.
> +
>  \begin{lstlisting}
>  struct virtio_blk_config {
>          le64 capacity;
> @@ -4623,6 +4694,15 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
>          le32 max_secure_erase_sectors;
>          le32 max_secure_erase_seg;
>          le32 secure_erase_sector_alignment;
> +        struct virtio_blk_zoned_characteristics {
> +                le32 zone_sectors;
> +                le32 max_open_zones;
> +                le32 max_active_zones;
> +                le32 max_append_sectors;
> +                le32 write_granularity;
> +                u8 model;
> +                u8 unused2[3];
> +        } zoned;
>  };
>  \end{lstlisting}
>  
> @@ -4686,6 +4766,10 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>      \field{secure_erase_sector_alignment} can be used by OS when splitting a
>      request based on alignment.
>  
> +\item If the VIRTIO_BLK_F_ZONED feature is negotiated, the fields in
> +    \field{zoned} can be read by the driver to determine the zone
> +    characteristics of the device. All \field{zoned} fields are read-only.
> +
>  \end{enumerate}
>  
>  \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
> @@ -4701,6 +4785,33 @@ \subsection{Device Initialization}\label{sec:Device Types / Block Device / Devic
>  The driver MUST NOT read \field{writeback} before setting
>  the FEATURES_OK \field{device status} bit.
>  
> +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_ZONED feature if they are incapable
> +of supporting devices with the VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA zoned model.
> +
> +Drivers MAY operate with VIRTIO_BLK_F_ZONED feature negotiated when the device
> +reports VIRTIO_BLK_Z_NONE zoned model for testing and development.

Specifying a particular use case ("testing and development") could be
interpreted to imply that other use cases MAY NOT operate with
VIRTIO_BLK_F_ZONED.

I suggest dropping the use case from the sentence or moving that sub-point
outside the normative section to where the text describes VIRTIO_BLK_Z_NONE.
Something like "Devices that offer VIRTIO_BLK_F_ZONED with VIRTIO_BLK_Z_NONE
commonly do so for testing and development purposes".

> +Each request is of form:
> +
> +\begin{lstlisting}
> +struct virtio_blk_zoned_req {
> +        le32 type;
> +        le32 reserved;
> +        le64 sector;
> +        struct {
> +                /* ALL zone operation flag */
> +                le32 mgmt_send_all:1;

Bit-fields cannot be used in external interfaces (like hardware
interfaces) because the C standard says (N1570 6.7.2.1 11):

  The order of allocation of bit-fields within a unit (high-order to
  low-order or low-order to high-order) is implementation-defined.

So this C syntax does not define a specific binary representation, it's
up to the compiler or system ABI.

Please use a le32 field and define a bit number:

  #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0)
  le32 flags;

> +                le32 reserved:31;
> +        } zone;
> +        u8 data[];
> +        le64 append_sector;
> +        u8 reserved1[3];
> +        u8 status;
> +};
> +\end{lstlisting}

Now that the status field is at the end of the struct the zoned commands
can be treated like any other struct virtio_blk_req.

The contents of the data[] field for ZONE_OPEN, ZONE_CLOSE, ZONE_FINISH,
and ZONE_RESET requests is:

  #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0)
  struct virtio_blk_zoned_mgmt_send {
      le32 flags;
  };

The contents of the data[] field for ZONE_APPEND is:

  u8 payload[];            /* driver -> device */

followed by:

  le64 append_sector;      /* device -> driver */

The contents of the data[] field for ZONE_REPORT is struct
virtio_blk_zone_report.

The reason I'm pushing for getting rid of struct virtio_blk_zoned_req is
that the zbd-specific fields are not common to all ZONE_* requests. I
don't see a reason to define a common request layout anymore.

Also, it would make the zoned command design similar to struct
virtio_blk_discard_write_zeroes where request metadata is a separate
struct that is located in struct virtio_blk_req's data[] field instead
of a new top-level struct virtio_blk_discard_write_zeroes_request that
duplicates struct virtio_blk_req fields.

Stefan

Attachment: signature.asc
Description: PGP signature



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]