
 



virtio-dev message



Subject: Re: [virtio-dev] [RFC PATCH v3] virtio-blk: add zoned block device specification


On Sat, 2022-08-06 at 17:20 -0400, Stefan Hajnoczi wrote:
> On Thu, 4 Aug 2022 at 18:41, Dmitry Fomichev <dmitry.fomichev@wdc.com> wrote:
> 
> Hi Dmitry,
> I think RFC can be removed from the Subject line for the next revision
> of this patch.
> 
> For instructions on how to bring this to a Technical Committee vote, see:
> https://github.com/oasis-tcs/virtio-spec#use-of-github-issues
> 
> > 
> > Introduce support for Zoned Block Devices to virtio.
> > 
> > Zoned Block Devices (ZBDs) aim to achieve better capacity, latency
> > and/or cost characteristics than commonly available block devices by
> > dividing the entire LBA space of the device into block regions that are
> > much larger than the LBA size. These regions are called zones and they
> > can only be written sequentially. More details about ZBDs can be found at
> > 
> > https://zonedstorage.io/docs/introduction/zoned-storage
> > 
> > In its current form, the virtio protocol for block devices (virtio-blk)
> > is not aware of ZBDs, but it allows the guest to successfully scan a
> > host-managed drive provided by the host. As a result, the
> > host-managed drive appears in the guest as a regular drive that will
> > operate erroneously under the most common write workloads.
> > 
> > To fix this, the virtio-blk protocol needs to be extended to convey
> > the zone characteristics of host ZBDs to the guest and to support
> > ZBD-specific commands: Report Zones, four zone operations and
> > (optionally) Zone Append. The proposed standard extension aims to
> > provide this functionality.
> > 
> > This patch extends the virtio-blk section of the virtio specification
> > with the minimum set of requirements that are necessary to support ZBDs.
> > The resulting device model is a subset of the models defined in the ZAC/ZBC
> > and ZNS standards documents. The included functionality mirrors
> > the existing Linux kernel block layer ZBD support and should be
> > sufficient to handle the host-managed and host-aware HDDs that are on
> > the market today as well as ZNS SSDs that are entering the market at
> > the time of this patch submission.
> > 
> > I have developed a proof of concept patch series that adds ZBD support
> > to the virtio-blk Linux kernel driver by implementing the protocol
> > extensions defined in this spec patch. I would like to receive feedback
> > on this specification patch before posting that series to the block
> > LKML.
> > 
> > I would like to thank the following people for their useful feedback
> > and suggestions while working on the initial iterations of this patch.
> > 
> > Damien Le Moal <damien.lemoal@opensource.wdc.com>
> > Matias Bjørling <Matias.Bjorling@wdc.com>
> > Niklas Cassel <Niklas.Cassel@wdc.com>
> > Hans Holmberg <Hans.Holmberg@wdc.com>
> > 
> 

<snip>

> 
> > +VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
> > +VIRTIO_BLK_T_ZONE_RESET requests make the zone operation act on a particular
> > +zone specified by the zone sector address in the \field{sector} field of the
> > +request. The zone sector address is a 64-bit value expressed in 512-byte units
> > +that points at the first sector in the target zone.
> 
> Normally the struct virtio_blk_req sector field is expressed in bytes.
> If I'm interpreting this text correctly the zone management commands
> change the field's meaning to sector units?
> 

These should be in bytes; I'll make sure this is clear.
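For illustration only, a minimal driver-side sketch of filling the request header for a single-zone operation. The enum values, the struct blk_req_hdr and the helper build_zone_op are hypothetical: the real request type codes come from the patch, and the header is kept in host byte order instead of the little-endian layout of struct virtio_blk_req. It assumes that the sector value and zone_sectors are counted in the same units and that all zones share the device-wide zone_sectors size, so a zone start is any multiple of zone_sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholders: the real VIRTIO_BLK_T_ZONE_* codes are
 * defined by the patch and are not reproduced here. */
enum zone_op {
    ZONE_OP_OPEN,
    ZONE_OP_CLOSE,
    ZONE_OP_FINISH,
    ZONE_OP_RESET,
};

/* Simplified view of the type/reserved/sector fields of
 * struct virtio_blk_req, kept in host byte order for the example. */
struct blk_req_hdr {
    uint32_t type;
    uint32_t reserved;
    uint64_t sector;
};

/* Fill hdr for a single-zone operation. zone_start must be the first
 * sector of a zone; with equal-size zones that means a multiple of
 * zone_sectors. Returns false (leaving hdr untouched) otherwise. */
static bool build_zone_op(struct blk_req_hdr *hdr, enum zone_op op,
                          uint64_t zone_start, uint64_t zone_sectors)
{
    if (zone_sectors == 0 || zone_start % zone_sectors != 0)
        return false;
    hdr->type = (uint32_t)op;
    hdr->reserved = 0;
    hdr->sector = zone_start;
    return true;
}
```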

> > +
> > +The VIRTIO_BLK_T_ZONE_RESET_ALL request acts upon all applicable zones of the
> > +device. For this request, the driver MUST set the field \field{sector} to
> > +zero.
> > +
> > +The \field{sector} field of the VIRTIO_BLK_T_ZONE_APPEND request MUST specify
> > +the first sector of the zone to which data is to be appended at the position of
> > +the write pointer. The zone sector address is a 64-bit value expressed in
> > +512-byte units that points anywhere in the target zone. The size of the data
> 
> This "512-byte units" wording confuses me. I think the units should be
> bytes and the text should say "in multiples of 512 bytes".

The wording about "512-byte units" is actually present in the spec in the part
that is related to discard. I do agree that it would be better to stick with the
way the units are described for read/write requests, so I will reword it.
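A similar sketch for the two sector rules quoted above: RESET_ALL carries no target zone, so its sector must be zero, and an append target that points anywhere in a zone can be mapped to the start of that zone with a modulo operation. The helper names are hypothetical and the same equal-size-zone assumption applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map any sector inside a zone to the first sector of that zone.
 * Assumes equal-size zones; zone_sectors must be non-zero. */
static uint64_t zone_start_of(uint64_t sector, uint64_t zone_sectors)
{
    return sector - sector % zone_sectors;
}

/* RESET_ALL targets no particular zone, so sector must be zero. */
static bool reset_all_sector_ok(uint64_t sector)
{
    return sector == 0;
}
```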

> 
> > +that is appended MUST NOT exceed the \field{max_append_sectors} value provided
> > +by the device in the \field{virtio_blk_zoned_characteristics} configuration space
> > +structure.
> > +
> > +Upon successful completion of a VIRTIO_BLK_T_ZONE_APPEND request, the driver
> > +SHOULD read the starting sector location of the written data from the request field
> > +\field{append_sector}.
> 
> Maybe that depends on the application? If the application blindly
> appends and doesn't use the write pointer then there's no need to read
> append_sector. I think this statement can be removed and instead the
> non-normative section can describe the purpose of the append_sector
> field so that driver authors know the new write pointer value is
> available if they require it.

Yes, writing a series of fixed-size records in no particular order using Zone
Append is one of the use cases where reading of the append sector becomes
unnecessary. I think replacing SHOULD with MAY in this paragraph will suffice to
make it clear that reading the append_sector field is not always required.
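A sketch of the driver-side append checks, assuming max_append_sectors counts 512-byte sectors. The completion struct below is a hypothetical stand-in, not the patch's actual layout; it only shows that append_sector is consulted when the driver needs the location, and only after a VIRTIO_BLK_S_OK (0) status.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BLK_S_OK 0

/* Assumes max_append_sectors is counted in 512-byte sectors. */
static bool append_size_ok(size_t data_len, uint32_t max_append_sectors)
{
    return (uint64_t)data_len <= (uint64_t)max_append_sectors * 512u;
}

/* Hypothetical completion record; not the patch's actual layout. */
struct append_completion {
    uint64_t append_sector;
    uint8_t status;
};

/* Report the sector where the appended data starts, but only if the
 * request succeeded. Drivers that do not need the location skip this. */
static bool append_location(const struct append_completion *c,
                            uint64_t *out)
{
    if (c->status != VIRTIO_BLK_S_OK)
        return false;
    *out = c->append_sector;
    return true;
}
```

A driver writing unordered fixed-size records would call only append_size_ok and ignore the completion's append_sector.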

> 
> > +
> > +All VIRTIO_BLK_T_OUT requests issued by the driver to sequential zones and
> > +VIRTIO_BLK_T_ZONE_APPEND requests MUST have:
> > +
> > +\begin{enumerate}
> > +\item the data size that is a multiple of the number of bytes reported
> > +    by the device in the field \field{write_granularity} in the
> > +    \field{virtio_blk_zoned_characteristics} configuration space structure.
> > 
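The write_granularity rule in the first list item reduces to a modulo check on the request length. A minimal sketch with a hypothetical helper name; the remaining list items are elided above and are not covered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A write to a sequential zone, or a zone append, must carry a data
 * size that is a multiple of write_granularity (in bytes). */
static bool seq_write_size_ok(size_t data_len, uint32_t write_granularity)
{
    return write_granularity != 0 && data_len % write_granularity == 0;
}
```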

<snip>

> > +
> > +\begin{enumerate}
> > +\item Fill the unwritten data with a device-specific byte pattern. The
> > +configuration, control and reporting of this byte pattern is beyond the scope
> > +of this standard. This is the preferred approach.
> > +
> > +\item Fail the request. Depending on the driver implementation, this may prevent
> > +the device from becoming operational.
> > +
> > +\item Return stale, previously written data to the driver. This approach is the
> > +least preferred because of its obvious negative security implications.
> > +\end{enumerate}
> 
> Do these semantics come from the ZBD model? They seem kind of
> undesirable to me (filling with zeroes seems most sensible), but
> virtio-blk devices must be able to pass through real ZBD devices, so
> it makes sense to support the same semantics as the ZBD model.

In ZBDs, all sequential zone reads return a substitute pattern for unwritten data
if such a read is successful, BUT if the device contains some conventional zones,
the usual provisioning rules apply for these zones and that may include reading
of formerly written data. Since this section specifically talks about the data
that is read above the write pointer, we can remove the third option from this
list.
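To illustrate the preferred option (fill with a device-specific pattern), here is a hypothetical device-side read of a single sequential zone held in a flat buffer. The function name, the buffer layout and the fixed 512-byte sector size are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Read nsect sectors starting at sector from a sequential zone whose
 * data starts at zone_start in backing. Sectors below the write
 * pointer wp are copied from backing; sectors at or above wp are
 * filled with the device-specific byte pattern. */
static void zone_read(const uint8_t *backing, uint64_t zone_start,
                      uint64_t wp, uint64_t sector, uint64_t nsect,
                      uint8_t pattern, uint8_t *out)
{
    for (uint64_t i = 0; i < nsect; i++) {
        uint64_t s = sector + i;
        uint8_t *dst = out + i * SECTOR_SIZE;
        if (s < wp)
            memcpy(dst, backing + (s - zone_start) * SECTOR_SIZE,
                   SECTOR_SIZE);
        else
            memset(dst, pattern, SECTOR_SIZE);
    }
}
```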

> 
> > +
> > +If both the VIRTIO_BLK_F_ZONED and VIRTIO_BLK_F_SECURE_ERASE features are
> > +negotiated, then
> > +
> > +\begin{enumerate}
> > +\item the field \field{secure_erase_sector_alignment} in the configuration space
> > +of the device MUST be a multiple of the \field{zone_sectors} value reported in the
> > +device configuration space.
> > +
> > +\item the data size in VIRTIO_BLK_T_SECURE_ERASE requests MUST be a multiple of
> > +the \field{zone_sectors} value in the device configuration space.
> > +\end{enumerate}
> > +
> > +The device MUST handle a VIRTIO_BLK_T_SECURE_ERASE request in the same way it
> > +handles a VIRTIO_BLK_T_ZONE_RESET request for the zone range specified in the
> > +VIRTIO_BLK_T_SECURE_ERASE request.
> > +
> >  \subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
> >  When using the legacy interface, transitional devices and drivers
> >  MUST format the fields in struct virtio_blk_req
> > --
> > 2.34.1
> > 
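Finally, a sketch of the SECURE_ERASE rules under VIRTIO_BLK_F_ZONED: the configuration check, and a device that treats the erase as a ZONE_RESET over the covered zones by rewinding each write pointer to its zone start. Function names and the in-memory write-pointer array are hypothetical; requiring the start sector to be zone-aligned is an assumption that follows from the alignment rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Configuration rule: secure_erase_sector_alignment must be a
 * multiple of zone_sectors. */
static bool secure_erase_cfg_ok(uint32_t secure_erase_sector_alignment,
                                uint64_t zone_sectors)
{
    return zone_sectors != 0 &&
           secure_erase_sector_alignment % zone_sectors == 0;
}

/* Device side: handle SECURE_ERASE like ZONE_RESET for every zone in
 * [sector, sector + nsect), assuming equal-size zones and one write
 * pointer per zone in wp[]. Returns false for a malformed range. */
static bool secure_erase_zones(uint64_t *wp, uint64_t nr_zones,
                               uint64_t zone_sectors, uint64_t sector,
                               uint64_t nsect)
{
    if (zone_sectors == 0 || sector % zone_sectors != 0 ||
        nsect % zone_sectors != 0)
        return false;
    uint64_t first = sector / zone_sectors;
    uint64_t count = nsect / zone_sectors;
    if (first > nr_zones || count > nr_zones - first)
        return false;
    for (uint64_t z = first; z < first + count; z++)
        wp[z] = z * zone_sectors; /* reset: write pointer back to zone start */
    return true;
}
```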


