OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-dev message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [RFC] virtio-blk: add discard and write zeroes features to specification


On 26/02/2018 09:16, Changpeng Liu wrote:
> Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES support,
> this will impact the performance when using SSD backend over file systems.
> 
> Here is the proposal to extend existing virtio-blk protocol to support
> DISCARD/WRITE ZEROES commands.
> 
> Basic idea here is using 16 Bytes payload to support 1 descriptor, users
> can put several segments together with 1 DISCARD command.
> 
> struct virtio_blk_discard_write_zeroes {
>        le64 slba;
>        le32 nlba;
>        struct {
>                le32 unmap:1;
>                le32 reserved:31;
>        } flags;
> };
> 
> For the purpose to support such feature, we need to introduce 2 new feature
> flags: VIRTIO_BLK_F_DISCARD/VIRTIO_BLK_F_WRITE_ZEROES, and 2 new command
> types: VIRTIO_BLK_T_DISCARD/VIRTIO_BLK_T_WRITE_ZEROES. Also we introduce
> 3 new parameters in the configuration space of virtio-blk: max_discard_sectors/
> max_discard_seg/max_write_zeroes_sectors. These parameters will tell the
> OS what's the granularity when issuing such commands.
> 
> If both DISCARD and WRITE ZEROES are supported, unmap flag bit maybe used
> for WRITE ZEROES command with DISCARD bit enabled.
> 
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> ---
>  content.tex | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/content.tex b/content.tex
> index c7ef7fd..5555b18 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -4127,6 +4127,13 @@ device except where noted.
>  
>  \item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
>      and writethrough modes.
> +
> +\item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum
> +    discard sectors size in \field{max_discard_sectors} and maximum discard
> +    segment number in field{max_discard_seg}.
> +
> +\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command,
> +	maximum write zeroes sectors size in \field{max_write_zeroes_sectors}.
>  \end{description}
>  
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
> @@ -4170,6 +4177,9 @@ struct virtio_blk_config {
>                  le32 opt_io_size;
>          } topology;
>          u8 writeback;

Let's add three padding bytes here, for clarity.  Please add a note that
these three bytes MUST be zero in the "Device Requirements: Device
Initialization" section.

> +        le32 max_discard_sectors;
> +        le32 max_discard_seg;

Perhaps add here:

	le32 discard_sector_alignment;

(and document it below)?

> +        le32 max_write_zeroes_sectors;

Perhaps we can put a bit here saying whether write_zeroes will ever look
at the \field{unmap} bit?  Like:

	u8   write_zeroes_may_unmap;

>  };
>  \end{lstlisting}
>  
> @@ -4211,6 +4221,15 @@ according to the native endian of the guest rather than
>    after reset can be either writeback or writethrough.  The actual
>    mode can be determined by reading \field{writeback} after feature
>    negotiation.
> +
> +\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
> +    \field{max_discard_sectors} and \field{max_discard_seg} can be read
> +    to determine the maximum discard sectors and maximum number of discard
> +    segments for the block driver to use.
> +
> +\item if the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
> +    \field{max_write_zeroes_sectors} can be read to determine the maximum
> +    write zeroes sectors for the block driver to use.

So only one struct virtio_blk_discard_write_zeroes is possible?

>  \end{enumerate}
>  
>  \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
> @@ -4270,21 +4289,41 @@ struct virtio_blk_req {
>          u8 data[][512];
>          u8 status;
>  };
> +
> +struct virtio_blk_discard_write_zeroes {
> +       le64 slba;
> +       le32 nlba;

	le64 sector;
	le32 num_sectors;

please (more consistent with the rest).

> +       struct {
> +               le32 unmap:1;
> +               le32 reserved:31;
> +       } flags;
> +};
>  \end{lstlisting}
>  
>  The type of the request is either a read (VIRTIO_BLK_T_IN), a write
> -(VIRTIO_BLK_T_OUT), or a flush (VIRTIO_BLK_T_FLUSH).
> +(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
> +(VIRTIO_BLK_T_WRITE_ZEROES) or a flush (VIRTIO_BLK_T_FLUSH).
>  
>  \begin{lstlisting}
>  #define VIRTIO_BLK_T_IN           0
>  #define VIRTIO_BLK_T_OUT          1
>  #define VIRTIO_BLK_T_FLUSH        4
> +#define VIRTIO_BLK_T_DISCARD      16
> +#define VIRTIO_BLK_T_WRITE_ZEROES 32
>  \end{lstlisting}

Bit 0 defines the command direction, which should be 1, while the
command number is in bits 1-30.  Since the highest-numbered command is
currently 8, we can use 11 and 13 respectively for discard and write zeroes.

>  The \field{sector} number indicates the offset (multiplied by 512) where
>  the read or write is to occur. This field is unused and set to 0
>  for scsi packet commands and for flush commands.
>  
> +The \field{data} used for discard or write zeroes command can be
> +described by structure virtio_blk_discard_write_zeroes, while here,
> +\field{sector} is not used for the two commands. \field{slba} indicates
> +the starting sector of the segment, \field{nlba} indicates the number
> +of sectors for discard/write zeroes commands, \field{unmap} only used
> +for write zeroes command, which means write zeroes to specified sectors
> +while keep these sectors discarded.

-----
The \field{sector} number indicates the offset (multiplied by 512) where
the read or write is to occur. This field is unused and set to 0
commands other than read or write.

The \field{data} used for discard or write zeroes command is
described by one or more virtio_blk_discard_write_zeroes structs.
\field{sector} indicates the starting sector of the segment, while
\field{nlba} indicates the number of sectors in each discarded range.
\field{unmap} is only used for write zeroes command.
-----

Please add the following to "Device Requirements: Device Operation":

-----
The device MUST set the \emph{status} byte to VIRTIO_BLK_S_UNSUPP for
discard and write zeroes commands if any unknown flag is set.
Furthermore, the device MUST set the \emph{status} byte to
VIRTIO_BLK_S_UNSUPP for discard commands if the \field{unmap} flag is set.

For discard commands, the device MAY deallocate the specified range of
sectors in the device backend storage.

For write zeroes commands, if the \field{unmap} is set, the device MAY
deallocate the specified range of sectors in the device backend storage,
as if the DISCARD command had been sent.  After a write zeroes command
is completed, reads of the specified ranges of sectors MUST return
zeroes.  This is true independent of whether \field{unmap} was set or clear.
-----

Please add this to "Device Requirements: Driver Operation", too:

-----
The \field{unmap} bit MUST be zero for discard commands.  The driver
MUST NOT assume anything about the data returned by read requests after
a range of sectors has been discarded.
-----

If we added the "write_zeroes_may_unmap" bit, something like this would
go in "Device Requirements: Device Operation":

----
The device SHOULD clear the \field{write_zeroes_may_unmap} field of the
virtio configuration space if and only if a write zeroes request cannot
result in deallocating one or more sectors.  The device MAY change the
content of the field during operation of the device; when this happens,
the device SHOULD trigger a configuration change interrupt.
-----

Thanks,

Paolo

>  The final \field{status} byte is written by the device: either
>  VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
>  error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
> 



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]