virtio-comment message



Subject: Re: [virtio-dev] [RFC PATCH] virtio-blk: add zoned block device specification


On Sat, Jun 04, 2022 at 12:09:46AM +0000, Dmitry Fomichev wrote:
> On Fri, 2022-06-03 at 09:08 +0100, Stefan Hajnoczi wrote:
> > (Resent from my subscriber email.)
> > 
> > On Fri, 3 Jun 2022 at 03:30, Dmitry Fomichev <Dmitry.Fomichev@wdc.com> wrote:
> > > 
> > > On Thu, 2022-06-02 at 08:23 +0100, Stefan Hajnoczi wrote:
> > > > (Resending from my mailing list subscriber address so this makes it onto
> > > > the list.)
> > > > 
> > > > On Thu, 2 Jun 2022 at 00:56, Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > > > wrote:
> > > > Here is the non-union representation of the structs. Note that this
> > > > means existing request processing code can handle IN/OUT/FLUSH even
> > > > when VIRTIO_BLK_F_ZONED is negotiated. We don't need two different
> > > > code paths for these common commands:
> > > > 
> > > > /* IN/OUT/FLUSH/DISCARD/WRITE_ZEROES */
> > > > struct virtio_blk_req {
> > > >         le32 type;
> > > >         le32 reserved;
> > > >         le64 sector;
> > > >         u8 data[];
> > > >         u8 status;
> > > > };
> > > > 
> > > > /* ZONE_APPEND */
> > > > struct virtio_blk_req_zone_append {
> > > >         le32 type;
> > > >         le32 reserved;
> > > >         le64 sector;
> > > >         u8 data[];
> > > >         le64 zone_result;
> > > >         u8 status;
> > > > };
> > > > 
> > > > /* ZONE_OPEN, ZONE_CLOSE, ZONE_FINISH, ZONE_RESET */
> > > > struct virtio_blk_req_zone_mgmt_send {
> > > >         le32 type;
> > > >         le32 reserved;
> > > >         le64 sector;
> > > >         /* ALL zone operation flag */
> > > >         u8 all;
> > > >         u8 unused1[3];
> > > >         u8 data[];
> > > >         u8 status;
> > > > };
> > > > 
> > > > /* ZONE_REPORT */
> > > > struct virtio_blk_req_zone_mgmt_receive {
> > > >         le32 type;
> > > >         le32 reserved;
> > > >         le64 sector;
> > > >         /* Partial zone report flag */
> > > >         u8 partial;
> > > >         u8 unused2[3];
> > > >         u8 data[];
> > > >         u8 status;
> > > > };
> > > 
> > > We felt that it is less confusing to have a single ZBD request structure,
> > > so there is a 1:1 relationship between the VIRTIO_BLK_F_ZONED bit value
> > > and the choice of the request -
> > > 
> > > VIRTIO_BLK_F_ZONED off ->  virtio_blk_req
> > > VIRTIO_BLK_F_ZONED on  ->  virtio_blk_zoned_req
> > > 
> > > We also want to make sure that zone management send/receive and append
> > > requests are always the same size, and that should be easier with a
> > > single request structure.
> > > 
> > > It is possible to use a simpler, but less structured union here -
> > > 
> > > struct virtio_blk_zoned_req {
> > >         le32 type;
> > >         le32 reserved;
> > >         le64 sector;
> > >         union zoned_params {
> > >                 /* ALL zone operation flag */
> > >                 u8 all;
> > >                 /* Partial zone report flag */
> > >                 u8 partial;
> > >                 u32 rsvd1;
> > >         } zone;
> > >         u8 data[];
> > >         le64 zone_result;
> > >         u8 reserved1[3];
> > >         u8 status;
> > > };
> > > 
> > > Using the single code path is very important, and the PoC Linux driver
> > > code that I have sketched up achieves that. Since the request header is
> > > identical in both cases, I hope that it can be done in QEMU too.
> > 
> > There are two possibilities for optimizing the request layout:
> > 1. For drivers and devices that implement both !VIRTIO_BLK_F_ZONED and
> > VIRTIO_BLK_F_ZONED. Here it's best to share the same request layout
> > for commands available in both modes. In other words IN and OUT
> > requests to zoned devices should use struct virtio_blk_req so code can
> > be shared and only commands unique to zoned devices use struct
> > virtio_blk_zoned_req.
> > 2. For drivers and devices that implement only one of
> > !VIRTIO_BLK_F_ZONED and VIRTIO_BLK_F_ZONED. Here consistent request
> > layout is best so the code parses all commands in the same way. In
> > other words, IN and OUT requests to zoned devices use struct
> > virtio_blk_zoned_req so that the implementation doesn't have to bother
> > with struct virtio_blk_req at all.
> > 
> > I think you chose #2 and I gravitated towards #1. Which one is optimal
> > depends on the use case. Feel free to stick with #2.
> > 
> 
> OK. We certainly considered #1, but we found that in the Linux virtio-blk
> driver implementation, the request size is fixed. The member
> vblk->tag_set.cmd_size is set to sizeof(struct virtblk_req) + <sg list size>
> as part of the device scan routine. Since the Zone Append request has to be
> larger than the regular (non-zoned) one by at least 8 bytes to accommodate
> the ALBA field, we can't mix and match the two for the same device. Here is
> how this device initialization part can be modified to handle the
> VIRTIO_BLK_F_ZONED feature bit if we stick with #2 -
> 
> 
> @@ -810,13 +1287,18 @@ static int virtblk_probe(struct virtio_device *vdev)
>         }
>  
>         memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
> -       vblk->tag_set.ops = &virtio_mq_ops;
>         vblk->tag_set.queue_depth = queue_depth;
>         vblk->tag_set.numa_node = NUMA_NO_NODE;
>         vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> -       vblk->tag_set.cmd_size =
> -               sizeof(struct virtblk_req) +
> -               sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
> +       if (virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED)) {
> +               vblk->tag_set.ops = &virtio_mq_zoned_ops;
> +               vblk->tag_set.cmd_size = sizeof(struct virtblk_zoned_req);
> +       } else {
> +               vblk->tag_set.ops = &virtio_mq_ops;
> +               vblk->tag_set.cmd_size = sizeof(struct virtblk_req);
> +       }
> +       vblk->tag_set.cmd_size +=
> +                       sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
>         vblk->tag_set.driver_data = vblk;
>         vblk->tag_set.nr_hw_queues = vblk->num_vqs;

I've included a #1 solution below in case you like it. cmd_size doesn't
need to change; sizeof(struct virtblk_req) is still fine. Here is struct
virtblk_req extended to handle the ALBA field:

  struct virtblk_req {
      struct virtio_blk_outhdr out_hdr;
+     struct {
+             __virtio64 alba;
+             u8 status;
+     } in;
-     u8 status;
      struct sg_table sg_table;
      struct scatterlist sg[];
  };

The in struct could also be a union of several structs. I kept it simple
assuming just one request (ZONE_APPEND) has additional device-to-driver
(in) fields.

The request layout variation is taken into account when building the
virtqueue scatterlist:

  static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
-         struct scatterlist *data_sg, bool have_data)
+         struct scatterlist *data_sg, bool have_data, bool have_alba)
  {
-     struct scatterlist hdr, status, *sgs[3];
+     struct scatterlist hdr, in, *sgs[3];
      unsigned int num_out = 0, num_in = 0;

      sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
      sgs[num_out++] = &hdr;

      if (have_data) {
          if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
              sgs[num_out++] = data_sg;
          else
              sgs[num_out + num_in++] = data_sg;
      }

+     if (have_alba)
+         sg_init_one(&in, &vbr->in, sizeof(vbr->in));
+     else
+         sg_init_one(&in, &vbr->in.status, sizeof(vbr->in.status));
+     sgs[num_out + num_in++] = &in;
-     sg_init_one(&status, &vbr->status, sizeof(vbr->status));
-     sgs[num_out + num_in++] = &status;

      return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
  }
