Subject: Re: [PATCH 3/3] virtio-blk: add support for zoned block devices
Hi Pankaj, Damien,

On Tue, 2022-09-20 at 08:59 +0900, Damien Le Moal wrote:
> On 9/19/22 22:41, Pankaj Raghav wrote:
> > Hi Dmitry,
> >
> > On Sun, Sep 18, 2022 at 10:29:21PM -0400, Dmitry Fomichev wrote:
> > > The zone-specific code in the patch is heavily influenced by NVMe ZNS
> > > code in drivers/nvme/host/zns.c, but it is simpler because the proposed
> > > virtio ZBD draft only covers the zoned device features that are
> > > relevant to the zoned functionality provided by Linux block layer.
> > >
> > There is a parallel work going on to support non-po2 zone sizes in Linux
> > block layer and drivers[1]. I don't see any reason why we shouldn't make
> > the calculations generic here instead of putting the constraint on zone
> > sectors to be po2 as the virtio spec also supports it.
>
> That series is not upstream, so implementing against it would not be the
> correct approach, especially given that this would also impact qemu code.
>

I am aware of the effort to support non-power-of-two zone sizes in the
kernel, and this activity is actually what made me drop the power-of-two
zone size requirement from the virtio-zbd specification. I think the best
way to add non-power-of-two zone size support to this driver would be a
follow-up patch to this series. This way, we won't rely on code that is
not yet merged upstream.

<there is one more comment about your proposed changes below>

> >
> > I took a quick look, and changing the calculations from po2 specific to
> > generic will not be in the hot path and can be trivially changed. I have
> > suggested the changes inline to make the virtio blk driver zone size
> > agnostic. I haven't tested the changes but it is very
> > similar to the changes I did in the drivers/nvme/host/zns.c in my patch
> > series[2].
> >
> > [1]
> > https://lore.kernel.org/linux-block/20220912082204.51189-1-p.raghav@samsung.com/
> > [2]
> > https://lore.kernel.org/linux-block/20220912082204.51189-6-p.raghav@samsung.com/
> >
> > > Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > > ---
> > >  drivers/block/virtio_blk.c      | 381 ++++++++++++++++++++++++++++++--
> > >  include/uapi/linux/virtio_blk.h | 106 +++++++++
> > >  2 files changed, 469 insertions(+), 18 deletions(-)
> > >
> > <snip>
> > > +#ifdef CONFIG_BLK_DEV_ZONED
> > > +static void *virtblk_alloc_report_buffer(struct virtio_blk *vblk,
> > > +					 unsigned int nr_zones,
> > > +					 unsigned int zone_sectors,
> > > +					 size_t *buflen)
> > > +{
> > > +	struct request_queue *q = vblk->disk->queue;
> > > +	size_t bufsize;
> > > +	void *buf;
> > > +
> > -	nr_zones = min_t(unsigned int, nr_zones,
> > -			 get_capacity(vblk->disk) >> ilog2(zone_sectors));
> >
> > +	nr_zones = min_t(unsigned int, nr_zones,
> > +			 div64_u64(get_capacity(vblk->disk), zone_sectors));
> >
> > > +
> > > +	bufsize = sizeof(struct virtio_blk_zone_report) +
> > > +		nr_zones * sizeof(struct virtio_blk_zone_descriptor);
> > > +	bufsize = min_t(size_t, bufsize,
> > > +			queue_max_hw_sectors(q) << SECTOR_SHIFT);
> > > +	bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
> > > +
> > > +	while (bufsize >= sizeof(struct virtio_blk_zone_report)) {
> > > +		buf = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> > > +		if (buf) {
> > > +			*buflen = bufsize;
> > > +			return buf;
> > > +		}
> > > +		bufsize >>= 1;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > > +static int virtblk_submit_zone_report(struct virtio_blk *vblk,
> > > +				      char *report_buf, size_t report_len,
> > > +				      sector_t sector)
> > > +{
> > > +	struct request_queue *q = vblk->disk->queue;
> > > +	struct request *req;
> > > +	struct virtblk_req *vbr;
> > > +	int err;
> > > +
> > > +	req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	vbr = blk_mq_rq_to_pdu(req);
> > > +	vbr->in_hdr_len = sizeof(vbr->status);
> > > +	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_ZONE_REPORT);
> > > +	vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, sector);
> > > +
> > > +	err = blk_rq_map_kern(q, req, report_buf, report_len, GFP_KERNEL);
> > > +	if (err)
> > > +		goto out;
> > > +
> > > +	blk_execute_rq(req, false);
> > > +	err = blk_status_to_errno(virtblk_result(vbr->status));
> > > +out:
> > > +	blk_mq_free_request(req);
> > > +	return err;
> > > +}
> > > +
> > > +static int virtblk_parse_zone(struct virtio_blk *vblk,
> > > +			      struct virtio_blk_zone_descriptor *entry,
> > > +			      unsigned int idx, unsigned int zone_sectors,
> > > +			      report_zones_cb cb, void *data)
> > > +{
> > > +	struct blk_zone zone = { };
> > > +
> > > +	if (entry->z_type != VIRTIO_BLK_ZT_SWR &&
> > > +	    entry->z_type != VIRTIO_BLK_ZT_SWP &&
> > > +	    entry->z_type != VIRTIO_BLK_ZT_CONV) {
> > > +		dev_err(&vblk->vdev->dev, "invalid zone type %#x\n",
> > > +			entry->z_type);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	zone.type = entry->z_type;
> > > +	zone.cond = entry->z_state;
> > > +	zone.len = zone_sectors;
> > > +	zone.capacity = le64_to_cpu(entry->z_cap);
> > > +	zone.start = le64_to_cpu(entry->z_start);
> > > +	if (zone.cond == BLK_ZONE_COND_FULL)
> > > +		zone.wp = zone.start + zone.len;
> > > +	else
> > > +		zone.wp = le64_to_cpu(entry->z_wp);
> > > +
> > > +	return cb(&zone, idx, data);
> > > +}
> > > +
> > > +static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> > > +				unsigned int nr_zones, report_zones_cb cb,
> > > +				void *data)
> > > +{
> > > +	struct virtio_blk *vblk = disk->private_data;
> > > +	struct virtio_blk_zone_report *report;
> > > +	unsigned int zone_sectors = vblk->zone_sectors;
> > > +	unsigned int nz, i;
> > > +	int ret, zone_idx = 0;
> > > +	size_t buflen;
> > +	u64 remainder;
> > > +
> > > +	if (WARN_ON_ONCE(!vblk->zone_sectors))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	report = virtblk_alloc_report_buffer(vblk, nr_zones,
> > > +					     zone_sectors, &buflen);
> > > +	if (!report)
> > > +		return -ENOMEM;
> > > +
> > -	sector &= ~(zone_sectors - 1);
> >
> > +	div64_u64_rem(sector, zone_sectors, &remainder);
> > +	sector -= remainder;
> > > +	while (zone_idx < nr_zones && sector < get_capacity(vblk->disk)) {
> > > +		memset(report, 0, buflen);
> > > +
> > > +		ret = virtblk_submit_zone_report(vblk, (char *)report,
> > > +						 buflen, sector);
> > > +		if (ret) {
> > > +			if (ret > 0)
> > > +				ret = -EIO;
> > > +			goto out_free;
> > > +		}
> > > +
> > > +		nz = min((unsigned int)le64_to_cpu(report->nr_zones), nr_zones);
> > > +		if (!nz)
> > > +			break;
> > > +
> > > +		for (i = 0; i < nz && zone_idx < nr_zones; i++) {
> > > +			ret = virtblk_parse_zone(vblk, &report->zones[i],
> > > +						 zone_idx, zone_sectors, cb, data);
> > > +			if (ret)
> > > +				goto out_free;
> > > +			zone_idx++;
> > > +		}
> > > +
> > > +		sector += zone_sectors * nz;
> > > +	}
> > > +
> > > +	if (zone_idx > 0)
> > > +		ret = zone_idx;
> > > +	else
> > > +		ret = -EINVAL;
> > > +out_free:
> > > +	kvfree(report);
> > > +	return ret;
> > > +}
> > > +
> > <snip>
> > > +static int virtblk_probe_zoned_device(struct virtio_device *vdev,
> > > +				      struct virtio_blk *vblk,
> > > +				      struct request_queue *q)
> > > +{
> > <snip>
> > > +	blk_queue_physical_block_size(q, le32_to_cpu(v));
> > > +	blk_queue_io_min(q, le32_to_cpu(v));
> > > +
> > > +	dev_dbg(&vdev->dev, "write granularity = %u\n", le32_to_cpu(v));
> > > +
> > -	/*
> > -	 * virtio ZBD specification doesn't require zones to be a power of
> > -	 * two sectors in size, but the code in this driver expects that.
> > -	 */
> > -	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
> > -	if (v == 0 || !is_power_of_2(v)) {
> > -		dev_err(&vdev->dev,
> > -			"zoned device with non power of two zone size %u\n", v);
> > -		return -ENODEV;
> > -	}

This part can't be removed entirely because it contains the call that reads
the zone_sectors value from the virtio-blk configuration space.
Instead, this code could be changed to something like:

	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
	if (v == 0) {
		dev_err(&vdev->dev, "zoned device with zero zone size\n");
		return -ENODEV;
	}

DF

> > > +
> > > +	dev_dbg(&vdev->dev, "zone sectors = %u\n", le32_to_cpu(v));
> > > +	vblk->zone_sectors = le32_to_cpu(v);
> > > +
> > > +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> > > +		dev_warn(&vblk->vdev->dev,
> > > +			 "ignoring negotiated F_DISCARD for zoned device\n");
> > > +		blk_queue_max_discard_sectors(q, 0);
> > > +	}
> > > +
> > > +	ret = blk_revalidate_disk_zones(vblk->disk, NULL);
> > > +	if (!ret) {
> > > +		virtio_cread(vdev, struct virtio_blk_config,
> > > +			     zoned.max_append_sectors, &v);
> > > +		if (!v) {
> > > +			dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
> > > +			return -ENODEV;
> > > +		}
> > > +		blk_queue_max_zone_append_sectors(q, le32_to_cpu(v));
> > > +		dev_dbg(&vdev->dev, "max append sectors = %u\n", le32_to_cpu(v));
> > > +
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
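The difference between the power-of-two arithmetic in the patch and the generic arithmetic suggested in the review can be sketched in plain userspace C. This is a minimal illustration under stated assumptions, not driver code: `sector_t` is a local stand-in typedef, and `zone_start_po2()`, `zone_start_generic()` and `nr_zones_in_capacity()` are hypothetical helpers. In the kernel, the generic variants correspond to the `div64_u64_rem()` and `div64_u64()` calls in the proposed changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's 64-bit sector_t. */
typedef uint64_t sector_t;

/*
 * Power-of-two only: clear the low bits of the sector, like the
 * "sector &= ~(zone_sectors - 1)" line in the patch. The mask is built
 * in 64-bit arithmetic; a mask computed in 32-bit unsigned arithmetic
 * would be zero-extended and clear the upper 32 bits of the sector.
 */
static sector_t zone_start_po2(sector_t sector, uint32_t zone_sectors)
{
	assert(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
	return sector & ~((sector_t)zone_sectors - 1);
}

/*
 * Any non-zero zone size: subtract the remainder. This mirrors the
 * div64_u64_rem() + "sector -= remainder" sequence from the review.
 */
static sector_t zone_start_generic(sector_t sector, uint32_t zone_sectors)
{
	assert(zone_sectors != 0); /* same condition as the v == 0 check */
	return sector - sector % zone_sectors;
}

/*
 * Number of whole zones in the capacity. "capacity >> ilog2(zs)" is only
 * correct for power-of-two sizes; plain division works for any size.
 */
static uint64_t nr_zones_in_capacity(sector_t capacity, uint32_t zone_sectors)
{
	assert(zone_sectors != 0);
	return capacity / zone_sectors;
}
```

For a power-of-two zone size the two start-of-zone helpers agree. For a 192 MiB zone (393216 sectors of 512 bytes) only the remainder-based helper applies, which is why the original patch guards its mask-based code with is_power_of_2(). As Pankaj notes, these calculations are not in the hot path, so the cost of a 64-bit division here is unlikely to matter.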