Subject: Re: [virtio-comment] Google Comments on Virtio Draft Spec
On 05/06/2014 09:45, Andrew Thornton wrote:
First of all, I think it's not necessary to select an error for these cases. These issues are not specific to virtio-scsi, and the command will succeed at the virtio-scsi level. For cmd_per_lun the status could be BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL; for transfer length the SCSI standard says you get INVALID FIELD IN CDB. These status or sense codes are defined in the appropriate SCSI standards.
Device configuration layout
1. max_sectors and cmd_per_lun are described as 'hints'
1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?
The configuration limits are imposed by the hypervisor, so transfer lengths or queue depths higher than the values in the configuration should cause an error. The reason these are hints is that the issue is quite complex if you do SCSI passthrough; in that case even a transfer length or queue depth lower than the limit could trigger an error.
For example, each target or LUN could actually have its own transfer limit, that is lower than max_sectors. In this case the initiator should look for the block limits VPD page anyway.
As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET FULL responses from the target if the host or other guests are using it at the same time. Perhaps the hypervisor could change that to BUSY (again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target implementation issue, not specific to virtio-scsi.
Yes (with TASK SET FULL status). Though if virtqueue_size < cmd_per_lun, the driver actually won't have room to queue more than virtqueue_size items.
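To illustrate the distinction between the virtio-scsi response and the SCSI status discussed above, here is a minimal C sketch of how a driver might classify a completion. The numeric values are the ones I believe the virtio and SCSI standards define; `struct completion`, `classify_completion` and the `ACTION_*` policy names are hypothetical and only serve the example.

```c
#include <stdint.h>

/* virtio-scsi transport-level response codes. */
#define VIRTIO_SCSI_S_OK    0
#define VIRTIO_SCSI_S_BUSY  5

/* SCSI status bytes. */
#define SCSI_STATUS_GOOD            0x00
#define SCSI_STATUS_CHECK_CONDITION 0x02
#define SCSI_STATUS_BUSY            0x08
#define SCSI_STATUS_TASK_SET_FULL   0x28

/* Sense data reported for INVALID FIELD IN CDB. */
#define SENSE_KEY_ILLEGAL_REQUEST   0x05
#define ASC_INVALID_FIELD_IN_CDB    0x24

/* Hypothetical driver policy, not taken from any real driver. */
enum completion_action {
    ACTION_DONE,
    ACTION_RETRY_LATER,    /* BUSY, or VIRTIO_SCSI_S_BUSY from the transport */
    ACTION_REDUCE_DEPTH,   /* TASK SET FULL: too many commands queued */
    ACTION_RECHECK_LIMITS, /* INVALID FIELD IN CDB: e.g. transfer too long */
    ACTION_FAIL,
};

/* Hypothetical, simplified view of a completed request. */
struct completion {
    uint8_t response;  /* virtio-scsi response field */
    uint8_t status;    /* SCSI status byte */
    uint8_t sense_key; /* valid only with CHECK CONDITION */
    uint8_t asc;       /* additional sense code */
};

enum completion_action classify_completion(const struct completion *c)
{
    /* Transport-level outcome comes first. */
    if (c->response == VIRTIO_SCSI_S_BUSY)
        return ACTION_RETRY_LATER;
    if (c->response != VIRTIO_SCSI_S_OK)
        return ACTION_FAIL;

    /* The command succeeded at the virtio-scsi level; look at SCSI status. */
    switch (c->status) {
    case SCSI_STATUS_GOOD:
        return ACTION_DONE;
    case SCSI_STATUS_BUSY:
        return ACTION_RETRY_LATER;
    case SCSI_STATUS_TASK_SET_FULL:
        return ACTION_REDUCE_DEPTH;
    case SCSI_STATUS_CHECK_CONDITION:
        if (c->sense_key == SENSE_KEY_ILLEGAL_REQUEST &&
            c->asc == ASC_INVALID_FIELD_IN_CDB)
            return ACTION_RECHECK_LIMITS;
        return ACTION_FAIL;
    default:
        return ACTION_FAIL;
    }
}
```

The point of the sketch is only that BUSY and TASK SET FULL arrive with response VIRTIO_SCSI_S_OK, so a driver has to handle them whether or not it respects the configuration hints.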
2. cmd_per_lun describes 'the actual value to be used is the
minimum of cmd_per_lun and the virtqueue size'.
2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?
Not if you use indirect descriptors.
2.2. Do you really mean 'virtqueue_size'? At minimum a command
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?
That usually makes sense, but it does not have to be that way. For example, under Linux you can mark a request as "failfast" and avoid the retry.
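As a rough illustration of the cmd_per_lun versus virtqueue size question above, the following C sketch computes how many commands a driver could keep in flight. The function name and the `descs_per_cmd` parameter are assumptions made for the example; the only facts taken from the discussion are that the depth is capped by min(cmd_per_lun, virtqueue size) and that a command needs at least two ring entries unless indirect descriptors are used.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/*
 * Hypothetical helper: number of commands that can be queued at once.
 * With indirect descriptors each command occupies one ring entry;
 * without them it occupies descs_per_cmd entries (at least 2).
 */
uint32_t effective_queue_depth(uint32_t cmd_per_lun, uint32_t virtqueue_size,
                               bool indirect_desc, uint32_t descs_per_cmd)
{
    uint32_t ring_slots = virtqueue_size;

    if (!indirect_desc) {
        if (descs_per_cmd < 2)
            descs_per_cmd = 2; /* request and response buffers */
        ring_slots = virtqueue_size / descs_per_cmd;
    }
    return min_u32(cmd_per_lun, ring_slots);
}
```

For instance, with cmd_per_lun = 128 and a 64-entry virtqueue this gives 64 with indirect descriptors and 32 without them when each command uses two entries.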
Device operation: requestq
1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations in the face of resource limitations and would
allow guests to control I/O queueing.
Either I/O is completed, or it is already documented to be ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED.
2. When a target is hotunplugged with I/O inflight, can we specify which
error response will be returned for the now-terminated I/Os?
Venkatesh and I sorted this out on the upstream linux-scsi mailing list for the abort case.
Device operation: controlq
The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require that TMF commands complete _after_ the command(s)
they are aborting/resetting.
The ordering of completing TMFs vs. requests is now documented, but the Linux driver messed up this case.
I think this cannot happen on QEMU if the commands are undispatched _and_ the doorbell register has been written, since QEMU is basically single-threaded. If the doorbell register has not been written to, the driver is probably buggy (sending a reset and a command at the same time is probably not a good idea).
This requires a device ensure ordering between the controlq and
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).
This would be too late for 1.0. I'm also not convinced it is a good idea, because if the request queue is full you cannot send TMFs to abort commands. Also, the virtio-scsi standard does not document how you use multiple request queues, and multiple request queues would have the same ordering problems as the separate control queue.
1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
that allowed TMF commands to be sent down the request queue, ordering
would be naturally enforced and devices would save a lot of complexity.
I think it is also too late for this. It is a safer and smaller change, but I'm not sure what the properties of the no-op command would be (e.g. with respect to ordering) so I'm afraid of missing some important detail.
2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.
We could use a command w/ lun = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.
A REPORT LUNS command should work well as a no-op. Or we could document that the target SHOULD implement the REPORT LUNS well-known LUN (C1/01), and then use a TEST UNIT READY command to that LUN.
The C1/01 well-known LUN would be allowed in addition to 01/tgt/xx/yy format. Since it's a SHOULD, it doesn't even require a feature bit. I sent a patch for that.
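To make the two LUN formats concrete, here is a small C sketch that fills the 8-byte virtio-scsi lun field for both cases and builds a TEST UNIT READY CDB. The function names are hypothetical. The 01/tgt/xx/yy layout follows the format named above, and I am assuming flat-space addressing (0x40 in the top bits of the third byte) for the single-level LUN, which may not match every implementation.

```c
#include <stdint.h>
#include <string.h>

#define SCSI_OP_TEST_UNIT_READY 0x00

/* 01/tgt/xx/yy format: byte 0 is 1, byte 1 the target, bytes 2-3 the LUN. */
void virtio_scsi_lun_for_target(uint8_t lun[8], uint8_t target,
                                uint16_t lun_id)
{
    memset(lun, 0, 8);
    lun[0] = 1;
    lun[1] = target;
    /* Assumption: flat-space addressing method for the single-level LUN. */
    lun[2] = (uint8_t)(0x40 | ((lun_id >> 8) & 0x3f));
    lun[3] = (uint8_t)(lun_id & 0xff);
}

/* REPORT LUNS well-known LUN: C1/01 followed by zero bytes. */
void virtio_scsi_lun_report_luns_wlun(uint8_t lun[8])
{
    memset(lun, 0, 8);
    lun[0] = 0xc1;
    lun[1] = 0x01;
}

/* TEST UNIT READY is a 6-byte CDB with opcode 0x00 and no data transfer. */
void build_test_unit_ready(uint8_t cdb[6])
{
    memset(cdb, 0, 6);
    cdb[0] = SCSI_OP_TEST_UNIT_READY;
}
```

A driver following the proposal would send the TEST UNIT READY CDB with the lun field produced by virtio_scsi_lun_report_luns_wlun(); whether a given device accepts that LUN depends on it implementing the SHOULD.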