Re: [virtio-comment] Google Comments on Virtio Draft Spec

Comments below;

'Configuration'
Device configuration layout

>> 1. max_sectors and cmd_per_lun are described as 'hints'
>> 1.1. Can these become hard limits rather than 'hints'? (IE
>> devices can reject commands above the cmd_per_lun limit
>> or the max_sectors limit). If so, can we select a specific
>> error to return in that case?
>>
> First of all, I think it's not necessary to select an error for these
> cases. These issues are not specific to virtio-scsi and the command
> will succeed at the virtio-scsi level; for cmd_per_lun the the status
> could be BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL, for transfer
> length the SCSI standard says you get INVALID FIELD IN CDB. These
> status or sense codes are defined in the appropriate SCSI standards.
>
> The configuration limits are imposed by the hypervisor, so transfer
> lengths or queue depths higher than the values in the configuration
> should cause an error. The reason these are hints is because the issue
> is quite complex if you do SCSI passthrough, and in that case a transfer
> length or queue depth lower than the limit could trigger an error.
>
> For example, each target or LUN could actually have its own transfer
> limit, that is lower than max_sectors. In this case the initiator
> should look for the block limits VPD page anyway.
>
> As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET
> FULL responses from the target if the host or other guests are using it
> at the same time. Perhaps the hypervisor could change that to BUSY
> (again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target
> implementation issue, not specific to virtio-scsi.

If you treat max_sectors and cmd_per_lun as transport-specific limits and
the INQUIRY data/Block Limits VPDs as SCSI-layer limits, the differences
become clearer -- for transfer that fail because of limitations from the
target (INQUIRY/BLOCK LIMITS VPD levels), the target's SCSI-layer response
is the correct one. For transfers that fail because they hit the transport
limit (max_sectors/cmd_per_lun), a transport-level response would make more
sense.

>> 2. When a target is hotunplugged with I/O inflight, can we specify which
>> error response will be returned for the now-terminated I/Os?

> Either I/O is completed, or it is already documented to be ILLEGAL
> REQUEST/LOGICAL UNIT NOT SUPPORTED.

When an I/O is already running against a target and it is hotremoved,
ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED is not a correct SCSI response.
That response would be appropriate if the I/O reached the target after it
was removed, but not for the case where the command was already running.

Also, consider long-running operations -- once a guest has already seen
them running on a target (ex: long running operation in progress), it would
not be correct to return ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED.

We would like to see a transport-layer response here, for example
VIRTIO_SCSI_S_TRANSPORT_FAILURE. Hot-unplugging a disk is analogous to
breaking the link to the target and Linux will reflect this as a
DID_TRANSPORT_DISRUPTED host byte.

>> 2. If that is not possible, a guest driver can cycle a
>> no-op command through request queue(s) before aborting/resetting
>> a command. To do this, we need to codify a safe no-op command.
>>
>> We could use a command w/ lun[0] = 0x0 as a safe no-op command.
>> This is currently the case for QEMU, vhost-scsi, and GCE. We would
>> like to have this formalized.

> I think it is also too late for this. It is a safer and smaller change,
> but I'm not sure what the properties of the no-op command would be (e.g.
> with respect to ordering) so I'm afraid of missing some important
> detail.

1. REPORT LUNS to the well-known address would work. No device emulation
currently supports it to the well-known address, however.

2. Can we specify that it is not an error (the device will not enter a
broken state) to send a command with either an invalid LUN or a command
that is too short? (ex: a 1 byte command?). Even an invalid command
would serve as a fine no-op, if it doesn't wedge the device.

On Thu, Jun 5, 2014 at 2:38 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:

Il 05/06/2014 09:45, Andrew Thornton ha scritto:

'Configuration'

Device configuration layout

1. max_sectors and cmd_per_lun are described as 'hints'

1.1. Can these become hard limits rather than 'hints'? (IE
devices can reject commands above the cmd_per_lun limit
or the max_sectors limit). If so, can we select a specific
error to return in that case?

First of all, I think it's not necessary to select an error for these cases. These issues are not specific to virtio-scsi and the command will succeed at the virtio-scsi level; for cmd_per_lun the the status could be BUSY (not VIRTIO_SCSI_S_BUSY!) or TASK SET FULL, for transfer length the SCSI standard says you get INVALID FIELD IN CDB. These status or sense codes are defined in the appropriate SCSI standards.

The configuration limits are imposed by the hypervisor, so transfer lengths or queue depths higher than the values in the configuration should cause an error. The reason these are hints is because the issue is quite complex if you do SCSI passthrough, and in that case a transfer length or queue depth lower than the limit could trigger an error.

For example, each target or LUN could actually have its own transfer limit, that is lower than max_sectors. In this case the initiator should look for the block limits VPD page anyway.

As to cmd_per_lun, you could obey cmd_per_lun and still get TASK SET FULL responses from the target if the host or other guests are using it at the same time. Perhaps the hypervisor could change that to BUSY (again, not VIRTIO_SCSI_S_BUSY), but this is again a generic SCSI target implementation issue, not specific to virtio-scsi.

2. cmd_per_lun describes 'the actual value to be used is the
minimum of cmd_per_lun and the virtqueue size'.

2.1. Does this mean that devices can reject concurrent commands
above min(cmd_per_lun, virtqueue_size)?

Yes (with TASK SET FULL status). Though if virtqueue_size < cmd_per_lun, the driver actually won't have room to queue more than virtqueue_size items.

2.2. Do you really mean 'virtqueue_size'? At minimum a command
requires at least 2 entries in the virtqueue. Should this
minimum be virtqueue_size / 2?

Not if you use indirect descriptors.

Device operation: requestq

1. When a transport returns VIRTIO_SCSI_S_BUSY, can we specify that a
guest should retry the request? This would simplify device
implementations in the face of resource limitations and would
allow guests to control I/O queueing.

That usually makes sense, but it does not have to be that way. For example, under Linux you can mark a request as "failfast" and avoid the retry.

2. When a target is hotunplugged with I/O inflight, can we specify which
error response will be returned for the now-terminated I/Os?

Either I/O is completed, or it is already documented to be ILLEGAL REQUEST/LOGICAL UNIT NOT SUPPORTED.

Device operation: controlq

The ordering of Task Management Function completion with
respect to requests they are acting on is unspecified. However
SCSI midlayers require TMF commands complete _after_ the command(s)
they are aborting/reseting.

I and Venkatesh sorted this out on the upstream linux-scsi mailing list for the abort case.

The ordering of completing TMFs vs. requests are now documented, but the Linux driver messed up this case.

This requires a device ensure ordering between the controlq and
requestq processing; for TMF RESET, this means a reset must
drain all the request queues (searching for undispatched
commands; QEMU does not do this currently and can corrupt guest
memory in the worst case).

I think this cannot happen on QEMU if the commands are undispatched _and_ the doorbell register has been written, since QEMU is basically single-threaded. If the doorbell register has not been written to, the driver is probably buggy (sending a reset and a command at the same time is probably not a good idea).

1. If we could have a feature flag (VIRTIO_SCSI_F_TMF_ON_REQUESTQ)
that allowed TMF commands to be sent down the requestqueue,ordering
would be naturally enforced and devices would save a lot of complexity.

This would be too late for 1.0. I'm also not convinced it is a good idea, for if the request queue is full you cannot send TMFs to abort commands. Also, the virtio-scsi standard does not document how you use multiple request queues, and multiple request queues would have the same ordering problems as the separate control queue.

2. If that is not possible, a guest driver can cycle a
no-op command through request queue(s) before aborting/resetting
a command. To do this, we need to codify a safe no-op command.

We could use a command w/ lun[0] = 0x0 as a safe no-op command.
This is currently the case for QEMU, vhost-scsi, and GCE. We would
like to have this formalized.

I think it is also too late for this. It is a safer and smaller change, but I'm not sure what the properties of the no-op command would be (e.g. with respect to ordering) so I'm afraid of missing some important detail.

A REPORT LUNS command should work well as a no-op. Or we could document that the target SHOULD implement the REPORT LUNS well-known LUN (C1/01), and then use a TEST UNIT READY command to that LUN.

The C1/01 well-known LUN would be allowed in addition to 01/tgt/xx/yy format. Since it's a SHOULD, it doesn't even require a feature bit. I sent a patch for that.

Paolo

--
"Debugging is twice as hard as writing the code in the first place.

Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

(Brian W. Kernighan)

virtio-comment message