OASIS Mailing List Archives

 



virtio-dev message



Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state




On 9/18/2023 2:37 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 18, 2023 10:55 AM
To: Parav Pandit <parav@nvidia.com>; virtio-dev@lists.oasis-open.org; Michael S. Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state

CC MST and Jason

On 9/18/2023 1:21 PM, Zhu, Lingshan wrote:

On 9/18/2023 12:32 PM, Parav Pandit wrote:
From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Monday, September 18, 2023 8:40 AM

On 9/17/2023 1:32 PM, Parav Pandit wrote:
From: virtio-dev@lists.oasis-open.org
<virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Friday, September 15, 2023 9:59 AM

On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
This series introduces
1) a new SUSPEND bit in the device status, which is used to
suspend the device so that the device state and virtqueue
states are stabilized.

2) virtqueue state and its accessor, to get and set the
last_avail_idx and last_used_idx of virtqueues.
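To make the two facilities above concrete, here is a minimal Python sketch of the intended save/restore flow, modelled on a toy in-memory device rather than real hardware. The SUSPEND bit value (16), the names ToyDevice, VqState, suspend_and_save and restore_and_resume, and the rule that vq state is written only while the destination is suspended are all hypothetical illustrations, not the definitions from the patches.

```python
from dataclasses import dataclass, field

# Existing virtio device status bits (virtio spec, "Device Status Field").
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
# HYPOTHETICAL value for the proposed SUSPEND bit, chosen for illustration only.
SUSPEND = 16


@dataclass
class VqState:
    """Per-virtqueue state exposed by the proposed accessor (hypothetical layout)."""
    last_avail_idx: int
    last_used_idx: int


@dataclass
class ToyDevice:
    """In-memory stand-in for a device's status and vq-state registers (not a real device)."""
    status: int
    vqs: list = field(default_factory=list)


def suspend_and_save(dev: ToyDevice) -> list:
    """Source side: set SUSPEND, confirm it reads back, then snapshot every vq's state."""
    dev.status |= SUSPEND
    # A real driver would poll the status register with a timeout;
    # this toy device honours SUSPEND immediately.
    if not dev.status & SUSPEND:
        raise RuntimeError("device did not suspend")
    return [VqState(vq.last_avail_idx, vq.last_used_idx) for vq in dev.vqs]


def restore_and_resume(dev: ToyDevice, saved: list) -> None:
    """Destination side: load vq state while suspended, then clear SUSPEND to resume."""
    # Assumption: vq state is only writable while the device is suspended.
    if not dev.status & SUSPEND:
        raise RuntimeError("vq state can only be set while suspended")
    dev.vqs = [VqState(s.last_avail_idx, s.last_used_idx) for s in saved]
    dev.status &= ~SUSPEND


# Usage: migrate two virtqueues' indices from a running source to a suspended destination.
running = ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK
src = ToyDevice(running, [VqState(120, 118), VqState(7, 7)])
dst = ToyDevice(running | SUSPEND)
restore_and_resume(dst, suspend_and_save(src))
print(dst.vqs[0])            # VqState(last_avail_idx=120, last_used_idx=118)
print(dst.status & SUSPEND)  # 0
```

On a real transport the status write and read-back would be register accesses, and the driver would poll with a timeout instead of assuming the device suspends at once.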

The main use case of these new facilities is Live Migration.

Future work: dirty page tracking and in-flight descriptors.
This series addresses many comments from Jason, Stefan and
Eugenio on the RFC series.
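In-flight descriptors are left for future work above. As a hedged illustration of why, the sketch below (hypothetical helper name inflight_count) assumes split-ring semantics, where last_avail_idx and last_used_idx are free-running 16-bit counters: their difference modulo 2^16 gives how many descriptor chains the device has fetched but not yet returned. It yields only a count, not which chains are outstanding, which is presumably why separate in-flight tracking is planned for devices that may complete requests out of order.

```python
RING_IDX_MOD = 1 << 16  # split-ring indices are free-running 16-bit counters


def inflight_count(last_avail_idx: int, last_used_idx: int) -> int:
    """Chains taken from the avail ring but not yet returned via the used ring."""
    # Modular subtraction handles wrap-around past 65535.
    return (last_avail_idx - last_used_idx) % RING_IDX_MOD


print(inflight_count(120, 118))  # 2
print(inflight_count(3, 65534))  # 5 (indices wrapped)
```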
Compared to Parav's patchset, this is much less functional.
We will add dirty page tracking and an in-flight IO tracker in V2;
then it will be a full-featured LM solution.

They are not in this series because we want this series to be
small and focused.
Assuming that one goes in, can't we add ability to submit admin
commands through MMIO on the device itself and be done with it?
I am not sure. IMHO, if we use admin vq as the back-end for MMIO-based
live migration, then the issues with admin vq still exist, for example:
1) nested virtualization
2) bare-metal live migration
3) QoS
4) more attack surfaces.

#4 is just random without.
I failed to process "random without".

If you expect admin vq to perform live migration, it can certainly
be a side-channel attack surface, for example:
a) malicious SW can stop the device from running;
b) malicious SW can sniff guest memory by tracking guest dirty
pages, then infer guest operations and steal secrets.
This is the mode where the hypervisor is trusted.
The PF is not always owned by the hypervisor, right?
And you don't pass through the PF to any guests, right?
When the hypervisor is untrusted, in the CC model with a TDISP-enabled device, the TSM
will delegate the tasks to the DSM.
TDISP devices cannot be migrated for now.
That is fine; the infra is built so that it can be migrated one day.
And at that point the proposed admin command-based model also fits fine.
Since you are talking about TDISP, I suggest reading the TDISP spec;
it says:
it says:
Device Security Architecture - Administrative interfaces (e.g., a PF) may be used to influence the security properties of the TDI used by the TVM. The device's security architecture must provide isolation and access control for TVM data in the device for protection against entities that are not in the trust boundary of the TVM

So an admin-vq-based LM solution can be a side-channel attack surface.

For an untrusted hypervisor, the same set of attack surfaces is present with
trap+emulation.
So both methods score the same. Hence it's not a relevant point for discussion.
This is not about the hypervisor. Do you see any modern hypervisor that has these
issues?

This is about admin vq for LM being a side-channel attack surface.
It is not.
Hypervisor is trusted entity.
For an untrusted hypervisor, TDISP is the unified solution built over the last few years by various industry bodies, including DMTF and PCI.
We want to utilize that.
First, TDISP is outside the virtio spec.
Second, TDISP devices cannot be migrated for now.
Third, admin vq can be a side-channel attack surface, as explained above.

#3 There is no QoS issue with admin commands and queues. If you
claim that,
then the whole virtio spec, which is based on virtqueues, is broken.
And that is certainly not the case.
Please do not confuse the concepts and purposes of the data queues
and admin vq.

I am not confused.
There is no guarantee that a register placed on the VF will be
serviced by the device in exactly the same time regardless of whether the VF count is 1
or 4000.
Yet again, not a relevant comparison.
Please read my previous replies in other threads.
It does not answer.
The claim that a polled register somehow guarantees downtime at the scale of thousands of member devices rests on some specific device implementation that is not explained.
The registers and the LM facilities are per-device.

For data queues, it can be slow without MQ or RSS; that means
performance overhead, but it can work.
No, it does not work. The application failed because of jitter in the
video and audio due to missing the latency budget.
A financial application is terminated due to timeouts and packet loss.

Device migration is just a third such application.

It's also the same.
My last reply on this vague argument.
I think the points are clear, and you already understand the points,
so there is no need to argue anymore.
Yes, I have been clear on this for a long time: neither AQ, nor registers, nor RSS queues can guarantee any performance characteristics.
It is pretty clear to me.
Any performance guarantees are explicitly requested when desired.

For admin vq, if it doesn't meet QoS requirements, it fails to migrate
guests.

I have replied to the same question so many times, and this is the
last time.
I also replied many times that the QoS argument is not valid anymore.
The same can happen with register writes.
Perf characteristics for 30+ devices are not in the virtio spec. They are
implementation details.
As replied many times, registers only serve the device itself, and
registers are not the DATA PATH, meaning the device doesn't transfer data
through registers.
It does not matter whether it is the data path or the control path; the fact is that downtime assurance cannot be guaranteed by the register interface design. It is an implementation detail.
And the same goes for admin commands and/or AQ.
The registers do not perform any data transfers, e.g., we don't migrate dirty pages through registers.
But you do these via admin vq.


