[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [virtio-dev] [PATCH RFC 3/3] rng: leak detection support
On 28/9/23 20:16, Babis Chalios wrote:
On 27/9/23 23:47, Michael S. Tsirkin wrote:On Wed, Sep 27, 2023 at 12:43:20PM +0200, Babis Chalios wrote:On 22/9/23 18:01, Michael S. Tsirkin wrote:On Fri, Sep 22, 2023 at 05:40:50PM +0200, Babis Chalios wrote:On 22/9/23 17:06, Michael S. Tsirkin wrote:On Tue, Sep 19, 2023 at 12:11:37PM +0200, Babis Chalios wrote:On 19/9/23 12:01, Michael S. Tsirkin wrote:It could, but then we have the exact race condition that VMGENID had,On Tue, Sep 19, 2023 at 09:32:08AM +0200, Babis Chalios wrote:Resending to fix e-mail formatting issues (sorry for the spam) On 18/9/23 18:30, Babis Chalios wrote:Then it will see a single event in 1-X instead of two events. A leak isYes, that's what the driver does now in the RFC patch. However, this just decreases the race window, it doesn't eliminate it. If a third leak event happens it might not find any buffers to use: 1. available buffers to queue 1-X 2. available buffers to queue X 3. poll queue X 4. used buffers in queue X <- leak event 1 will use buffers in X 5. avail buffers in queue X 6. poll queue 1-X <- leak event 2 will use buffers in 1-X 7. used buffers in queue 1-X 8. avail buffers in queue 1-X <- leak event 3 (it needs buffers in X, race with step 5) 9. goto 3I don't get it. we added buffers in step 5.What if the leak event 3 arrives before step 5 had time to actually add the buffers in X and make them visible to the device?a leak though, I don't see does it matter how many triggered.So the scenario I have in mind is the following:(Epoch here is terminology that I used in the Linux RFC. It is a valuemaintained by random.c that changes every time a leak event happens). 1. add buffers to 1-X 2. add buffers to X 3. poll queue X 4. vcpu 0: get getrandom() entropy and cache epoch value 5. Device: First snapshot, uses buffers in X 6. vcpu 1: sees used buffers 7. Device: Second snapshot, uses buffers in 1-X 8. vcpu 0: getrandom() observes new epoch value & caches it9. Device: Third snapshot, no buffers in either queue, (vcpu 1 from step 6has not yet finished adding new buffers). 10. vcpu 1 adds new buffer in X11. vcpu 0: getrandom() will not see new epoch and gets stale entropy.In this succession of events, when the third snapshot will happen, thedevice won't findany buffers in either queue, so it won't increase the RNG epoch value. So,any entropygathered after step 8 will be the same across all snapshots. Am I missingsomething? Cheers, BabisYes but notice how this is followed by: 12. vcpu 1: sees used buffers in 1-X Driver can notify getrandom I guess?userspace has already consumed stale entropy and there's nothing we can do about that.Although this is indeed a corner case, it feels like it beats the purpose of having the hardware update directly userspace (via copy on leak).How do you feel about the proposal a couple of emails back? It looks tome that it avoids completely the race condition. Cheers, BabisIt does. The problem of course is that this means that e.g. taking a snapshot of a guest that is stuck won't work well.That is true, but does it matter? The intention of the proposal is that if it is not safe to take snapshots (i.e. no buffers in the queue) don't take snapshots.I have been thinking of adding MAP/UNMAP descriptors for a while now. Thus it will be possible to modify userspace memory without consuming buffers. Would something like this solve the problem?I am not familiar with MAP/UNMAP descriptors. Is there a link where I can read about them? Cheers, BabisHeh no I just came up with the name. Will write up in a couple of days, but the idea is that driver does get_user_pages, adds buffer to queue, and device will remember the address and change that memory on a snapshot. If there are buffers in the queue it will also use these to tell driver, but if there are no buffers then it won't.That sounds like a nice mechanism. However in our case the page holding the counter that gets increased by the hardware is a kernel page. The reason for that is that things other than us (virtio-rng) might want to notify for leak events. For example, I think that Jason intended to use this mechanism to periodically notify user-space PRNGs that they need to reseed. Cheers, BabisNow I'm lost. when you write, e.g.: 4. vcpu 0: get getrandom() entropy and cache epoch value how does vcpu access the epoch?The kernel provides a user space API to map a pointer to the epoch value. User space then caches its value and checks it every time it needs to make sure that no entropy leak has happened before using cached kernel entropy. virtio-rng driver adds a copy on leak command to the queue for increasing this value (that's what we are speaking about in this thread). But other systems might want to report "leaks", such as random.c itself. Cheers, Babis
Hey Michael, does this explain the flow of things? To summarize, end-to-end things work like this: 1. kernel allocates a page where an epoch value is stored. 2. User-space can mmap this page and look for changes in its value to ÂÂÂ know when it needs to re-seed its PRNGs. 3. virtio-rng driver gets a hold of the address of the page and programs ÂÂÂ device to update it when entropy is leaked 4. Other kernel sub-systems can do the same if they have a concept ofÂÂÂ entropy leaking. For example, random.c might want to do this periodically
Regarding your suggestion for MAP/UNMAP buffers, I think it could work if it was working with kernel addresses as well. The device doesn't care about user vs kernel space guest addresses after all. However, if we were to use such a mechanism, it seems that we don't need the two leak queues anyway. So, it sounds like we can have a setup with one leak queue only. If we use a MAP/UNMAP type of buffer then all is fine. If we use normal buffers, we can add the requirement that the VMM needs to refuse to take a snapshot if there is not a copy on leak buffer in the queue. Thoughts? Cheers, Babis
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]