OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

virtio-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [virtio-comment] [PATCH 1/2] virtio-balloon: add an event queue


On 31.01.22 08:10, David Stevens wrote:
> On Sat, Jan 29, 2022 at 12:46 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 24.01.22 13:56, David Stevens wrote:
>>> Add an event queue to the balloon to allow the driver to send events to
>>> the device, which allows the device to be more responsive to the memory
>>> needs of the guest.
>>>
>>> There are two defined events. The first event is an out of memory event.
>>> This event provides a way for the guest to handle out of memory events
>>> on a system where the host does not support deflate-on-oom. The second
>>> event is an out of puff event, which the guest can send when it fails to
>>> allocate memory while trying to inflate the balloon. This serves as an
>>> indication to the device that the driver may not be able to inflate the
>>> balloon in a timely manner.
>>
>> Do we actually need a queue for that or could a simple member in the
>> config space be sufficient?
>>
>> Something like a "logical driver state". By storing specific value, the
>> the driver can notify the hypervisor about such events.
>>
>> Just an idea.
> 
> Having an event field in the config space should work. I do think
> having an ack is important, but that could be done via the device
> clearing the event field and triggering a config change callback.
> 

Right, but for an ACK a queue might actually be better.

> The one drawback of this approach I can think of is that it requires
> that the driver writing to the device config space generates events
> for the device. At least based on my understanding of the balloon
> spec, the device can completely ignore any writes to the device config
> space after the DRIVER_OK bit is set. That allows the device to put
> the device config on its own page and use KVM to map that page into
> the guest, which then skips the need for any vmexits when
> reading/writing num_pages or actual in the guest. I guess this would
> still technically be achievable even with the event field by making
> the offset of the device config non-page-aligned, so that the event
> field lies on a new page. However, if any new fields get added after
> the event field, they would generate vmexits as well. I don't know if
> that's a concern to worry about, though.
> 
>> Of course, you cannot really come up with an additional payload, but my
>> guess is that "The \field{data} value indicates how many pages the
>> driver requires to maintain system stability." will be wrong in 99.999%
>> of all cases. Relying on anything provided by the shrinker or the OOM
>> handler does not give you an idea what the system needs overall to "
>> maintain system stability".
> 
> One of the device or the driver needs to determine how much to deflate
> the balloon in response to the OOM. It's true that the driver probably
> can't guess exactly how much memory is needed, but it's definitely
> going to be able to make a better guess than the device. In the end,
> the device can do whatever it wants with the request, but I think it's
> still useful for the driver to be able to provide a hint.

The device is already making tons of guesses by staring at the memory
statistics, so I think we should leave that guessing also to the device.
At least I don't see how the values reported by the driver could be any
better (they will be wrong most of the time either way).

>>> +
>>> +\drivernormative{\paragraph}{Events}{Device Types / Memory Balloon Device / Device Operation / Events}
>>> +
>>> +The driver MUST update \field{actual} with any allocated pages before
>>> +sending a VIRTIO_BALLOON_EVENT_OOPUFF event.
>>> +
>>> +The driver SHOULD wait for the device to acknowledge the event
>>> +before trying to further inflate or deflate the balloon.
>>> +
>>> +If VIRTIO_BALLOON_F_DEFLATE_ON_OOM has been negotiated, the driver
>>> +SHOULD send an OOM event before using pages from the balloon.
>>> +
>>> +\devicenormative{\paragraph}{Events}{Device Types / Memory Balloon Device / Device Operation / Events}
>>> +
>>> +When the device receives a VIRTIO_BALLOON_EVENT_OOM event, it SHOULD deflate
>>> +the balloon by \field{data} pages before acknowledging the event.
>>
>> The issue is that this is asynchronous. You won't really be able to stop
>> OOM from killing processes as you usually won't be able to get back
>> pages fast enough.
> 
> If the device reduces num_pages before acking the message, then the
> driver can wait for the ack and deflate the balloon synchronously. For
> Linux specifically, blocking in the OOM notifier is fine (at least the
> balloon driver already acquires a mutex here). And while it's true
> that reclaiming memory might not be fast, my understanding is that
> anywhere that could invoke the OOM killer can also invoke swap to
> disk, which is also not fast.

And that's the main issue IIRC. Allocation paths that *cannot* do that
(sleep, trigger the OOM killer) will fail the allocation instead,
essentially destabilizing your system or just crashing with unexpected
behavior. Reclaim can be done mostly synchronous if need be IIRC.

So once *some* path triggers the OOM killer and you try to keep up,
other parts of the system can already start falling apart.

Hooking into the shrinker interface is better, however, has some ugly
side-effects that random memory pressure will completely deflate the
balloon.

See 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
followed by da10329cb057 ("virtio-balloon: switch back to OOM handler
for VIRTIO_BALLOON_F_DEFLATE_ON_OOM")

Especially my note about "The shrinker does not have a concept of
priorities yet, so this behavior cannot be configured."


Long story short: we should avoid hooking into the OOM killer for all
new features.

-- 
Thanks,

David / dhildenb



[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]