virtio message


Subject: [PATCH] 2.5 Reserve device ID 0 (zero) as invalid


Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
---
 virtio-v1.0-wd01-part1-specification.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
index a3ee054..5d7280d 100644
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ b/virtio-v1.0-wd01-part1-specification.txt
@@ -1076,6 +1076,8 @@ Discovering what devices are available and their type is bus-dependent.
 | Device ID  |   Virtio Device    |
 +------------+--------------------+
 +------------+--------------------+
+| 0          |   none (ignore)    |
++------------+--------------------+
 | 1          |   network card     |
 +------------+--------------------+
 | 2          |   block device     |
@@ -1101,6 +1103,8 @@ Discovering what devices are available and their type is bus-dependent.
 | 12         |   virtio CAIF      |
 +------------+--------------------+
 
+When a device is discovered with a device ID of 0, it should be ignored.
+
 2.5.1 Network Device
 ====================
 
-- 
1.8.1.2



      was (Author: hornet):
    From dbe71974fcd8dbf3ed0f032539d340604fc0c264 Mon Sep 17 00:00:00 2001
From: Rusty Russell <rusty@au1.ibm.com>
Date: Tue, 27 Aug 2013 12:12:06 +0100
Subject: [PATCH] 2.5 Reserve device ID 0 (zero) as invalid

Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
---
 .virtio-v1.0-wd01-part1-specification.txt.swp      |  Bin 0 -> 16384 bytes
 ...efine-all-MMIO-registers-as-little-endian.patch |   31 +
 id0.mbox                                           |   41 +
 virtio-v1.0-wd01-part1-specification.txt           |    4 +
 virtio-v1.0-wd01-part1-specification.txt.orig      | 2890 ++++++++++++++++++++
 5 files changed, 2966 insertions(+)
 create mode 100644 .virtio-v1.0-wd01-part1-specification.txt.swp
 create mode 100644 0001-2.4.2.2-Define-all-MMIO-registers-as-little-endian.patch
 create mode 100644 id0.mbox
 create mode 100644 virtio-v1.0-wd01-part1-specification.txt.orig

diff --git a/.virtio-v1.0-wd01-part1-specification.txt.swp b/.virtio-v1.0-wd01-part1-specification.txt.swp
new file mode 100644
index 0000000000000000000000000000000000000000..24526a2cad4e5633eeccc6071ead6abe4fd4bd3b
GIT binary patch
literal 16384
zcmeI3U5q4E6~_x)6i_~*4~B%`b&;@adV018<H9CJm-%2PyE_B3%R*dsH#JqayNj)^
zThzz&G!IH7A`c|SXiP-qLk&Km559ng2Okul;K?UQGzf?yu(GK5?eE;W)icv9D?U@5
z{ZG%-se9_4|2^m2d(W+1-F0N~pxWKO*5h-j=3DY8dim#>_2-^w|sFZaCE9}Y~st0SVb
zeflO#baA#jD%t02h0}Q!OdCIs@+p(*q;0a^^jee^k#S#BYn}G4sbRRQGnM*T(V5Cq
z9YozI@VO^x7whlhY5%)F%^eK`?=3D1!{@~)lf>=3DK)=3D-2QR(kq_VU-ZH<jwqc-Qpkbh4
zpkbh4pkbh4pkd%W&Onh~;e7!P&UHK(_kGU9_kY~ygu9+j$p7u+N8I&iCglHe@<Z<W
zRqg@xbN=3Db%p_8vq$p0}Re{MqlcPBsY+JDv=3DRO|nnlYh?1ADocC<>Xx_e``YiS0}G`
zOvm3>-B|1Y%S8JhPsso5<O8SwvTa-2zv<*XC%<<>e%i@zbMjXw<Zn26-^u6C+p7Oh
zPF_0sV-xaIP98Y<_V;gX|N4adOB3?foV+f~ygVU))yX5L|Mm}T)&EB)ulf1;g!~mJ
zukD*Te{1_cIQbn;{{s{9-#dBjU)wI&+WuuHul>6}A^)9|-{kZ^H6ee=3D$?N#}=3D!IML
zzv$$(f8IMGf5FL*I{hz9$elZR7diQ7KDbrCyWD&=3D3^WWh3^WWh3^WWh3^WWh3^WWh
z47?))J{zJbr02<gm-GL6|8nXQ&-)GdHTVX&8C(c{ak1xpAAApJupRuCP05eI4}k$K
z@ZyI&?-}qk_$pWe7lZS_Ht;0-mo+dAUT3fK8h8xc0d|5@?5~~xkAp|S*TL7o&EO*N
z6#KhJz{B8vunvYG0|8hC2f=3Dk<H|T(mfh)je;6iW#I2W8|ulWXe1w03y1wR8n0FQz<
z*~^{=3DzXCr6KLJmG$H8~Oec;RB1lSEe1x~TQegr%Kz6tID1vn05Z$1Zhf=3Dj_A;A!^r
z_k%@nCD;z0W7GF6co0OO53U9;F;<U)hk(>EZU_6ojX=3Dh68~DFIN-gYNJTO0}c64U0
zyGAW0p}c9Kb5->HLgi+li++^!)KDjd8fJ!HqXeN=3Dy;A2zOL?ABVHT}vdok2&QJ~vg
zn_F2tGQV<SdB586$+zj1_O?-`qFm*=3DP-Uw8BvhI9!;vbCRl6D$qG{ld)T&m#%A;Ol
z-vps6I}A;eZj?lYzD9*Qwl{T6rbS&LU8du?lHOH)Qnw_b&eVY@Dc4)>>A4F0M3l$I
z5B=3D3xtAPp281r)@_MmNSirO)AMmsDIYWa{lnV4a&`erEo-=3DM@2WV_%j3=3DC%6r9%}J
z%FnTD&5xr{Rd2!6z2&icXZJ5H-^$qBIM3*u{n>pKbxjF_LapZ=3D7Tdd#j(JM1c0sR6
zAQx&{%7|gZD2LKQ#oEscWfHAmTjhOI(gZ@da}TapxkJS!7Yn~Ab7i_p`$2!))iT%J
zG8U_&f!~YJ#S;cFVNPNtBa5WS{4Ou?s5t8N3sh7-6qqD8u<oUq>CwHst!&Tvb1e!P
z1{w50${n#8PQ`jn$105cUSjeh3iiO@)j30k1z4=3DrTt)Mts+(i;%QVELF$>SC9oP2O
zIXh}_6mROCKHf~b)9y@YUl4LhdTJSaamInTo{Zt0r6w{E%o_Z&S7!FbC6%Xspt-uB
z;gDOQTW&f-m*wrNtjPCxJ91q`)heXduR$>~fxHQotDU#4%rES`rmfWc#%L&^G|GkI
zuC=3DynVz^@jy}esqOdKPO^1!U&=3DArdXC{GF^tm4X7Ja_#1m6A+gu=3DvyO?jxkB8+g$x
zPz!z##ZeJ4o^;ArN9_Av&h)b=3D^k(n|UEj{ACZV6vgR?d1w6ATeC6i1onQQ>_<fRqe
z#lj?DP`sln3$tpb)4BebI@DG-WPX_Ftfltrks6vTlpx9!{wzt3TS<dvoD)%WXBlOB
zAgS<x_+s8t{h~<sOivGo!*-;Lu6>^^>ju+69wYCosm1xj3u?z=3DQfPTuPR%D0BRZ3)
zI_&2s)q=3D?al33lfmOAKXgjeVK>vy-$ZtINLdL}y+pE+&QBfjFzNrKKK6K%b0<K=3DCa
z_YqK&7u)T&ccWzdGslTXfC%%JNn9|{DegV_JO~(^vAu1MF`N2KYo9PlOhJxLqk<f1
zbEYH?@mv#>G7xQ#G!bOcDn84YZHzC<mP{e>Lza_KWopSKY;eK67{$6J;lW7Tara#2
z<8dwS49o!Kp$=3DOz`?PFum{h{fkCk)_w;)MIGNfsyCAs*yO+hEeo-`<BdX%4HglM(i
zgUY<`$FcjWWYyDKXW+&%({aRBU4>OYkjW<2VUL(1EX84&LuG;^M<@8POs3>`ER??D
z(K3^oE|GvNCRr0$-6|F3(zcqs{8dxZRcS0YxG9>JHY?LkW;n<k5{K0@R_RLh7o4R8
z+^U57XiwF?%EwzfJduZa3sf>Bv5=3D4Qf@vcxZDdrlw#-9i+E%ETUoSMNjY)b>of2-E
zU38NW6S+wW9}jRw+5D=3D8{fI!40TW+b)jDB{VmcAeVa%9?>|`j34x+@@q9lk*240-W
zI-2C&_R*PRw6D5^-RRq*hS*Zl7vXJ_!iq@kXK}<0Au}K`p@i+qXiffD)gwczyi8NW
zTd;Q7!sI~0_lttGUSFx6jUQCW6E$@{9J4X-Pg)~z#k?F?Z$*oYs2vQQFguyVW-pAh
z)Cemv$tV}^(0E6{#L>;nK&@g&wZyYlxzQMAq;ct^T^*_1A84X8z*T53K^7$`kFz|K
z1;4g74MBpt#Un<P>CMi^Xz|w!OP{fdd08Vw49?i7nsYa;N%V?)*f6dlKP?PCEtdF<
zAjyy=3Dg7v%h(T<FoUlJghEG%R>N9st7((8uxgG&9Lp6bR?nu<|$e|Gs+=3DDDEnCs96d
z%ZodRVAR-H&q~soHE-_VUT50InyKp#aDG1%di-SMy5XV-6^B>^$@0eq(A#X3dB<5G
z4lwSB7|RxdUNJ0X5wXHhM9tLZ5vB8TiDl?Uv$kiG0kLYktj7VgagQA|S+$g7+HgvS
z2eOp6=3D>e71cCE`=3Dq*^BSeP&lyXj{!{i6d2`$nF5vRa$1Lq$tKt9u)~Yl3*B%dR0@c
zk+@*fb=3D@J)t1?ja@OrT_+UP!hTTcY6F!3)@Kn`IwoR>l0ttMUGLgGBS%80U@tyUB`
z5?c#tI<mnniJ(f7%pGAQSySeG8yIPS#F}O^TeUHs^Hu9=3Dm@8?bc3=3D6uTAveowRxNA
zZK)AUgVhm)ZJOO^Urlb-uB~j@5D=3DT5zx&%2t<T^69c6uRWF2N=3DoyJU}7MBjMEYICC
zdw6kq$&>y6e)j9%0<!;?FZn#lUjH7j51bEv$v*#K@DMlwj)SYfC&2Tr{R4I5N=3D5Ut
zVW454VW454VW454VW454VW454VW45)-7yg5$<+nr2{y|?8B;(SQ9h)!K=3DF%`5fv79
zY(>R@jWnlC)tQJCj(WXOwaaC1d~}97_uiZ%7&!p3l|bq!lZO%;Ti(J9qCo1RY(eem
z2?g_+>yA+YudW4t&htavjjR2-AIfQ;)KmuSkm;mP8LrHD)JR(HK1R_6A^-TIOe6c|
zswl?6lIFjE%Xx#81va+kQj+1!=3DPu2eZl7|UucR&}DuS3A0`;|wgQa4sOwt%zb=3DdZ#
zG(;_!x?<_-o3h!?G*|4gmncstesFxrVWB8UITL|Ge(;C4d}3*N>4pQ#v-=3Df>QHPX6
zGHE2YNnt3C2GpOW`YZjwP=3D6iW)wZ9?*QR61c+^(=3D%#gDI4i%&x*3+CXP-)8fNzFiP
z$*IdyTayQPL%*u0$*oj(#x*92!%|?3x+7a{<JNJ#vFe9sk1;2&N{6c>4AEZEA?`t`
NT0CW3*s-M~@4pQt_QC)F

literal 0
HcmV?d00001

diff --git a/0001-2.4.2.2-Define-all-MMIO-registers-as-little-endian.patch b/0001-2.4.2.2-Define-all-MMIO-registers-as-little-endian.patch
new file mode 100644
index 0000000..9a473b5
--- /dev/null
+++ b/0001-2.4.2.2-Define-all-MMIO-registers-as-little-endian.patch
@@ -0,0 +1,31 @@
+From ee871b9753dddfc5a41fd46eaf771669818f713e Mon Sep 17 00:00:00 2001
+From: Pawel Moll <pawel.moll@arm.com>
+Date: Tue, 27 Aug 2013 11:44:04 +0100
+Subject: [PATCH] 2.4.2.2 Define all MMIO registers as little endian
+
+Port of draft commit 88f37f9ec178b664213b77211fec03687b87958b.
+
+Signed-off-by: Pawel Moll <pawel.moll@arm.com>
+---
+ virtio-v1.0-wd01-part1-specification.txt | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
+index a3ee054..3d69fd4 100644
+--- a/virtio-v1.0-wd01-part1-specification.txt
++++ b/virtio-v1.0-wd01-part1-specification.txt
+@@ -993,8 +993,9 @@ Virtual queue size is the number of elements in the queue,
+ therefore size of the descriptor table and both available and
+ used rings.
+ 
+-The endianness of the registers follows the native endianness of
+-the Guest. Writing to registers described as "R" and reading from
++All register values are organized as Little Endian.
++
++Writing to registers described as "R" and reading from
+ registers described as "W" is not permitted and can cause
+ undefined behavior.
+ 
+-- 
+1.8.1.2
+
diff --git a/id0.mbox b/id0.mbox
new file mode 100644
index 0000000..c42a3dc
--- /dev/null
+++ b/id0.mbox
@@ -0,0 +1,41 @@
+commit d8a995390a273bc6d209cf8a6f8a178b424a6438
+Author: Rusty Russell <rusty@au1.ibm.com>
+Date:   Tue Aug 20 16:13:19 2013 +0930
+
+    Reserve device ID 0 (zero) as invalid
+    
+    See http://tools.oasis-open.org/issues/browse/VIRTIO-7
+    
+    Signed-off-by: Rusty Russell <rusty@au1.ibm.com>
+
+diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-part1-specification.txt
+index 8fc96b2..f989630 100644
+--- a/virtio-v1.0-wd01-part1-specification.txt
++++ b/virtio-v1.0-wd01-part1-specification.txt
+@@ -1069,6 +1069,8 @@ Discovering what devices are available and their type is bus-dependent.
+ | Device ID  |   Virtio Device    |
+ +------------+--------------------+
+ +------------+--------------------+
++| 0          |   none (ignore)    |
+++------------+--------------------+
+ | 1          |   network card     |
+ +------------+--------------------+
+ | 2          |   block device     |
+@@ -1094,6 +1096,8 @@ Discovering what devices are available and their type is bus-dependent.
+ | 12         |   virtio CAIF      |
+ +------------+--------------------+
+ 
++When a device is discovered with a device ID of 0, it should be ignored.
++
+ 2.5.1 Network Device
+ ====================
+ 
+
+
+---------------------------------------------------------------------
+To unsubscribe from this mail list, you must leave the OASIS TC that
+generates this mail.  Follow this link to all your TCs in OASIS at:
+https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
+
+
+
diff --git a/virtio-v1.0-wd01-part1-specification.txt b/virtio-v1.0-wd01-pa=
rt1-specification.txt
index a3ee054..5d7280d 100644
--- a/virtio-v1.0-wd01-part1-specification.txt
+++ b/virtio-v1.0-wd01-part1-specification.txt
@@ -1076,6 +1076,8 @@ Discovering what devices are available and their type is bus-dependent.
 | Device ID  |   Virtio Device    |
 +------------+--------------------+
 +------------+--------------------+
+| 0          |   none (ignore)    |
++------------+--------------------+
 | 1          |   network card     |
 +------------+--------------------+
 | 2          |   block device     |
@@ -1101,6 +1103,8 @@ Discovering what devices are available and their type is bus-dependent.
 | 12         |   virtio CAIF      |
 +------------+--------------------+
 
+When a device is discovered with a device ID of 0, it should be ignored.
+
 2.5.1 Network Device
 ====================
 
diff --git a/virtio-v1.0-wd01-part1-specification.txt.orig b/virtio-v1.0-wd01-part1-specification.txt.orig
new file mode 100644
index 0000000..a3ee054
--- /dev/null
+++ b/virtio-v1.0-wd01-part1-specification.txt.orig
@@ -0,0 +1,2890 @@
+1. INTRODUCTION
+===============
+
+This document describes the specifications of the "virtio" family of
+devices. These devices are found in virtual environments, yet by
+design they are not all that different from physical devices, and this
+document treats them as such. This allows the guest to use standard
+drivers and discovery mechanisms.
+
+The purpose of virtio and this specification is that virtual
+environments and guests should have a straightforward, efficient,
+standard and extensible mechanism for virtual devices, rather
+than boutique per-environment or per-OS mechanisms.
+
+  Straightforward: Virtio devices use normal bus mechanisms of
+  interrupts and DMA which should be familiar to any device driver
+  author. There is no exotic page-flipping or COW mechanism: it's just
+  a normal device.[1]
+
+  Efficient: Virtio devices consist of rings of descriptors
+  for input and output, which are neatly separated to avoid cache
+  effects from both guest and device writing to the same cache
+  lines.
+
+  Standard: Virtio makes no assumptions about the environment in which
+  it operates, beyond supporting the bus attaching the device.  Virtio
+  devices are implemented over PCI and other buses, and earlier drafts
+  have been implemented on other buses not included in this spec.[2]
+
+  Extensible: Virtio PCI devices contain feature bits which are
+  acknowledged by the guest operating system during device setup.
+  This allows forwards and backwards compatibility: the device
+  offers all the features it knows about, and the driver
+  acknowledges those it understands and wishes to use.
+
+1.1.1.  Key words
+-----------------
+
+The key words must, must not, required, shall, shall not, should,
+should not, recommended, may, and optional are to be interpreted as
+described in [RFC 2119].  Note that for reasons of style, these words
+are not capitalized in this document.
+
+1.1.2.  Definitions
+-------------------
+
+term
+    Definition
+
+1.1.3.  Key concepts
+--------------------
+
+Guest
+    Definition...
+
+Host
+    Definition
+
+Device
+    Definition
+
+Driver
+    Definition
+
+1.2. Normative References
+=========================
+
+[RFC 2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, http://www.ietf.org/rfc/rfc2119.txt IETF (Internet Engineering Task Force) RFC 2119, March 1997.
+
+1.3. Non-Normative References
+=========================
+
+
+
+2 The Virtio Standard
+=========================
+
+2.1 Basic Facilities of a Virtio Device
+=======================================
+
+A virtio device is discovered and identified by a bus-specific method
+(see the bus specific sections *XREF*).  Each device consists of the following
+parts:
+
+o Device Status field
+o Feature bits
+o Configuration space
+o One or more virtqueues
+
+2.1.1 Device Status Field
+-------------------------
+
+The Device Status field is updated by the guest to indicate its
+progress. This provides a simple low-level diagnostic: it's most
+useful to imagine the status bits hooked up to traffic lights on the
+console indicating the status of each device.
+
+This field is 0 upon reset, otherwise at least one bit should be set:
+
+  ACKNOWLEDGE (1) Indicates that the guest OS has found the
+  device and recognized it as a valid virtio device.
+
+  DRIVER (2) Indicates that the guest OS knows how to drive the
+  device. Under Linux, drivers can be loadable modules so there
+  may be a significant (or infinite) delay before setting this
+  bit.
+
+  DRIVER_OK (4) Indicates that the driver is set up and ready to
+  drive the device.
+
+  FAILED (128) Indicates that something went wrong in the guest,
+  and it has given up on the device. This could be an internal
+  error, or the driver didn't like the device for some reason, or
+  even a fatal error during device operation. The device must be
+  reset before attempting to re-initialize.
+
+2.1.2 Feature Bits
+------------------
+
+Each virtio device lists all the features it understands.  During
+device initialization, the guest reads this and tells the device the
+subset that it understands.  The only way to renegotiate is to reset
+the device.
+
+This allows for forwards and backwards compatibility: if the device is
+enhanced with a new feature bit, older guests will not write that
+feature bit back to the device and it can go into backwards
+compatibility mode. Similarly, if a guest is enhanced with a feature
+that the device doesn't support, it sees that the new feature is not offered
+and can go into backwards compatibility mode (or, for poor
+implementations, set the FAILED Device Status bit).
+
+Feature bits are allocated as follows:
+
+  0 to 23: Feature bits for the specific device type
+
+  24 to 32: Feature bits reserved for extensions to the queue and
+  feature negotiation mechanisms
+
+For example, feature bit 0 for a network device (i.e. Subsystem
+Device ID 1) indicates that the device supports checksumming of
+packets.
+
+In particular, new fields in the device configuration space are
+indicated by offering a feature bit, so the guest can check
+before accessing that part of the configuration space.
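
As a rough illustration of the negotiation described above, the sketch below treats the offered and understood feature sets as 32-bit masks. The helper names negotiate_features and feature_is_device_specific are hypothetical and do not come from the specification; the sketch only assumes that the driver writes back the intersection of the two sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the driver accepts only those offered feature
 * bits that it also understands. */
static inline uint32_t negotiate_features(uint32_t offered, uint32_t understood)
{
        uint32_t accepted = offered & understood;

        /* The driver must never acknowledge a bit the device did not offer. */
        assert((accepted & ~offered) == 0);
        return accepted;
}

/* Bits 0 to 23 belong to the specific device type; higher bits are
 * reserved for the queue and feature negotiation mechanisms. */
static inline int feature_is_device_specific(unsigned int bit)
{
        return bit <= 23;
}
```

An older guest that does not understand a newly offered bit simply leaves it out of the mask it writes back, which is what lets the device fall back to its compatibility behaviour.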
+
+2.1.3 Configuration Space
+-------------------------
+
+Configuration space is generally used for rarely-changing or
+initialization-time parameters.
+
+Note that this space is generally the guest's native endian,
+rather than PCI's little-endian.
+
+2.1.4 Virtqueues
+----------------
+
+The mechanism for bulk data transport on virtio devices is
+pretentiously called a virtqueue. Each device can have zero or more
+virtqueues: for example, the simplest network device has one for
+transmit and one for receive.  Each queue has a 16-bit queue size
+parameter, which sets the number of entries and implies the total size
+of the queue.
+
+Each virtqueue occupies two or more physically-contiguous pages
+(usually defined as 4096 bytes, but depending on the transport)
+and consists of three parts:
+
++-------------------+-----------------------------------+-----------+
+| Descriptor Table  |   Available Ring     (padding)    | Used Ring |
++-------------------+-----------------------------------+-----------+
+
+The bus-specific Queue Size field controls the total number of bytes
+required for the virtqueue according to the following formula:
+
+        /* Round x up to the next multiple of PAGE_SIZE (a power of two). */
+        #define ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
+        static inline unsigned vring_size(unsigned int qsz)
+        {
+             return ALIGN(sizeof(struct vring_desc)*qsz + sizeof(u16)*(3 + qsz))
+                  + ALIGN(sizeof(u16)*3 + sizeof(struct vring_used_elem)*qsz);
+        }
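
To make the formula concrete, here is a small self-contained sketch that evaluates it for a given queue size. It assumes 4096-byte pages, an ALIGN that rounds up to the next multiple of PAGE_SIZE, and the 16-byte descriptor and 8-byte used-element layouts given elsewhere in this section; the fixed-width typedefs and the power-of-two check are added only so that the example stands on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Assumption for this example: 4096-byte pages. */
#define PAGE_SIZE ((size_t)4096)

/* Round x up to the next multiple of PAGE_SIZE (a power of two). */
#define ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

struct vring_desc {             /* 16 bytes */
        u64 addr;
        u32 len;
        u16 flags;
        u16 next;
};

struct vring_used_elem {        /* 8 bytes */
        u32 id;
        u32 len;
};

static inline unsigned vring_size(unsigned int qsz)
{
        /* The queue size must be a non-zero power of two. */
        assert(qsz != 0 && (qsz & (qsz - 1)) == 0);

        /* Descriptor table plus available ring, padded to a page... */
        return (unsigned)(ALIGN(sizeof(struct vring_desc) * qsz
                                + sizeof(u16) * (3 + qsz))
        /* ...followed by the used ring, also padded to a page. */
                        + ALIGN(sizeof(u16) * 3
                                + sizeof(struct vring_used_elem) * qsz));
}
```

Under these assumptions a queue of 256 entries needs 4096 + 518 bytes for the descriptor table and available ring (padded to 8192) and 6 + 2048 bytes for the used ring (padded to 4096), so 12288 bytes in total.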
+
+This currently wastes some space with padding, but also allows future
+extensions such as the VIRTIO_RING_F_EVENT_IDX extension.  The
+virtqueue layout structure looks like this:
+
+        struct vring {
+                // The actual descriptors (16 bytes each)
+                struct vring_desc desc[ Queue Size ];
+
+                // A ring of available descriptor heads with free-running index.
+                struct vring_avail avail;
+
+                // Padding to the next PAGE_SIZE boundary.
+                char pad[ Padding ];
+
+                // A ring of used descriptor heads with free-running index.
+                struct vring_used used;
+        };
+
+When the driver wants to send a buffer to the device, it fills in
+a slot in the descriptor table (or chains several together), and
+writes the descriptor index into the available ring.  It then
+notifies the device. When the device has finished a buffer, it
+writes the descriptor into the used ring, and sends an interrupt.
+
+2.1.4.1 A Note on Virtqueue Endianness
+--------------------------------------
+
+Note that the endianness of the fields in the virtqueue is the native
+endianness of the guest, not little-endian as PCI normally is. This makes
+for simpler guest code, and it is assumed that the host already has to
+be deeply aware of the guest endian so such an "endian-aware" device
+is not a significant issue.
+
+2.1.4.2 Message Framing
+-----------------------
+The original intent of the specification was that message framing (the
+particular layout of descriptors) be independent of the contents of
+the buffers. For example, a network transmit buffer consists of a 12
+byte header followed by the network packet. This could be most simply
+placed in the descriptor table as a 12 byte output descriptor followed
+by a 1514 byte output descriptor, but it could also consist of a
+single 1526 byte output descriptor in the case where the header and
+packet are adjacent, or even three or more descriptors (possibly with
+loss of efficiency in that case).
+
+Regrettably, initial driver implementations used simple layouts, and
+devices came to rely on it, despite this specification wording[10]. It
+is thus recommended that drivers be conservative in their assumptions,
+unless the VIRTIO_F_ANY_LAYOUT feature is accepted. In addition, some
+implementations may have large-but-reasonable restrictions on total
+descriptor size (such as based on IOV_MAX in the host OS). This has
+not been a problem in practice: little sympathy will be given to
+drivers which create unreasonably-sized descriptors such as by
+dividing a network packet into 1500 single-byte descriptors!
+
+2.1.4.3 The Virtqueue Descriptor Table
+--------------------------------------
+
+The descriptor table refers to the buffers the guest is using for
+the device. The addresses are physical addresses, and the buffers
+can be chained via the next field. Each descriptor describes a
+buffer which is read-only or write-only, but a chain of
+descriptors can contain both read-only and write-only buffers.
+
+No descriptor chain may be more than 2^32 bytes long in total.
+
+        struct vring_desc {
+                /* Address (guest-physical). */
+                u64 addr;
+                /* Length. */
+                u32 len;
+
+        /* This marks a buffer as continuing via the next field. */
+        #define VRING_DESC_F_NEXT   1
+        /* This marks a buffer as write-only (otherwise read-only). */
+        #define VRING_DESC_F_WRITE     2
+        /* This means the buffer contains a list of buffer descriptors. */
+        #define VRING_DESC_F_INDIRECT   4
+                /* The flags as indicated above. */
+                u16 flags;
+                /* Next field if flags & NEXT */
+                u16 next;
+        };
+
+The number of descriptors in the table is defined by the queue size
+for this virtqueue.
+
+2.1.4.3.1 Indirect Descriptors
+------------------------------
+
+Some devices benefit by concurrently dispatching a large number
+of large requests. The VIRTIO_RING_F_INDIRECT_DESC feature can be
+used to allow this (see FIXME: Reserved Feature Bits). To increase
+ring capacity it is possible to store a table of indirect
+descriptors anywhere in memory, and insert a descriptor in the main
+virtqueue (with flags&VRING_DESC_F_INDIRECT on) that refers to a memory buffer
+containing this indirect descriptor table; fields addr and len
+refer to the indirect table address and length in bytes,
+respectively. The indirect table layout structure looks like this
+(len is the length of the descriptor that refers to this table,
+which is a variable, so this code won't compile):
+
+        struct indirect_descriptor_table {
+                /* The actual descriptors (16 bytes each) */
+                struct vring_desc desc[len / 16];
+        };
+
+The first indirect descriptor is located at the start of the indirect
+descriptor table (index 0); additional indirect descriptors are
+chained by the next field. An indirect descriptor without a next field
+(with flags&VRING_DESC_F_NEXT off) signals the end of the indirect descriptor
+table, and transfers control back to the main virtqueue. An
+indirect descriptor cannot refer to another indirect descriptor
+table (flags&VRING_DESC_F_INDIRECT must be off). A single indirect descriptor
+table can include both read-only and write-only descriptors; the
+write-only flag (flags&VRING_DESC_F_WRITE) in the descriptor that refers to it
+is ignored.
+
+2.1.4.4 The Virtqueue Available Ring
+------------------------------------
+
+The available ring refers to what descriptor chains we are offering the
+device: each entry refers to the head of a descriptor chain. The "flags" field
+is currently 0 or 1: 1 indicating that we do not need an interrupt
+when the device consumes a descriptor chain from the available
+ring. Alternatively, the guest can ask the device to delay interrupts
+until an entry with an index specified by the "used_event" field is
+written in the used ring (equivalently, until the idx field in the
+used ring will reach the value used_event + 1). The method employed by
+the device is controlled by the VIRTIO_RING_F_EVENT_IDX feature bit
+(see FIXME: Reserved Feature Bits). This interrupt suppression is
+merely an optimization; it may not suppress interrupts entirely.
+
+The "idx" field indicates where we would put the next descriptor
+entry (modulo the queue size). This starts at 0, and increases.
+
+        struct vring_avail {
+        #define VRING_AVAIL_F_NO_INTERRUPT      1
+                u16 flags;
+                u16 idx;
+                u16 ring[ /* Queue Size */ ];
+                u16 used_event; /* Only if VIRTIO_RING_F_EVENT_IDX */
+        };
+
+2.1.4.5 The Virtqueue Used Ring
+-------------------------------
+
+The used ring is where the device returns buffers once it is done
+with them. The flags field can be used by the device to hint that
+no notification is necessary when the guest adds to the available
+ring. Alternatively, the "avail_event" field can be used by the
+device to hint that no notification is necessary until an entry
+with an index specified by the "avail_event" is written in the
+available ring (equivalently, until the idx field in the
+available ring reaches the value avail_event + 1). The method
+employed by the device is controlled by the guest through the
+VIRTIO_RING_F_EVENT_IDX feature bit (see FIXME: Reserved
+Feature Bits).[7]
+
+Each entry in the ring is a pair: the head entry of the
+descriptor chain describing the buffer (this matches an entry
+placed in the available ring by the guest earlier), and the total
+number of bytes written into the buffer. The latter is extremely
+useful for guests using untrusted buffers: if you do not know exactly
+how much has been written by the device, you usually have to zero
+the buffer to ensure no data leakage occurs.
+
+        /* u32 is used here for ids for padding reasons. */
+        struct vring_used_elem {
+                /* Index of start of used descriptor chain. */
+                u32 id;
+                /* Total length of the descriptor chain which was used (written to) */
+                u32 len;
+        };
+
+        struct vring_used {
+        #define VRING_USED_F_NO_NOTIFY  1
+                u16 flags;
+                u16 idx;
+                struct vring_used_elem ring[ /* Queue Size */];
+                u16 avail_event; /* Only if VIRTIO_RING_F_EVENT_IDX */
+        };
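
One plausible way for a driver to consume the used ring is to remember the last idx value it has processed and walk forward until it catches up with the device. The sketch below assumes a fixed, illustrative queue size (QSZ) and a hypothetical helper drain_used; neither comes from the specification. A real driver would also need a read barrier between loading idx and loading the ring entries, which is omitted here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define QSZ 8   /* assumption: illustrative queue size, a power of two */

struct vring_used_elem {
        u32 id;         /* head of the used descriptor chain */
        u32 len;        /* bytes written into the buffer */
};

struct vring_used {
        u16 flags;
        u16 idx;
        struct vring_used_elem ring[QSZ];
        u16 avail_event;
};

/* Hypothetical helper: copy out up to max newly used entries, advancing
 * the driver's private cursor *last_used. Returns the number consumed. */
static unsigned drain_used(const struct vring_used *used, u16 *last_used,
                           u32 *heads, u32 *lens, unsigned max)
{
        unsigned n = 0;

        assert(used && last_used && heads && lens);
        while (*last_used != used->idx && n < max) {
                const struct vring_used_elem *e = &used->ring[*last_used % QSZ];

                heads[n] = e->id;
                lens[n] = e->len;
                n++;
                (*last_used)++;         /* wraps naturally at 65536 */
        }
        return n;
}
```

The len value of each entry tells the guest how much of the buffer the device actually wrote, so it need not zero the whole buffer first.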
+
+2.1.4.6 Helpers for Operating Virtqueues
+----------------------------------------
+
+The Linux Kernel Source code contains the definitions above and
+helper routines in a more usable form, in
+include/linux/virtio_ring.h. This was explicitly licensed by IBM
+and Red Hat under the (3-clause) BSD license so that it can be
+freely used by all other projects, and is reproduced (with slight
+variation to remove Linux assumptions) in *XREF*.
+
+2.2 General Initialization And Device Operation
+===============================================
+
+We start with an overview of device initialization, then expand on the
+details of the device and how each step is performed.  This section
+should be read along with the bus-specific section which describes
+how to communicate with the specific device.
+
+2.2.1 Device Initialization
+---------------------------
+
+1. Reset the device. This is not required on initial start up.
+
+2. The ACKNOWLEDGE status bit is set: we have noticed the device.
+
+3. The DRIVER status bit is set: we know how to drive the device.
+
+4. Device-specific setup, including reading the device feature
+  bits, discovery of virtqueues for the device, optional per-bus
+  setup, and reading and possibly writing the device's virtio
+  configuration space.
+
+5. The subset of device feature bits understood by the driver is
+   written to the device.
+
+6. The DRIVER_OK status bit is set.
+
+7. The device can now be used (i.e. buffers added to the
+   virtqueues)[4]
+
+If any of these steps go irrecoverably wrong, the guest should
+set the FAILED status bit to indicate that it has given up on the
+device (it can reset the device later to restart if desired).
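
The sequence above can be sketched against a stand-in device structure. Everything here other than the status bit values is an assumption made for illustration: struct fake_device, init_device, the STATUS_ names and the required parameter are hypothetical, and a real driver would reach the status field and feature bits through its bus-specific registers.

```c
#include <assert.h>
#include <stdint.h>

/* Device Status bits, using the values listed for the Device Status field. */
#define STATUS_ACKNOWLEDGE 1u
#define STATUS_DRIVER      2u
#define STATUS_DRIVER_OK   4u
#define STATUS_FAILED      128u

/* Hypothetical stand-in for a device reached over some bus. */
struct fake_device {
        uint8_t  status;
        uint32_t offered_features;
        uint32_t accepted_features;
};

/* Returns 0 on success, -1 if the driver gave up on the device.
 * required is the subset of understood bits the driver cannot do without. */
static int init_device(struct fake_device *dev, uint32_t understood,
                       uint32_t required)
{
        assert((required & ~understood) == 0);

        dev->status = 0;                        /* 1. reset */
        dev->status |= STATUS_ACKNOWLEDGE;      /* 2. we noticed the device */
        dev->status |= STATUS_DRIVER;           /* 3. we know how to drive it */

        /* 4. device-specific setup: here only the feature bits are read */
        if (required & ~dev->offered_features) {
                dev->status |= STATUS_FAILED;   /* give up on the device */
                return -1;
        }

        /* 5. write back the subset of features the driver understands */
        dev->accepted_features = dev->offered_features & understood;

        dev->status |= STATUS_DRIVER_OK;        /* 6. ready to drive */
        return 0;                               /* 7. device can now be used */
}
```

On the failure path DRIVER_OK is never set, so the device stays unusable until it is reset and initialization starts again.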
+
+2.2.2 Device Operation
+----------------------
+
+There are two parts to device operation: supplying new buffers to
+the device, and processing used buffers from the device. As an
+example, the simplest virtio network device has two virtqueues: the
+transmit virtqueue and the receive virtqueue. The driver adds
+outgoing (read-only) packets to the transmit virtqueue, and then
+frees them after they are used. Similarly, incoming (write-only)
+buffers are added to the receive virtqueue, and processed after
+they are used.
+
+2.2.2.1 Supplying Buffers to The Device
+---------------------------------------
+
+Actual transfer of buffers from the guest OS to the device
+operates as follows:
+
+1. Place the buffer(s) into free descriptor(s).
+
+  (a) If there are no free descriptors, the guest may choose to
+    notify the device even if notifications are suppressed (to
+    reduce latency).[8]
+
+2. Place the id of the buffer in the next ring entry of the
+  available ring.
+
+3. The steps (1) and (2) may be performed repeatedly if batching
+  is possible.
+
+4. A memory barrier should be executed to ensure the device sees
+  the updated descriptor table and available ring before the next
+  step.
+
+5. The available "idx" field should be increased by the number of
+  entries added to the available ring.
+
+6. A memory barrier should be executed to ensure that we update
+  the idx field before checking for notification suppression.
+
+7. If notifications are not suppressed, the device should be
+  notified of the new buffers.
+
+Note that the above steps do not take precautions against the
+available ring buffer wrapping around: this is not possible since
+the ring buffer is the same size as the descriptor table, so step
+(1) will prevent such a condition.
+
+In addition, the maximum queue size is 32768 (it must be a power
+of 2 which fits in 16 bits), so the 16-bit "idx" value can always
+distinguish between a full and empty buffer.
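
The point about the 16-bit index can be checked with a little arithmetic. The sketch below uses a hypothetical helper, ring_pending, which subtracts a consumer's copy of the index from the producer's free-running index modulo 65536. Because at most 32768 entries can be outstanding, the difference is never ambiguous: 0 means empty and the queue size means full.

```c
#include <assert.h>
#include <stdint.h>

/* Largest queue size permitted: a power of two that fits in 16 bits. */
#define MAX_QSZ 32768u

/* Hypothetical helper: number of entries published by the producer that
 * the consumer has not yet seen. Both indices run freely and wrap at 65536. */
static inline uint16_t ring_pending(uint16_t producer_idx, uint16_t consumer_idx)
{
        uint16_t pending = (uint16_t)(producer_idx - consumer_idx);

        /* With a well-behaved producer this can never exceed the queue size. */
        assert(pending <= MAX_QSZ);
        return pending;
}
```

If the queue size were allowed to reach 65536, a full ring and an empty ring would both give a difference of 0, which is why the limit is 32768.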
+
+Here is a description of each stage in more detail.
+
+2.2.2.1.1 Placing Buffers Into The Descriptor Table
+---------------------------------------------------
+
+A buffer consists of zero or more read-only physically-contiguous
+elements followed by zero or more physically-contiguous
+write-only elements (it must have at least one element). This
+algorithm maps it into the descriptor table:
+
+for each buffer element, b:
+
+  (a) Get the next free descriptor table entry, d
+
+  (b) Set d.addr to the physical address of the start of b
+
+  (c) Set d.len to the length of b.
+
+  (d) If b is write-only, set d.flags to VRING_DESC_F_WRITE,
+    otherwise 0.
+
+  (e) If there is a buffer element after this:
+
+    i. Set d.next to the index of the next free descriptor
+      element.
+
+    ii. Set the VRING_DESC_F_NEXT bit in d.flags.
+
+In practice, the d.next fields are usually used to chain free
+descriptors, and a separate count kept to check there are enough
+free descriptors before beginning the mappings.
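
A minimal sketch of this algorithm follows, assuming the free descriptors are chained through their next fields with a separate free count, as just described. struct buf_elem, struct desc_pool and map_buffer are hypothetical names introduced for the example, not part of the specification.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

struct vring_desc {
        u64 addr;
        u32 len;
        u16 flags;
        u16 next;
};

/* Hypothetical: one physically-contiguous element of a buffer. */
struct buf_elem {
        u64 phys_addr;
        u32 len;
        int write_only;
};

/* Hypothetical free-list state: free descriptors are chained through
 * their next fields, and num_free counts how many remain. */
struct desc_pool {
        struct vring_desc *desc;
        u16 free_head;
        u16 num_free;
};

/* Map n elements into a descriptor chain. Returns the index of the head
 * descriptor, or -1 if there are not enough free descriptors. */
static int map_buffer(struct desc_pool *pool, const struct buf_elem *elems,
                      unsigned n)
{
        u16 head, i;
        unsigned k;

        assert(pool && elems);
        if (n == 0 || n > pool->num_free)
                return -1;

        head = i = pool->free_head;
        for (k = 0; k < n; k++) {
                struct vring_desc *d = &pool->desc[i];  /* (a) next free entry */
                u16 next_free = d->next;

                d->addr = elems[k].phys_addr;           /* (b) */
                d->len = elems[k].len;                  /* (c) */
                d->flags = elems[k].write_only ? VRING_DESC_F_WRITE : 0; /* (d) */
                if (k + 1 < n) {                        /* (e) more to come */
                        d->next = next_free;            /* i. */
                        d->flags |= VRING_DESC_F_NEXT;  /* ii. */
                }
                i = next_free;
        }
        pool->free_head = i;
        pool->num_free = (u16)(pool->num_free - n);
        return head;
}
```

Checking the free count up front means the chain is either built completely or not at all, so no partially filled descriptors need to be unwound.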
+
+2.2.2.1.2 Updating The Available Ring
+-------------------------------------
+
+The head of the buffer we mapped is the first d in the algorithm
+above. A naive implementation would do the following:
+
+        avail->ring[avail->idx % qsz] = head;
+
+However, in general we can add many descriptor chains before we update
+the "idx" field (at which point they become visible to the
+device), so we keep a counter of how many we've added:
+
+        avail->ring[(avail->idx + added++) % qsz] = head;
+
+2.2.2.1.3 Updating The Index Field
+----------------------------------
+
+Once the index field of the virtqueue is updated, the device will=20
+be able to access the descriptor chains we've created and the=20
+memory they refer to. This is why a memory barrier is generally=20
+used before the index update, to ensure it sees the most up-to-date=20
+copy.
+
+The index field always increments, and we let it wrap naturally at=20
+65536:
+
+	avail->idx += added;
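
Batching and publishing can be combined as in the sketch below. The names `struct avail_batch`, `batch_add` and `batch_publish` and the fixed `QSZ` are illustrative assumptions, and the C11 release fence only stands in for whatever write barrier the platform actually requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QSZ 8 /* hypothetical queue size; must be a power of 2 */

struct vring_avail {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[QSZ];
};

/* Hypothetical driver-side bookkeeping for one virtqueue. */
struct avail_batch {
	struct vring_avail *avail;
	uint16_t added; /* heads written but not yet visible to the device */
};

/* Writes a chain head into the next ring slot without touching idx. */
static void batch_add(struct avail_batch *b, uint16_t head)
{
	assert(b->added < QSZ);
	uint16_t slot = (uint16_t)(b->avail->idx + b->added) % QSZ;
	b->avail->ring[slot] = head;
	b->added++;
}

/* Makes all batched heads visible with a single index update. */
static void batch_publish(struct avail_batch *b)
{
	/* Assumed barrier: ring and descriptor writes must precede the idx write. */
	atomic_thread_fence(memory_order_release);
	b->avail->idx = (uint16_t)(b->avail->idx + b->added); /* wraps at 65536 */
	b->added = 0;
}
```

Because the queue size is a power of 2, the slot computation stays consistent when the 16-bit index wraps.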
+
+2.2.2.1.4 Notifying The Device
+------------------------------
+
+The actual method of device notification is bus-specific, but generally
+it can be expensive.  So the device can suppress such notifications if it
+doesn't need them.  We have to be careful to expose the new index
+value before checking if notifications are suppressed: it's OK to notify
+gratuitously, but not to omit a required notification. So again,
+we use a memory barrier here before reading the flags or the
+avail_event field.
+
+If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated, and if the
+VRING_USED_F_NO_NOTIFY flag is not set, we go ahead and notify the
+device.
+
+If the VIRTIO_F_RING_EVENT_IDX feature is negotiated, we read the
+avail_event field in the available ring structure. If the
+available index crossed the avail_event field value since the
+last notification, we go ahead and notify the device.
+The avail_event field wraps naturally at 65536 as well,
+giving the following algorithm for calculating whether a device needs
+notification:
+
+	(u16)(new_idx - avail_event - 1) < (u16)(new_idx - old_idx)
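
A few worked cases may make the wrap-around behaviour of this comparison easier to see. The helper names `need_event` and `need_event_examples` are illustrative; the expression itself is the one given above.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when the index moved past event_idx while going from
 * old_idx to new_idx, using 16-bit modular arithmetic. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Worked cases, including one where the index wraps past 65535. */
static void need_event_examples(void)
{
	assert(need_event(10, 12, 10));      /* entry 10 was just added */
	assert(!need_event(12, 12, 10));     /* entry 12 is not added yet */
	assert(!need_event(9, 12, 10));      /* entry 9 was covered earlier */
	assert(need_event(65535, 2, 65534)); /* crossed while wrapping */
	assert(!need_event(2, 2, 65534));    /* not reached after the wrap */
}
```

The casts to a 16-bit type are what make the subtraction wrap, so they must be kept if the surrounding variables are wider.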
+
+2.2.2.2 Receiving Used Buffers From The Device
+----------------------------------------------
+
+Once the device has used a buffer (read from or written to it, or=20
+parts of both, depending on the nature of the virtqueue and the=20
+device), it sends an interrupt, following an algorithm very=20
+similar to the algorithm used for the driver to send the device a=20
+buffer:
+
+1. Write the head descriptor number to the next field in the used=20
+  ring.
+
+2. Update the used ring index.
+
+3. Deliver an interrupt if necessary:
+
+  (a) If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated:=20
+    check if the VRING_AVAIL_F_NO_INTERRUPT flag is not set in=20
+    avail->flags.
+
+  (b) If the VIRTIO_F_RING_EVENT_IDX feature is negotiated: check=20
+    whether the used index crossed the used_event field value=20
+    since the last update. The used_event field wraps naturally=20
+    at 65536 as well:
+	(u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx)
+
+For each ring, the guest should then disable interrupts by setting the
+VRING_AVAIL_F_NO_INTERRUPT flag in the avail structure, if required.
+It can then process used ring entries, finally enabling interrupts
+by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the
+used_event field in the available structure.  The guest should then
+execute a memory barrier, and then recheck the ring empty
+condition. This is necessary to handle the case where, after the
+last check and before enabling interrupts, an interrupt has been
+suppressed by the device:
+
+	vring_disable_interrupts(vq);
+
+	for (;;) {
+		/* Ring looks empty: re-enable interrupts, then recheck. */
+		if (vq->last_seen_used == vring->used.idx) {
+			vring_enable_interrupts(vq);
+			mb();
+
+			/* Still empty after the barrier: nothing was missed. */
+			if (vq->last_seen_used == vring->used.idx)
+				break;
+
+			/* A buffer arrived in the window: keep processing. */
+			vring_disable_interrupts(vq);
+		}
+
+		struct vring_used_elem *e = &vring->used.ring[vq->last_seen_used % vsz];
+		process_buffer(e);
+		vq->last_seen_used++;
+	}
+
+2.2.2.3 Notification of Device Configuration Changes
+----------------------------------------------------
+
+For devices where the configuration information can be changed, an
+interrupt is delivered when a configuration change occurs.
+
+
+
+2.4 Virtio Transport Options
+============================
+
+Virtio can use various different busses, thus the standard is split
+into virtio general and bus-specific sections.
+
+2.4.1 Virtio Over PCI Bus
+-------------------------
+
+Virtio devices are commonly implemented as PCI devices.
+
+2.4.1.1 PCI Device Discovery
+----------------------------
+
+Any PCI device with Vendor ID 0x1AF4, and Device ID 0x1000 through
+0x103F inclusive is a virtio device[3]. The device must also have a
+Revision ID of 0 to match this specification.
+
+The Subsystem Device ID indicates which virtio device is=20
+supported by the device. The Subsystem Vendor ID should reflect=20
+the PCI Vendor ID of the environment (it's currently only used=20
+for informational purposes by the guest).
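
The discovery rule can be expressed as a small predicate. This is a sketch under assumptions: `struct pci_ids` and `virtio_pci_match` are hypothetical names, and reading the fields out of PCI configuration space is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID 0x1AF4

/* Hypothetical subset of the PCI configuration header. */
struct pci_ids {
	uint16_t vendor_id;
	uint16_t device_id;
	uint8_t revision_id;
	uint16_t subsystem_device_id;
};

/* Returns the virtio device type (the Subsystem Device ID), or -1 if the
 * function is not a virtio device matching this specification. */
static int virtio_pci_match(const struct pci_ids *ids)
{
	assert(ids != NULL);
	if (ids->vendor_id != VIRTIO_PCI_VENDOR_ID)
		return -1;
	if (ids->device_id < 0x1000 || ids->device_id > 0x103F)
		return -1;
	if (ids->revision_id != 0)
		return -1;
	return ids->subsystem_device_id;
}
```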
+
+2.4.1.2 PCI Device Layout
+-------------------------
+
+To configure the device, we use the first I/O region of the PCI=20
+device. This contains a virtio header followed by a=20
+device-specific region.
+
+There may be different widths of accesses to the I/O region; the
+"natural" access method for each field in the virtio header must be
+used (i.e. 32-bit accesses for 32-bit fields, etc), but the
+device-specific region can be accessed using any width accesses, and
+should obtain the same results.
+
+Note that this is possible because while the virtio header is PCI=20
+(i.e. little) endian, the device-specific region is encoded in=20
+the native endian of the guest (where such distinction is=20
+applicable).
+
+2.4.1.2.1 PCI Device Virtio Header
+----------------------------------
+
+The virtio header looks as follows:
+
++------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
+| Bits       || 32                  | 32                  | 32       | 16     | 16      | 16      | 8       | 8      |
++------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
+| Read/Write || R                   | R+W                 | R+W      | R      | R+W     | R+W     | R+W     | R      |
++------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
+| Purpose    || Device              | Guest               | Queue    | Queue  | Queue   | Queue   | Device  | ISR    |
+|            || Features bits 0:31  | Features bits 0:31  | Address  | Size   | Select  | Notify  | Status  | Status |
++------------++---------------------+---------------------+----------+--------+---------+---------+---------+--------+
+
+
+If MSI-X is enabled for the device, two additional fields=20
+immediately follow this header:[5]
+
+
++------------++----------------+--------+
+| Bits       || 16             | 16     |
++------------++----------------+--------+
+| Read/Write || R+W            | R+W    |
++------------++----------------+--------+
+| Purpose    || Configuration  | Queue  |
+| (MSI-X)    || Vector         | Vector |
++------------++----------------+--------+
+
+Immediately following these general headers, there may be=20
+device-specific headers:
+
++------------++--------------------+
+| Bits       || Device Specific    |
++------------++--------------------+
+| Read/Write || Device Specific    |
++------------++--------------------+
+| Purpose    || Device Specific... |
+|            ||                    |
++------------++--------------------+
+
+2.4.1.3 PCI-specific Initialization And Device Operation
+--------------------------------------------------------
+
+The page size for a virtqueue on a PCI virtio device is defined as
+4096 bytes.
+
+2.4.1.3.1 Device Initialization
+-------------------------------
+
+2.4.1.3.1.1 Queue Vector Configuration
+--------------------------------------
+
+When MSI-X capability is present and enabled in the device=20
+(through standard PCI configuration space) 4 bytes at byte offset=20
+20 are used to map configuration change and queue interrupts to=20
+MSI-X vectors. In this case, the ISR Status field is unused, and=20
+device specific configuration starts at byte offset 24 in virtio=20
+header structure. When MSI-X capability is not enabled, device=20
+specific configuration starts at byte offset 20 in virtio header.
+
+Writing a valid MSI-X Table entry number, 0 to 0x7FF, to one of=20
+Configuration/Queue Vector registers, maps interrupts triggered=20
+by the configuration change/selected queue events respectively to=20
+the corresponding MSI-X vector. To disable interrupts for a=20
+specific event type, unmap it by writing a special NO_VECTOR=20
+value:
+
+=09/* Vector value used to disable MSI for queue */
+=09#define VIRTIO_MSI_NO_VECTOR            0xffff=20
+
+Reading these registers returns vector mapped to a given event,=20
+or NO_VECTOR if unmapped. All queue and configuration change=20
+events are unmapped by default.
+
+Note that mapping an event to vector might require allocating=20
+internal device resources, and might fail. Devices report such=20
+failures by returning the NO_VECTOR value when the relevant=20
+Vector field is read. After mapping an event to vector, the=20
+driver must verify success by reading the Vector field value: on=20
+success, the previously written value is returned, and on=20
+failure, NO_VECTOR is returned. If a mapping failure is detected,=20
+the driver can retry mapping with fewer vectors, or disable MSI-X.
+
+2.4.1.3.1.2 Virtqueue Configuration
+-----------------------------------
+
+As a device can have zero or more virtqueues for bulk data
+transport (for example, the simplest network device has two), the driver
+needs to configure them as part of the device-specific
+configuration.
+
+This is done as follows, for each virtqueue a device has:
+
+1. Write the virtqueue index (first queue is 0) to the Queue=20
+  Select field.
+
+2. Read the virtqueue size from the Queue Size field, which is
+  always a power of 2. This controls how big the virtqueue is
+  (see 2.1.4 Virtqueues). If this field is 0, the virtqueue does
+  not exist.
+
+3. Allocate and zero virtqueue in contiguous physical memory, on=20
+  a 4096 byte alignment. Write the physical address, divided by=20
+  4096 to the Queue Address field.[6]
+
+4. Optionally, if MSI-X capability is present and enabled on the=20
+  device, select a vector to use to request interrupts triggered=20
+  by virtqueue events. Write the MSI-X Table entry number=20
+  corresponding to this vector in Queue Vector field. Read the=20
+  Queue Vector field: on success, previously written value is=20
+  returned; on failure, NO_VECTOR value is returned.
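
Two small computations from the steps above are sketched here: validating the value read from Queue Size, and deriving the value written to Queue Address. The helper names are hypothetical, and the register accesses themselves are omitted because they depend on how the platform exposes the I/O region.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when the queue exists (size is not 0) and the size is a power of 2. */
static int queue_size_valid(uint16_t qsz)
{
	return qsz != 0 && (qsz & (qsz - 1)) == 0;
}

/* Converts the physical address of a 4096-byte aligned virtqueue into the
 * value for the 32-bit Queue Address field (the address divided by 4096). */
static uint32_t queue_address_value(uint64_t phys)
{
	assert(phys % 4096 == 0);          /* step 3 requires 4096-byte alignment */
	assert(phys / 4096 <= UINT32_MAX); /* must fit the 32-bit field */
	return (uint32_t)(phys / 4096);
}
```

For example, a virtqueue placed at physical address 0x12345000 is announced by writing 0x12345.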
+
+2.4.1.3.2 Notifying The Device
+------------------------------
+
+Device notification occurs by writing the 16-bit virtqueue index=20
+of this virtqueue to the Queue Notify field of the virtio header=20
+in the first I/O region of the PCI device.
+
+2.4.1.3.3 Receiving Used Buffers From The Device
+------------------------------------------------
+
+If an interrupt is necessary:
+
+  (a) If MSI-X capability is disabled:
+
+    i. Set the lower bit of the ISR Status field for the device.
+
+    ii. Send the appropriate PCI interrupt for the device.
+
+  (b) If MSI-X capability is enabled:
+
+    i. Request the appropriate MSI-X interrupt message for the=20
+      device, Queue Vector field sets the MSI-X Table entry=20
+      number.
+
+    ii. If Queue Vector field value is NO_VECTOR, no interrupt=20
+      message is requested for this event.
+
+The guest interrupt handler should:
+
+1. If MSI-X capability is disabled: read the ISR Status field,=20
+  which will reset it to zero. If the lower bit is zero, the=20
+  interrupt was not for this device. Otherwise, the guest driver=20
+  should look through the used rings of each virtqueue for the=20
+  device, to see if any progress has been made by the device=20
+  which requires servicing.
+
+2. If MSI-X capability is enabled: look through the used rings of=20
+  each virtqueue mapped to the specific MSI-X vector for the=20
+  device, to see if any progress has been made by the device=20
+  which requires servicing.
+
+2.4.1.3.4 Notification of Device Configuration Changes
+------------------------------------------------------
+
+Some virtio PCI devices can change the device configuration=20
+state, as reflected in the virtio header in the PCI configuration=20
+space. In this case:
+
+1. If MSI-X capability is disabled: an interrupt is delivered and
+  the second lowest bit (bit 1) is set in the ISR Status field to
+  indicate that the driver should re-examine the configuration
+  space.  Note that a single interrupt can indicate both that one=20
+  or more virtqueue has been used and that the configuration=20
+  space has changed: even if the config bit is set, virtqueues=20
+  must be scanned.
+
+2. If MSI-X capability is enabled: an interrupt message is=20
+  requested. The Configuration Vector field sets the MSI-X Table=20
+  entry number to use. If Configuration Vector field value is=20
+  NO_VECTOR, no interrupt message is requested for this event.
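
For the case where MSI-X is disabled, decoding the ISR Status value might look like the sketch below. It assumes the queue bit is bit 0 (value 0x1) and the configuration bit is bit 1 (value 0x2); `struct isr_result` and `decode_isr` are hypothetical names. As a design choice that goes slightly beyond the text, any nonzero value is treated as belonging to this device, so that an interrupt carrying only the configuration bit is not dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISR_QUEUE_BIT  0x1 /* assumed: a virtqueue has been used */
#define ISR_CONFIG_BIT 0x2 /* assumed: configuration space changed */

struct isr_result {
	int ours;           /* interrupt was raised by this device */
	int scan_queues;    /* used rings must be examined */
	int config_changed; /* configuration space must be re-read */
};

/* Decodes the value obtained from the (read-to-clear) ISR Status field. */
static void decode_isr(uint8_t isr, struct isr_result *out)
{
	assert(out != NULL);
	out->ours = isr != 0;
	out->config_changed = (isr & ISR_CONFIG_BIT) != 0;
	/* Virtqueues are scanned even when only the config bit is set. */
	out->scan_queues = out->ours;
}
```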
+
+2.4.2 Virtio Over MMIO
+----------------------
+
+Virtual environments without PCI support (a common situation in
+embedded device models) might use a simple memory mapped device
+("virtio-mmio") instead of the PCI device.
+
+The memory mapped virtio device behaviour is based on the PCI=20
+device specification. Therefore most of operations like device=20
+initialization, queues configuration and buffer transfers are=20
+nearly identical. Existing differences are described in the=20
+following sections.
+
+2.4.2.1 MMIO Device Discovery
+-----------------------------
+
+Unlike PCI, MMIO provides no generic device discovery.  For systems using
+a device-tree such as Linux's dtc or Open Firmware, the suggested format is:
+
+	virtio_block@1e000 {
+		compatible = "virtio,mmio";
+		reg = <0x1e000 0x100>;
+		interrupts = <42>;
+	};
+
+2.4.2.2 MMIO Device Layout
+--------------------------
+
+MMIO virtio devices provide a set of memory mapped control
+registers, all 32 bits wide, followed by device-specific
+configuration space. The following list presents their layout:
+
+• Offset from the device base address | Direction | Name
+ Description
+
+• 0x000 | R | MagicValue
+ "virt" string.
+
+• 0x004 | R | Version
+ Device version number. Currently must be 1.
+
+• 0x008 | R | DeviceID
+ Virtio Subsystem Device ID (ie. 1 for network card).
+
+• 0x00c | R | VendorID
+ Virtio Subsystem Vendor ID.
+
+• 0x010 | R | HostFeatures
+ Flags representing features the device supports.
+ Reading from this register returns 32 consecutive flag bits,
+  first bit depending on the last value written to
+  HostFeaturesSel register. Access to this register returns bits
+  HostFeaturesSel*32 to (HostFeaturesSel*32)+31, eg. feature bits
+  0 to 31 if HostFeaturesSel is set to 0 and features bits 32 to 63
+  if HostFeaturesSel is set to 1. Also see [sub:Feature-Bits]
+
+• 0x014 | W | HostFeaturesSel
+ Device (Host) features word selection.
+ Writing to this register selects a set of 32 device feature bits
+  accessible by reading from HostFeatures register. Device driver
+  must write a value to the HostFeaturesSel register before
+  reading from the HostFeatures register.
+
+• 0x020 | W | GuestFeatures
+ Flags representing device features understood and activated by
+  the driver.
+ Writing to this register sets 32 consecutive flag bits, first
+  bit depending on the last value written to GuestFeaturesSel
+  register. Access to this register sets bits GuestFeaturesSel*32
+  to (GuestFeaturesSel*32)+31, eg. feature bits 0 to 31 if
+  GuestFeaturesSel is set to 0 and features bits 32 to 63 if
+  GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits]
+
+• 0x024 | W | GuestFeaturesSel
+ Activated (Guest) features word selection.
+ Writing to this register selects a set of 32 activated feature
+  bits accessible by writing to the GuestFeatures register.
+  Device driver must write a value to the GuestFeaturesSel
+  register before writing to the GuestFeatures register.
+
+• 0x028 | W | GuestPageSize
+ Guest page size.
+ Device driver must write the guest page size in bytes to the
+  register during initialization, before any queues are used.
+  This value must be a power of 2 and is used by the Host to
+  calculate Guest address of the first queue page (see QueuePFN).
+
+• 0x030 | W | QueueSel
+ Virtual queue index (first queue is 0).
+ Writing to this register selects the virtual queue that the
+  following operations on QueueNum, QueueAlign and QueuePFN apply
+  to.
+
+• 0x034 | R | QueueNumMax
+ Maximum virtual queue size.
+ Reading from the register returns the maximum size of the queue
+  the Host is ready to process or zero (0x0) if the queue is not
+  available. This applies to the queue selected by writing to
+  QueueSel and is allowed only when QueuePFN is set to zero
+  (0x0), so when the queue is not actively used.
+
+• 0x038 | W | QueueNum
+ Virtual queue size.
+ Queue size is the number of elements in the queue, therefore size
+  of the descriptor table and both available and used rings.
+ Writing to this register notifies the Host what size of the
+  queue the Guest will use. This applies to the queue selected by
+  writing to QueueSel.
+
+• 0x03c | W | QueueAlign
+ Used Ring alignment in the virtual queue.
+ Writing to this register notifies the Host about alignment
+  boundary of the Used Ring in bytes. This value must be a power
+  of 2 and applies to the queue selected by writing to QueueSel.
+
+• 0x040 | RW | QueuePFN
+ Guest physical page number of the virtual queue.
+ Writing to this register notifies the host about location of the
+  virtual queue in the Guest's physical address space. This value
+  is the index number of a page starting with the queue
+  Descriptor Table. Value zero (0x0) means physical address zero
+  (0x00000000) and is illegal. When the Guest stops using the
+  queue it must write zero (0x0) to this register.
+ Reading from this register returns the currently used page
+  number of the queue, therefore a value other than zero (0x0)
+  means that the queue is in use.
+ Both read and write accesses apply to the queue selected by
+  writing to QueueSel.
+
+• 0x050 | W | QueueNotify
+ Queue notifier.
+ Writing a queue index to this register notifies the Host that
+  there are new buffers to process in the queue.
+
+• 0x060 | R | InterruptStatus
+ Interrupt status.
+ Reading from this register returns a bit mask of interrupts
+  asserted by the device. An interrupt is asserted if the
+  corresponding bit is set, ie. equals one (1).
+
+  - Bit 0 | Used Ring Update
+    This interrupt is asserted when the Host has updated the Used
+    Ring in at least one of the active virtual queues.
+
+  - Bit 1 | Configuration change
+    This interrupt is asserted when configuration of the device has
+    changed.
+
+• 0x064 | W | InterruptACK
+ Interrupt acknowledge.
+ Writing to this register notifies the Host that the Guest
+  finished handling interrupts. Set bits in the value clear the
+  corresponding bits of the InterruptStatus register.
+
+• 0x070 | RW | Status
+ Device status.
+ Reading from this register returns the current device status
+  flags.
+ Writing non-zero values to this register sets the status flags,
+  indicating the Guest progress. Writing zero (0x0) to this
+  register triggers a device reset.
+ Also see [sub:Device-Initialization-Sequence]
+
+• 0x100+ | RW | Config
+ Device-specific configuration space starts at an offset 0x100
+  and is accessed with byte alignment. Its meaning and size
+  depends on the device and the driver.
+
+Virtual queue size is the number of elements in the queue,=20
+therefore size of the descriptor table and both available and=20
+used rings.
+
+The endianness of the registers follows the native endianness of=20
+the Guest. Writing to registers described as "R" and reading from=20
+registers described as "W" is not permitted and can cause=20
+undefined behavior.
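
The register offsets listed above can be collected into constants, together with a basic identity check. This is a sketch: the `VIRTIO_MMIO_*` names, `struct mmio_ident` and `mmio_ident_ok` are illustrative, and the numeric magic value assumes the "virt" string is read as a 32-bit word on a little-endian guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets from the device base address, as listed above. */
enum virtio_mmio_reg {
	VIRTIO_MMIO_MAGIC_VALUE        = 0x000,
	VIRTIO_MMIO_VERSION            = 0x004,
	VIRTIO_MMIO_DEVICE_ID          = 0x008,
	VIRTIO_MMIO_VENDOR_ID          = 0x00c,
	VIRTIO_MMIO_HOST_FEATURES      = 0x010,
	VIRTIO_MMIO_HOST_FEATURES_SEL  = 0x014,
	VIRTIO_MMIO_GUEST_FEATURES     = 0x020,
	VIRTIO_MMIO_GUEST_FEATURES_SEL = 0x024,
	VIRTIO_MMIO_GUEST_PAGE_SIZE    = 0x028,
	VIRTIO_MMIO_QUEUE_SEL          = 0x030,
	VIRTIO_MMIO_QUEUE_NUM_MAX      = 0x034,
	VIRTIO_MMIO_QUEUE_NUM          = 0x038,
	VIRTIO_MMIO_QUEUE_ALIGN        = 0x03c,
	VIRTIO_MMIO_QUEUE_PFN          = 0x040,
	VIRTIO_MMIO_QUEUE_NOTIFY       = 0x050,
	VIRTIO_MMIO_INTERRUPT_STATUS   = 0x060,
	VIRTIO_MMIO_INTERRUPT_ACK      = 0x064,
	VIRTIO_MMIO_STATUS             = 0x070,
	VIRTIO_MMIO_CONFIG             = 0x100
};

/* 'v','i','r','t' packed with 'v' in the lowest byte (little-endian read). */
#define VIRTIO_MMIO_MAGIC 0x74726976u

/* Hypothetical snapshot of the identity registers, read by the caller. */
struct mmio_ident {
	uint32_t magic;
	uint32_t version;
	uint32_t device_id;
};

/* Nonzero when the window looks like a version 1 virtio-mmio device. */
static int mmio_ident_ok(const struct mmio_ident *id)
{
	assert(id != NULL);
	return id->magic == VIRTIO_MMIO_MAGIC && id->version == 1;
}
```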
+
+2.4.2.3 MMIO-specific Initialization And Device Operation
+---------------------------------------------------------
+
+2.4.2.3.1 Device Initialization
+-------------------------------
+
+Unlike the fixed page size for PCI, the virtqueue page size is defined
+by the GuestPageSize field, as written by the guest.  This must be
+done before the virtqueues are configured.
+
+2.4.2.3.1.1 Virtqueue Configuration
+-----------------------------------
+
+1. Select the queue by writing its index (first queue is 0) to the
+  QueueSel register.
+
+2. Check if the queue is not already in use: read QueuePFN=20
+  register, returned value should be zero (0x0).=20
+
+3. Read maximum queue size (number of elements) from the=20
+  QueueNumMax register. If the returned value is zero (0x0) the=20
+  queue is not available.=20
+
+4. Allocate and zero the queue pages in contiguous virtual=20
+  memory, aligning the Used Ring to an optimal boundary (usually=20
+  page size). Size of the allocated queue may be smaller than or=20
+  equal to the maximum size returned by the Host.=20
+
+5. Notify the Host about the queue size by writing the size to=20
+  QueueNum register.=20
+
+6. Notify the Host about the used alignment by writing its value=20
+  in bytes to QueueAlign register.=20
+
+7. Write the physical number of the first page of the queue to=20
+  the QueuePFN register.=20
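
The sequence above can be tried against a toy register window. Everything here is an assumption made for illustration: `struct mmio_dev` is a plain array standing in for the real memory mapped registers (a driver would use volatile accesses), `rd`, `wr` and `mmio_setup_queue` are hypothetical helpers, and the queue memory is assumed to be allocated and zeroed by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	REG_QUEUE_SEL     = 0x030,
	REG_QUEUE_NUM_MAX = 0x034,
	REG_QUEUE_NUM     = 0x038,
	REG_QUEUE_ALIGN   = 0x03c,
	REG_QUEUE_PFN     = 0x040
};

/* Toy model of the control registers: one 32-bit slot per offset. */
struct mmio_dev {
	uint32_t regs[0x100 / 4];
};

static uint32_t rd(struct mmio_dev *d, uint32_t off)
{
	return d->regs[off / 4];
}

static void wr(struct mmio_dev *d, uint32_t off, uint32_t v)
{
	d->regs[off / 4] = v;
}

/* Returns 0 on success, -1 if the queue is in use, unavailable, or too small. */
static int mmio_setup_queue(struct mmio_dev *d, uint32_t index, uint32_t num,
			    uint64_t queue_phys, uint32_t page_size)
{
	assert(d != NULL);
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	wr(d, REG_QUEUE_SEL, index);                 /* step 1 */
	if (rd(d, REG_QUEUE_PFN) != 0)               /* step 2 */
		return -1;
	uint32_t max = rd(d, REG_QUEUE_NUM_MAX);     /* step 3 */
	if (max == 0 || num == 0 || num > max)
		return -1;
	/* step 4: queue pages of size num were allocated by the caller */
	wr(d, REG_QUEUE_NUM, num);                   /* step 5 */
	wr(d, REG_QUEUE_ALIGN, page_size);           /* step 6 */
	wr(d, REG_QUEUE_PFN, (uint32_t)(queue_phys / page_size)); /* step 7 */
	return 0;
}
```

The toy window does not model per-queue register banking, so it is only suitable for checking the order and values of the writes.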
+
+2.4.2.3.2 Notifying The Device
+------------------------------
+
+The device is notified about new buffers available in a queue by
+writing the queue index to register QueueNotify.
+
+2.4.2.3.3 Receiving Used Buffers From The Device
+------------------------------------------------
+
+The memory mapped virtio device uses a single, dedicated
+interrupt signal, which is raised when at least one of the
+interrupts described in the InterruptStatus register=20
+description is asserted. After receiving an interrupt, the=20
+driver must read the InterruptStatus register to check what=20
+caused the interrupt (see the register description). After the=20
+interrupt is handled, the driver must acknowledge it by writing=20
+a bit mask corresponding to the serviced interrupt to the=20
+InterruptACK register.
+
+2.4.2.3.4 Notification of Device Configuration Changes
+------------------------------------------------------
+
+This is indicated by bit 1 in the InterruptStatus register, as
+documented in the register description.
+
+2.5 Device Types
+================
+
+On top of the queues, config space and feature negotiation facilities
+built into virtio, several specific devices are defined.
+
+The following device IDs are used to identify different types of virtio
+devices.  Some device IDs are reserved for devices which are not currently
+defined in this standard.
+
+Discovering what devices are available and their type is bus-dependent.
+
++------------+--------------------+
+| Device ID  |   Virtio Device    |
++------------+--------------------+
++------------+--------------------+
+| 1          |   network card     |
++------------+--------------------+
+| 2          |   block device     |
++------------+--------------------+
+| 3          |      console       |
++------------+--------------------+
+| 4          |  entropy source    |
++------------+--------------------+
+| 5          | memory ballooning  |
++------------+--------------------+
+| 6          |     ioMemory       |
++------------+--------------------+
+| 7          |       rpmsg        |
++------------+--------------------+
+| 8          |     SCSI host      |
++------------+--------------------+
+| 9          |   9P transport     |
++------------+--------------------+
+| 10         |   mac80211 wlan    |
++------------+--------------------+
+| 11         |   rproc serial     |
++------------+--------------------+
+| 12         |   virtio CAIF      |
++------------+--------------------+
+
+2.5.1 Network Device
+====================
+
+The virtio network device is a virtual ethernet card, and is the=20
+most complex of the devices supported so far by virtio. It has=20
+enhanced rapidly and demonstrates clearly how support for new=20
+features should be added to an existing device. Empty buffers are=20
+placed in one virtqueue for receiving packets, and outgoing=20
+packets are enqueued into another for transmission in that order.=20
+A third command queue is used to control advanced filtering=20
+features.
+
+2.5.1.1 Device ID
+-----------------
+
+ 1
+
+2.5.1.2 Virtqueues
+------------------
+
+ 0:receiveq. 1:transmitq. 2:controlq
+
+ Virtqueue 2 only exists if VIRTIO_NET_F_CTRL_VQ set.
+
+2.5.1.3 Feature bits=20
+--------------------
+
+  VIRTIO_NET_F_CSUM (0) Device handles packets with partial checksum
+
+  VIRTIO_NET_F_GUEST_CSUM (1) Guest handles packets with partial checksum
+
+  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS (2) Control channel offloads
+=09reconfiguration support.
+
+  VIRTIO_NET_F_MAC (5) Device has given MAC address.
+
+  VIRTIO_NET_F_GSO (6) (Deprecated) device handles packets with=20
+    any GSO type.[13]=20
+
+  VIRTIO_NET_F_GUEST_TSO4 (7) Guest can receive TSOv4.
+
+  VIRTIO_NET_F_GUEST_TSO6 (8) Guest can receive TSOv6.
+
+  VIRTIO_NET_F_GUEST_ECN (9) Guest can receive TSO with ECN.
+
+  VIRTIO_NET_F_GUEST_UFO (10) Guest can receive UFO.
+
+  VIRTIO_NET_F_HOST_TSO4 (11) Device can receive TSOv4.
+
+  VIRTIO_NET_F_HOST_TSO6 (12) Device can receive TSOv6.
+
+  VIRTIO_NET_F_HOST_ECN (13) Device can receive TSO with ECN.
+
+  VIRTIO_NET_F_HOST_UFO (14) Device can receive UFO.
+
+  VIRTIO_NET_F_MRG_RXBUF (15) Guest can merge receive buffers.
+
+  VIRTIO_NET_F_STATUS (16) Configuration status field is=20
+    available.
+
+  VIRTIO_NET_F_CTRL_VQ (17) Control channel is available.
+
+  VIRTIO_NET_F_CTRL_RX (18) Control channel RX mode support.
+
+  VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
+
+  VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous=20
+    packets.
+
+  Device configuration layout: two configuration fields are
+  currently defined. The mac address field always exists (though
+  it is only valid if VIRTIO_NET_F_MAC is set), and the status field
+  only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
+  are currently defined for the status field:
+  VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE.
+
+	#define VIRTIO_NET_S_LINK_UP	1
+	#define VIRTIO_NET_S_ANNOUNCE	2
+
+	struct virtio_net_config {
+		u8 mac[6];
+		u16 status;
+	};
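
Reading the link state from this structure could be sketched as follows. The function name `net_link_up` and the way negotiated features are passed (a 64-bit mask indexed by feature bit number) are assumptions; the rule that the link is assumed active without VIRTIO_NET_F_STATUS comes from the initialization steps of this device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_F_STATUS   16 /* feature bit number */
#define VIRTIO_NET_S_LINK_UP  1
#define VIRTIO_NET_S_ANNOUNCE 2

struct virtio_net_config {
	uint8_t mac[6];
	uint16_t status;
};

/* Returns nonzero when the link should be treated as up. */
static int net_link_up(uint64_t negotiated, const struct virtio_net_config *cfg)
{
	assert(cfg != NULL);
	/* Without the status field, the link is assumed active. */
	if (!(negotiated & (1ULL << VIRTIO_NET_F_STATUS)))
		return 1;
	return (cfg->status & VIRTIO_NET_S_LINK_UP) != 0;
}
```

With natural alignment the status field sits at byte offset 6, directly after the MAC address, so the structure occupies 8 bytes.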
+
+2.5.1.4 Device Initialization
+-----------------------------
+
+1. The initialization routine should identify the receive and=20
+  transmission virtqueues.
+
+2. If the VIRTIO_NET_F_MAC feature bit is set, the configuration
+  space "mac" entry indicates the "physical" address of the
+  network card, otherwise a private MAC address should be
+  assigned. All guests are expected to negotiate this feature if
+  it is set.
+
+3. If the VIRTIO_NET_F_CTRL_VQ feature bit is negotiated,=20
+  identify the control virtqueue.
+
+4. If the VIRTIO_NET_F_STATUS feature bit is negotiated, the link=20
+  status can be read from the bottom bit of the "status" config=20
+  field. Otherwise, the link should be assumed active.
+
+5. The receive virtqueue should be filled with receive buffers.=20
+  This is described in detail below in "Setting Up Receive=20
+  Buffers".
+
+6. A driver can indicate that it will generate checksumless
+  packets by negotiating the VIRTIO_NET_F_CSUM feature. This
+  "checksum offload" is a common feature on modern network cards.
+
+7. If that feature is negotiated[14], a driver can use TCP or UDP
+  segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
+  TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
+  (UDP fragmentation) features. It should not send TCP packets
+  requiring segmentation offload which have the Explicit Congestion
+  Notification bit set, unless the VIRTIO_NET_F_HOST_ECN feature is
+  negotiated.[15]
+
+8. The converse features are also available: a driver can save=20
+  the virtual device some work by negotiating these features.[16]
+   The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially=20
+  checksummed packets can be received, and if it can do that then=20
+  the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,=20
+  VIRTIO_NET_F_GUEST_UFO and VIRTIO_NET_F_GUEST_ECN are the input=20
+  equivalents of the features described above. See "Receiving=20
+  Packets" below.
+
+2.5.1.5 Device Operation
+------------------------
+
+Packets are transmitted by placing them in the transmitq, and=20
+buffers for incoming packets are placed in the receiveq. In each=20
+case, the packet itself is preceded by a header:
+
+	struct virtio_net_hdr {
+	#define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
+		u8 flags;
+	#define VIRTIO_NET_HDR_GSO_NONE        0
+	#define VIRTIO_NET_HDR_GSO_TCPV4       1
+	#define VIRTIO_NET_HDR_GSO_UDP         3
+	#define VIRTIO_NET_HDR_GSO_TCPV6       4
+	#define VIRTIO_NET_HDR_GSO_ECN      0x80
+		u8 gso_type;
+		u16 hdr_len;
+		u16 gso_size;
+		u16 csum_start;
+		u16 csum_offset;
+	/* Only if VIRTIO_NET_F_MRG_RXBUF: */
+		u16 num_buffers;
+	};
+
+The controlq is used to control device features such as=20
+filtering.
+
+2.5.1.5.1 Packet Transmission
+-----------------------------
+
+Transmitting a single packet is simple, but varies depending on=20
+the different features the driver negotiated.
+
+1. If the driver negotiated VIRTIO_NET_F_CSUM, and the packet has=20
+  not been fully checksummed, then the virtio_net_hdr's fields=20
+  are set as follows. Otherwise, the packet must be fully=20
+  checksummed, and flags is zero.
+
+  • flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set,
+
+  • csum_start is set to the offset within the packet to begin
+    checksumming, and
+
+  • csum_offset indicates how many bytes after the csum_start the
+    new (16 bit ones' complement) checksum should be placed.[17]
+
+2. If the driver negotiated=20
+  VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO, and the packet requires=20
+  TCP segmentation or UDP fragmentation, then the "gso_type"=20
+  field is set to VIRTIO_NET_HDR_GSO_TCPV4, TCPV6 or UDP.=20
+  (Otherwise, it is set to VIRTIO_NET_HDR_GSO_NONE). In this=20
+  case, packets larger than 1514 bytes can be transmitted: the=20
+  metadata indicates how to replicate the packet header to cut it=20
+  into smaller packets. The other gso fields are set:
+
+  • hdr_len is a hint to the device as to how much of the header
+    needs to be kept to copy into each packet, usually set to the
+    length of the headers, including the transport header.[18]
+
+  • gso_size is the maximum size of each packet beyond that
+    header (ie. MSS).
+
+  • If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature,
+    the VIRTIO_NET_HDR_GSO_ECN bit may be set in "gso_type" as
+    well, indicating that the TCP packet has the ECN bit set.[19]
+
+3. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,=20
+  the num_buffers field is set to zero.
+
+4. The header and packet are added as one output buffer to the
+  transmitq, and the device is notified of the new entry (see 2.2.2.1.4
+  Notifying The Device).[20]
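
Filling the header for the partial checksum case (step 1) without segmentation offload might look like the sketch below. The function name `fill_tx_hdr` is hypothetical, the structure omits num_buffers (so VIRTIO_NET_F_MRG_RXBUF is assumed not negotiated), and field byte order is not addressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_GSO_NONE     0

struct virtio_net_hdr {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Prepares the header for one outgoing packet with no GSO.
 * partial_csum: nonzero if the checksum still has to be completed. */
static void fill_tx_hdr(struct virtio_net_hdr *h, int partial_csum,
			uint16_t csum_start, uint16_t csum_offset)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h)); /* fully checksummed: flags is zero */
	h->gso_type = VIRTIO_NET_HDR_GSO_NONE;
	if (partial_csum) {
		h->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
		h->csum_start = csum_start;
		h->csum_offset = csum_offset;
	}
}
```

As an example, for TCP over IPv4 without IP options on Ethernet, csum_start would be 34 (14 bytes of Ethernet header plus 20 bytes of IP header) and csum_offset would be 16, the position of the checksum field within the TCP header.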
+
+2.5.1.5.1.1 Packet Transmission Interrupt
+-----------------------------------------
+
+Often a driver will suppress transmission interrupts using the
+VRING_AVAIL_F_NO_INTERRUPT flag (see 2.2.2.2 Receiving Used Buffers From
+The Device) and check for used packets in the transmit path of following
+packets. However, it will still receive interrupts if the
+VIRTIO_F_NOTIFY_ON_EMPTY feature is negotiated, indicating that
+the transmission queue is completely emptied.
+
+The normal behavior in this interrupt handler is to retrieve any
+new descriptors from the used ring and free the corresponding
+headers and packets.
+
+2.5.1.5.2 Setting Up Receive Buffers
+------------------------------------
+
+It is generally a good idea to keep the receive virtqueue as=20
+fully populated as possible: if it runs out, network performance=20
+will suffer.
+
+If the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or=20
+VIRTIO_NET_F_GUEST_UFO features are used, the Guest will need to=20
+accept packets of up to 65550 bytes long (the maximum size of a=20
+TCP or UDP packet, plus the 14 byte ethernet header), otherwise=20
+1514 bytes. So unless VIRTIO_NET_F_MRG_RXBUF is negotiated, every
+buffer in the receive queue needs to be at least this length.[20a]
+
+If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer must be at=20
+least the size of the struct virtio_net_hdr.
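+
The sizing rule above can be written as a small C helper. This is a sketch only: min_rx_buffer_len and the two macro names are hypothetical, the header size is passed in by the caller, and whether the virtio_net_hdr must be added on top of the packet length in the non-mergeable case is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PLAIN_PACKET 1514   /* no guest TSO/UFO features in use */
#define MAX_GSO_PACKET   65550  /* 65536 bytes plus 14 byte ethernet header */

/*
 * Minimum length of each receive buffer, following the text above.
 * net_hdr_len is sizeof(struct virtio_net_hdr) as seen by the driver.
 */
static size_t min_rx_buffer_len(bool mrg_rxbuf, bool guest_gso,
				size_t net_hdr_len)
{
	if (mrg_rxbuf)
		return net_hdr_len; /* large packets span several buffers */
	return guest_gso ? MAX_GSO_PACKET : MAX_PLAIN_PACKET;
}
```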
+
+2.5.1.5.2.1 Packet Receive Interrupt
+------------------------------------
+
+When a packet is copied into a buffer in the receiveq, the=20
+optimal path is to disable further interrupts for the receiveq=20
+(see 2.4.2 Receiving Used Buffers From The Device) and process packets until no
+more are found, then re-enable them.
+
+Processing a packet involves:
+
+1. If the driver negotiated the VIRTIO_NET_F_MRG_RXBUF feature,=20
+  then the "num_buffers" field indicates how many descriptors=20
+  this packet is spread over (including this one). This allows=20
+  receipt of large packets without having to allocate large=20
+  buffers. In this case, there will be at least "num_buffers" buffers in
+  the used ring, and they should be chained together to form a
+  single packet. The other buffers will not begin with a struct=20
+  virtio_net_hdr.
+
+2. If the VIRTIO_NET_F_MRG_RXBUF feature was not negotiated, or=20
+  the "num_buffers" field is one, then the entire packet will be=20
+  contained within this buffer, immediately following the struct=20
+  virtio_net_hdr.
+
+3. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the=20
+  VIRTIO_NET_HDR_F_NEEDS_CSUM bit in the "flags" field may be=20
+  set: if so, the checksum on the packet is incomplete and the
+  "csum_start" and "csum_offset" fields indicate how to calculate
+  it (see Packet Transmission point 1).
+
+4. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were=20
+  negotiated, then the "gso_type" may be something other than=20
+  VIRTIO_NET_HDR_GSO_NONE, and the "gso_size" field indicates the=20
+  desired MSS (see Packet Transmission point 2).
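+
The four checks above can be condensed into one classification step. The following C sketch is illustrative only: struct rx_info and classify_rx are hypothetical names, and the flag values are assumed from the header definition earlier in the specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* assumed value */
#define VIRTIO_NET_HDR_GSO_NONE     0 /* assumed value */

struct rx_info {
	unsigned buffers; /* used-ring buffers that make up this packet */
	bool needs_csum;  /* checksum must still be completed */
	bool is_gso;      /* gso_size carries the desired MSS */
};

/* Interpret the header fields of a received packet per negotiated features. */
static struct rx_info classify_rx(uint8_t flags, uint8_t gso_type,
				  uint16_t num_buffers, bool mrg_rxbuf,
				  bool guest_csum, bool guest_gso)
{
	struct rx_info info;

	/* Without VIRTIO_NET_F_MRG_RXBUF the packet is in a single buffer. */
	info.buffers = mrg_rxbuf ? num_buffers : 1;
	info.needs_csum = guest_csum &&
			  (flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) != 0;
	info.is_gso = guest_gso && gso_type != VIRTIO_NET_HDR_GSO_NONE;
	return info;
}
```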
+
+2.5.1.5.3 Control Virtqueue
+---------------------------
+
+The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
+negotiated) to send commands to manipulate various features of=20
+the device which would not easily map into the configuration=20
+space.
+
+All commands are of the following form:
+
+=09struct virtio_net_ctrl {
+=09=09u8 class;
+=09=09u8 command;
+=09=09u8 command-specific-data[];
+=09=09u8 ack;
+=09};
+
+=09/* ack values */
+=09#define VIRTIO_NET_OK     0
+=09#define VIRTIO_NET_ERR    1=20
+
+The class, command and command-specific-data are set by the=20
+driver, and the device sets the ack byte. There is little the driver
+can do except issue a diagnostic if the ack byte is not
+VIRTIO_NET_OK.
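+
As an illustration of the command layout, the following C sketch flattens a command into one contiguous byte buffer. This is an assumption made for readability: a real driver would normally place the driver-written part and the device-written ack byte in separate read-only and write-only descriptors. ctrl_build is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_NET_OK  0
#define VIRTIO_NET_ERR 1

/*
 * Lay out class, command and command-specific-data, followed by the ack
 * byte. Returns the length of the driver-written part (the ack byte is
 * then at buf[return value]), or 0 if the buffer is too small.
 */
static size_t ctrl_build(uint8_t *buf, size_t cap, uint8_t class_,
			 uint8_t command, const void *data, size_t data_len)
{
	size_t out_len = 2 + data_len;

	if (cap < out_len + 1)
		return 0;
	buf[0] = class_;
	buf[1] = command;
	if (data_len)
		memcpy(buf + 2, data, data_len);
	/* Pessimistic default until the device overwrites the ack byte. */
	buf[out_len] = VIRTIO_NET_ERR;
	return out_len;
}
```

After the device has used the buffer, the driver would compare the ack byte with VIRTIO_NET_OK and issue a diagnostic otherwise.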
+
+2.5.1.5.3.1 Packet Receive Filtering
+------------------------------------
+
+If the VIRTIO_NET_F_CTRL_RX feature is negotiated, the driver can=20
+send control commands for promiscuous mode, multicast receiving,=20
+and filtering of MAC addresses.
+
+Note that in general, these commands are best-effort: unwanted=20
+packets may still arrive.=20
+
+Setting Promiscuous Mode
+
+=09#define VIRTIO_NET_CTRL_RX    0
+=09 #define VIRTIO_NET_CTRL_RX_PROMISC      0
+=09 #define VIRTIO_NET_CTRL_RX_ALLMULTI     1=20
+
+The class VIRTIO_NET_CTRL_RX has two commands:=20
+VIRTIO_NET_CTRL_RX_PROMISC turns promiscuous mode on and off, and=20
+VIRTIO_NET_CTRL_RX_ALLMULTI turns all-multicast receive on and=20
+off. The command-specific-data is one byte containing 0 (off) or=20
+1 (on).
+
+2.5.1.5.3.2 Setting MAC Address Filtering
+-----------------------------------------
+
+=09struct virtio_net_ctrl_mac {
+=09=09u32 entries;
+=09=09u8 macs[entries][ETH_ALEN];
+=09};
+
+=09#define VIRTIO_NET_CTRL_MAC    1
+=09 #define VIRTIO_NET_CTRL_MAC_TABLE_SET        0=20
+
+The device can filter incoming packets by any number of destination
+MAC addresses.[21] This table is set using the class
+VIRTIO_NET_CTRL_MAC and the command VIRTIO_NET_CTRL_MAC_TABLE_SET. The
+command-specific-data is two variable length tables of 6-byte MAC
+addresses. The first table contains unicast addresses, and the second
+contains multicast addresses.
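+
The command-specific-data for VIRTIO_NET_CTRL_MAC_TABLE_SET could be serialized as sketched below. The helper names mac_table_put and mac_filter_build are hypothetical, and writing the entries count in native byte order is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Append one table: a u32 entry count followed by the 6-byte addresses. */
static size_t mac_table_put(uint8_t *buf, size_t cap,
			    const uint8_t (*macs)[ETH_ALEN], uint32_t entries)
{
	size_t need = sizeof(entries) + (size_t)entries * ETH_ALEN;

	if (cap < need)
		return 0;
	memcpy(buf, &entries, sizeof(entries)); /* native order: assumption */
	if (entries)
		memcpy(buf + sizeof(entries), macs, (size_t)entries * ETH_ALEN);
	return need;
}

/* Unicast table first, then the multicast table. Returns 0 on overflow. */
static size_t mac_filter_build(uint8_t *buf, size_t cap,
			       const uint8_t (*uni)[ETH_ALEN], uint32_t n_uni,
			       const uint8_t (*multi)[ETH_ALEN], uint32_t n_multi)
{
	size_t a = mac_table_put(buf, cap, uni, n_uni);
	size_t b;

	if (!a)
		return 0;
	b = mac_table_put(buf + a, cap - a, multi, n_multi);
	return b ? a + b : 0;
}
```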
+
+2.5.1.5.3.3 VLAN Filtering
+--------------------------
+
+If the driver negotiates the VIRTIO_NET_F_CTRL_VLAN feature, it
+can control a VLAN filter table in the device.
+
+=09#define VIRTIO_NET_CTRL_VLAN       2
+=09 #define VIRTIO_NET_CTRL_VLAN_ADD             0
+=09 #define VIRTIO_NET_CTRL_VLAN_DEL             1=20
+
+Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
+commands take a 16-bit VLAN id as the command-specific-data.
+
+2.5.1.5.3.4 Gratuitous Packet Sending
+-------------------------------------
+
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE feature (which
+depends on VIRTIO_NET_F_CTRL_VQ), the device can ask the guest to send
+gratuitous packets; this is usually done after the guest has been
+physically migrated and needs to announce its presence on the new
+network links. (As the hypervisor does not know the guest's
+network configuration (e.g. tagged VLANs), it is simplest to prod
+the guest in this way.)
+
+=09#define VIRTIO_NET_CTRL_ANNOUNCE       3
+=09 #define VIRTIO_NET_CTRL_ANNOUNCE_ACK             0
+
+The guest needs to check the VIRTIO_NET_S_ANNOUNCE bit in the status
+field when it notices a change in the device configuration. The
+command VIRTIO_NET_CTRL_ANNOUNCE_ACK indicates that the driver has
+received the notification; the device clears the
+VIRTIO_NET_S_ANNOUNCE bit in the status field after it receives
+this command.
+
+Processing this notification involves:
+
+1. Sending the gratuitous packets, or marking that gratuitous packets
+  are pending and letting a deferred routine send them.
+
+2. Sending the VIRTIO_NET_CTRL_ANNOUNCE_ACK command through the
+  control vq.
+
+2.5.1.5.3.5 Offloads State Configuration
+----------------------------------------
+
+If the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature is negotiated, the driver can
+send control commands for dynamic offloads state configuration.
+
+2.5.1.5.3.5.1 Setting Offloads State
+------------------------------------
+
+=09u64 offloads;
+
+=09#define VIRTIO_NET_F_GUEST_CSUM       1
+=09#define VIRTIO_NET_F_GUEST_TSO4       7
+=09#define VIRTIO_NET_F_GUEST_TSO6       8
+=09#define VIRTIO_NET_F_GUEST_ECN        9
+=09#define VIRTIO_NET_F_GUEST_UFO        10
+
+=09#define VIRTIO_NET_CTRL_GUEST_OFFLOADS       5
+=09 #define VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET   0
+
+The class VIRTIO_NET_CTRL_GUEST_OFFLOADS has one command:
+VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET applies the new offloads configuration.
+
+The u64 value passed as command data is a bitmask: set bits select
+offloads to be enabled, and cleared bits select offloads to be disabled.
+
+There is a corresponding device feature for each offload. Upon feature
+negotiation, the corresponding offload is enabled to preserve backward
+compatibility.
+
+The corresponding feature must be negotiated at startup in order to allow
+dynamic change of a specific offload's state.
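+
A driver might validate a requested offloads bitmask against the negotiated features as sketched below. The sketch assumes that each offload occupies the bit position given by its feature number (as the list of defines above suggests); offloads_allowed and GUEST_OFFLOADS_MASK are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_GUEST_CSUM  1
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_GUEST_ECN   9
#define VIRTIO_NET_F_GUEST_UFO   10

/* All offloads that can be toggled; bit position = feature number. */
#define GUEST_OFFLOADS_MASK \
	((1ULL << VIRTIO_NET_F_GUEST_CSUM) | (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
	 (1ULL << VIRTIO_NET_F_GUEST_TSO6) | (1ULL << VIRTIO_NET_F_GUEST_ECN) | \
	 (1ULL << VIRTIO_NET_F_GUEST_UFO))

/*
 * True if every offload set in "wanted" corresponds to a feature that
 * was negotiated at startup. Clearing bits is always acceptable.
 */
static bool offloads_allowed(uint64_t wanted, uint64_t negotiated_features)
{
	uint64_t allowed = negotiated_features & GUEST_OFFLOADS_MASK;

	return (wanted & ~allowed) == 0;
}
```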
+
+
+2.5.2 Block Device
+==================
+
+The virtio block device is a simple virtual block device (ie.=20
+disk). Read and write requests (and other exotic requests) are=20
+placed in the queue, and serviced (probably out of order) by the=20
+device except where noted.
+
+2.5.2.1 Device ID
+-----------------
+  2
+
+2.5.2.2 Virtqueues
+------------------
+  0:requestq
+
+2.5.2.3 Feature bits
+--------------------
+
+  VIRTIO_BLK_F_BARRIER (0) Host supports request barriers.
+
+  VIRTIO_BLK_F_SIZE_MAX (1) Maximum size of any single segment is=20
+    in "size_max".
+
+  VIRTIO_BLK_F_SEG_MAX (2) Maximum number of segments in a=20
+    request is in "seg_max".
+
+  VIRTIO_BLK_F_GEOMETRY (4) Disk-style geometry specified in
+    "geometry".
+
+  VIRTIO_BLK_F_RO (5) Device is read-only.
+
+  VIRTIO_BLK_F_BLK_SIZE (6) Block size of disk is in "blk_size".
+
+  VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
+
+  VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+
+Device configuration layout
+
+  The capacity of the device (expressed in 512-byte sectors) is always
+  present. The availability of the other fields depends on the feature
+  bits indicated above.
+
+=09struct virtio_blk_config {
+=09=09u64 capacity;
+=09=09u32 size_max;
+=09=09u32 seg_max;
+=09=09struct virtio_blk_geometry {
+=09=09=09u16 cylinders;
+=09=09=09u8 heads;
+=09=09=09u8 sectors;
+=09=09} geometry;
+=09=09u32 blk_size;
+=09};
+
+2.5.2.4 Device Initialization
+-----------------------------
+
+1. The device size should be read from the "capacity"
+  configuration field. No requests should be submitted which go
+  beyond this limit.
+
+2. If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated, the
+  blk_size field can be read to determine the optimal sector size
+  for the driver to use. This does not affect the units used in
+  the protocol (always 512 bytes), but awareness of the correct
+  value can affect performance.
+
+3. If the VIRTIO_BLK_F_RO feature is set by the device, any write=20
+  requests will fail.
+
+2.5.2.5 Device Operation
+------------------------
+
+The driver queues requests to the virtqueue, and they are used by=20
+the device (not necessarily in order). Each request is of form:
+
+=09struct virtio_blk_req {
+=09=09u32 type;
+=09=09u32 ioprio;
+=09=09u64 sector;
+=09=09char data[][512];
+=09=09u8 status;
+=09};
+
+If the device has the VIRTIO_BLK_F_SCSI feature, it can also support
+scsi packet command requests. Each of these requests is of form:
+
+=09struct virtio_scsi_pc_req {
+=09=09u32 type;
+=09=09u32 ioprio;
+=09=09u64 sector;
+=09=09char cmd[];
+=09=09char data[][512];
+#define SCSI_SENSE_BUFFERSIZE   96
+=09=09u8 sense[SCSI_SENSE_BUFFERSIZE];
+=09=09u32 errors;
+=09=09u32 data_len;
+=09=09u32 sense_len;
+=09=09u32 residual;
+=09=09u8 status;
+=09};
+
+The type of the request is either a read (VIRTIO_BLK_T_IN), a write
+(VIRTIO_BLK_T_OUT), a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or
+VIRTIO_BLK_T_SCSI_CMD_OUT[22]) or a flush (VIRTIO_BLK_T_FLUSH or
+VIRTIO_BLK_T_FLUSH_OUT[23]). If the device has the VIRTIO_BLK_F_BARRIER
+feature, the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
+request acts as a barrier: all preceding requests must be
+complete before this one, and all following requests must not be
+started until this is complete. Note that a barrier does not flush
+caches in the underlying backend device in the host, and thus does not
+serve as a data consistency guarantee. The driver must use a FLUSH
+request to flush the host cache.
+
+=09#define VIRTIO_BLK_T_IN           0
+=09#define VIRTIO_BLK_T_OUT          1
+=09#define VIRTIO_BLK_T_SCSI_CMD     2
+=09#define VIRTIO_BLK_T_SCSI_CMD_OUT 3
+=09#define VIRTIO_BLK_T_FLUSH        4
+=09#define VIRTIO_BLK_T_FLUSH_OUT    5
+=09#define VIRTIO_BLK_T_BARRIER=09 0x80000000
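+
Because the barrier flag shares the type field with the request code, the two have to be separated before dispatch. A minimal C sketch, with struct blk_type and blk_decode_type as hypothetical names, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_T_IN       0u
#define VIRTIO_BLK_T_OUT      1u
#define VIRTIO_BLK_T_FLUSH    4u
#define VIRTIO_BLK_T_BARRIER  0x80000000u

struct blk_type {
	uint32_t op;   /* request code with the barrier bit removed */
	bool barrier;  /* request must be ordered against all others */
};

/*
 * Split the type field. The barrier bit is only honoured when
 * VIRTIO_BLK_F_BARRIER is available; otherwise it is left in op so
 * that the request is seen as an unknown type.
 */
static struct blk_type blk_decode_type(uint32_t type, bool barrier_feature)
{
	struct blk_type t = { type, false };

	if (barrier_feature && (type & VIRTIO_BLK_T_BARRIER)) {
		t.op = type & ~VIRTIO_BLK_T_BARRIER;
		t.barrier = true;
	}
	return t;
}
```

Leaving the bit in place when the feature is absent is a design choice of this sketch, not something the text above mandates.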
+
+The ioprio field is a hint about the relative priorities of=20
+requests to the device: higher numbers indicate more important=20
+requests.
+
+The sector number indicates the offset (multiplied by 512) where=20
+the read or write is to occur. This field is unused and set to 0=20
+for scsi packet commands and for flush commands.
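+
The sector arithmetic, together with the rule that no request may go beyond "capacity", can be sketched as follows. The helper names and the VIRTIO_BLK_SECTOR_SIZE macro are hypothetical; the check is written to avoid 64-bit overflow.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_SECTOR_SIZE 512u /* protocol unit, independent of blk_size */

/* Byte offset on the device for a given sector number. */
static uint64_t blk_byte_offset(uint64_t sector)
{
	return sector * VIRTIO_BLK_SECTOR_SIZE;
}

/*
 * True if a read or write of nbytes starting at sector stays within
 * the device. capacity is the config field, in 512-byte sectors.
 */
static bool blk_in_range(uint64_t capacity, uint64_t sector, uint64_t nbytes)
{
	uint64_t nsect;

	if (nbytes % VIRTIO_BLK_SECTOR_SIZE)
		return false; /* data is a whole number of 512-byte blocks */
	nsect = nbytes / VIRTIO_BLK_SECTOR_SIZE;
	return sector <= capacity && nsect <= capacity - sector;
}
```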
+
+The cmd field is only present for scsi packet command requests,=20
+and indicates the command to perform. This field must reside in a=20
+single, separate read-only buffer; command length can be derived=20
+from the length of this buffer.=20
+
+Note that these first three (four for scsi packet commands)=20
+fields are always read-only: the data field is either read-only=20
+or write-only, depending on the request. The size of the read or=20
+write can be derived from the total size of the request buffers.
+
+The sense field is only present for scsi packet command requests,=20
+and indicates the buffer for scsi sense data.
+
+The data_len field is only present for scsi packet command
+requests. This field is deprecated and should be ignored by the
+driver. Historically, devices copied the data length there.
+
+The sense_len field is only present for scsi packet command=20
+requests and indicates the number of bytes actually written to=20
+the sense buffer.
+
+The residual field is only present for scsi packet command=20
+requests and indicates the residual size, calculated as data=20
+length - number of bytes actually transferred.
+
+The final status byte is written by the device: either=20
+VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for host or guest=20
+error or VIRTIO_BLK_S_UNSUPP for a request unsupported by host:
+
+=09#define VIRTIO_BLK_S_OK        0
+=09#define VIRTIO_BLK_S_IOERR     1
+=09#define VIRTIO_BLK_S_UNSUPP    2
+
+Historically, devices assumed that the fields type, ioprio and
+sector reside in a single, separate read-only buffer; the sense
+field in a separate write-only buffer of size 96 bytes, by itself;
+the fields errors, data_len, sense_len and residual in a single,
+separate write-only buffer; and the status field in a separate
+write-only buffer of size 1 byte, by itself.
+
+
+2.5.3 Console Device
+====================
+
+The virtio console device is a simple device for data input and=20
+output. A device may have one or more ports. Each port has a pair=20
+of input and output virtqueues. Moreover, a device has a pair of=20
+control IO virtqueues. The control virtqueues are used to=20
+communicate information between the device and the driver about=20
+ports being opened and closed on either side of the connection,=20
+indication from the host about whether a particular port is a=20
+console port, adding new ports, port hot-plug/unplug, etc., and=20
+indication from the guest about whether a port or a device was=20
+successfully added, port open/close, etc. For data IO, one or
+more empty buffers are placed in the receive queue for incoming=20
+data and outgoing characters are placed in the transmit queue.
+
+2.5.3.1 Device ID
+-----------------
+
+  3
+
+2.5.3.2 Virtqueues
+------------------
+
+  0:receiveq(port0), 1:transmitq(port0), 2:control receiveq,
+  3:control transmitq, 4:receiveq(port1), 5:transmitq(port1),
+  ...
+
+  Ports 2 onwards only exist if VIRTIO_CONSOLE_F_MULTIPORT is set.
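+
The numbering above implies a simple mapping from port number to virtqueue index, with queues 2 and 3 reserved for control. The C sketch below uses hypothetical helper names and assumes that the "..." continues the same pattern for higher ports.

```c
#include <stdint.h>

#define CONSOLE_CTRL_RX_QUEUE 2u
#define CONSOLE_CTRL_TX_QUEUE 3u

/* Port 0 uses queues 0 and 1; later ports come after the control queues. */
static uint32_t console_rx_queue(uint32_t port)
{
	return port == 0 ? 0 : 2 * (port + 1);
}

static uint32_t console_tx_queue(uint32_t port)
{
	return console_rx_queue(port) + 1;
}
```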
+
+2.5.3.3 Feature bits
+--------------------
+
+  VIRTIO_CONSOLE_F_SIZE (0) Configuration cols and rows fields=20
+    are valid.
+
+  VIRTIO_CONSOLE_F_MULTIPORT (1) Device has support for multiple
+    ports; configuration fields nr_ports and max_nr_ports are=20
+    valid and control virtqueues will be used.
+
+2.5.3.4 Device configuration layout
+-----------------------------------
+
+  The size of the console is supplied=20
+  in the configuration space if the VIRTIO_CONSOLE_F_SIZE feature=20
+  is set. Furthermore, if the VIRTIO_CONSOLE_F_MULTIPORT feature=20
+  is set, the maximum number of ports supported by the device can=20
+  be fetched.
+
+=09struct virtio_console_config {
+=09=09u16 cols;
+=09=09u16 rows;
+=09=09u32 max_nr_ports;
+=09};
+
+2.5.3.5 Device Initialization
+-----------------------------
+
+1. If the VIRTIO_CONSOLE_F_SIZE feature is negotiated, the driver=20
+  can read the console dimensions from the configuration fields.
+
+2. If the VIRTIO_CONSOLE_F_MULTIPORT feature is negotiated, the=20
+  driver can spawn multiple ports, not all of which may be=20
+  attached to a console. Some could be generic ports. In this=20
+  case, the control virtqueues are enabled and according to the=20
+  max_nr_ports configuration-space value, the appropriate number=20
+  of virtqueues are created. A control message indicating the=20
+  driver is ready is sent to the host. The host can then send=20
+  control messages for adding new ports to the device. After=20
+  creating and initializing each port, a=20
+  VIRTIO_CONSOLE_PORT_READY control message is sent to the host=20
+  for that port so the host can let us know of any additional=20
+  configuration options set for that port.
+
+3. The receiveq for each port is populated with one or more=20
+  receive buffers.
+
+2.5.3.6 Device Operation
+------------------------
+
+1. For output, a buffer containing the characters is placed in=20
+  the port's transmitq.[25]
+
+2. When a buffer is used in the receiveq (signalled by an
+  interrupt), its contents are the input to the port associated
+  with the virtqueue for which the notification was received.
+
+3. If the driver negotiated the VIRTIO_CONSOLE_F_SIZE feature, a=20
+  configuration change interrupt may occur. The updated size can=20
+  be read from the configuration fields.
+
+4. If the driver negotiated the VIRTIO_CONSOLE_F_MULTIPORT=20
+  feature, active ports are announced by the host using the=20
+  VIRTIO_CONSOLE_PORT_ADD control message. The same message is=20
+  used for port hot-plug as well.
+
+5. If the host specified a port `name', a sysfs attribute is=20
+  created with the name filled in, so that udev rules can be=20
+  written that can create a symlink from the port's name to the=20
+  char device for port discovery by applications in the guest.
+
+6. Changes to ports' state are effected by control messages.=20
+  Appropriate action is taken on the port indicated in the=20
+  control message. The layout of the structure of the control=20
+  buffer and the events associated are:
+
+=09struct virtio_console_control {
+=09=09uint32_t id;    /* Port number */
+=09=09uint16_t event; /* The kind of control event */
+=09=09uint16_t value; /* Extra information for the event */
+=09};
+
+=09/* Some events for the internal messages (control packets) */
+=09#define VIRTIO_CONSOLE_DEVICE_READY     0
+=09#define VIRTIO_CONSOLE_PORT_ADD         1
+=09#define VIRTIO_CONSOLE_PORT_REMOVE      2
+=09#define VIRTIO_CONSOLE_PORT_READY       3
+=09#define VIRTIO_CONSOLE_CONSOLE_PORT     4
+=09#define VIRTIO_CONSOLE_RESIZE           5
+=09#define VIRTIO_CONSOLE_PORT_OPEN        6
+=09#define VIRTIO_CONSOLE_PORT_NAME        7
+
+2.5.4 Entropy Device
+====================
+
+The virtio entropy device supplies high-quality randomness for=20
+guest use.
+
+2.5.4.1 Device ID
+-----------------
+  4
+
+2.5.4.2 Virtqueues
+------------------
+  0:requestq.
+
+2.5.4.3 Feature bits
+--------------------
+  None currently defined
+
+2.5.4.4 Device configuration layout
+-----------------------------------
+  None currently defined.
+
+2.5.4.5 Device Initialization
+-----------------------------
+
+1. The virtqueue is initialized
+
+2.5.4.6 Device Operation
+------------------------
+
+When the driver requires random bytes, it places the descriptor=20
+of one or more buffers in the queue. It will be completely filled=20
+by random data by the device.
+
+2.5.5 Memory Balloon Device
+===========================
+
+The virtio memory balloon device is a primitive device for=20
+managing guest memory: the device asks for a certain amount of=20
+memory, and the guest supplies it (or withdraws it, if the device=20
+has more than it asks for). This allows the guest to adapt to=20
+changes in allowance of underlying physical memory. If the=20
+feature is negotiated, the device can also be used to communicate=20
+guest memory statistics to the host.
+
+2.5.5.1 Device ID
+-----------------
+  5
+
+2.5.5.2 Virtqueues
+------------------
+  0:inflateq. 1:deflateq. 2:statsq.
+
+  Virtqueue 2 only exists if VIRTIO_BALLOON_F_STATS_VQ is set.
+
+2.5.5.3 Feature bits
+--------------------
+  VIRTIO_BALLOON_F_MUST_TELL_HOST (0) Host must be told before=20
+    pages from the balloon are used.
+
+  VIRTIO_BALLOON_F_STATS_VQ (1) A virtqueue for reporting guest=20
+    memory statistics is present.
+
+2.5.5.4 Device configuration layout
+-----------------------------------
+  Both fields of this configuration are always available. Note that
+  they are little endian, despite the convention that device fields
+  are guest endian:
+
+=09struct virtio_balloon_config {
+=09=09u32 num_pages;
+=09=09u32 actual;
+=09};
+
+2.5.5.5 Device Initialization
+-----------------------------
+
+1. The inflate and deflate virtqueues are identified.
+
+2. If the VIRTIO_BALLOON_F_STATS_VQ feature bit is negotiated:
+
+  (a) Identify the stats virtqueue.
+
+  (b) Add one empty buffer to the stats virtqueue and notify the=20
+    host.
+
+Device operation begins immediately.
+
+2.5.5.6 Device Operation
+------------------------
+
+Memory Ballooning
+
+The device is driven by the receipt of a
+configuration change interrupt.
+
+1. The "num_pages" configuration field is examined. If this is=20
+  greater than the "actual" number of pages, memory must be given=20
+  to the balloon. If it is less than the "actual" number of=20
+  pages, memory may be taken back from the balloon for general=20
+  use.
+
+2. To supply memory to the balloon (aka. inflate):
+
+  (a) The driver constructs an array of addresses of unused memory
+    pages. These addresses are divided by 4096[27] and the descriptor
+    describing the resulting 32-bit array is added to the inflateq.
+
+3. To remove memory from the balloon (aka. deflate):
+
+  (a) The driver constructs an array of addresses of memory pages=20
+    it has previously given to the balloon, as described above.=20
+    This descriptor is added to the deflateq.
+
+  (b) If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the=20
+    guest may not use these requested pages until that descriptor=20
+    in the deflateq has been used by the device.
+
+  (c) Otherwise, the guest may begin to re-use pages previously
+    given to the balloon before the device has acknowledged their
+    withdrawal.[28]
+
+4. In either case, once the device has completed the inflation or=20
+  deflation, the "actual" field of the configuration should be=20
+  updated to reflect the new number of pages in the balloon.[29]
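+
The comparison in step 1 and the address conversion in step 2 can be sketched in C as below. The names balloon_delta, balloon_fill_pfns and BALLOON_PAGE_SHIFT are hypothetical; the sketch only prepares the 32-bit array and leaves queueing it on the inflateq or deflateq to the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define BALLOON_PAGE_SHIFT 12 /* addresses are divided by 4096 */

/*
 * Positive: pages that must be given to the balloon.
 * Negative: pages that may be taken back. Zero: nothing to do.
 */
static int64_t balloon_delta(uint32_t num_pages, uint32_t actual)
{
	return (int64_t)num_pages - (int64_t)actual;
}

/*
 * Convert page addresses into the 32-bit array described above.
 * Returns the size in bytes of the array to be described to the device.
 */
static size_t balloon_fill_pfns(uint32_t *pfns, const uint64_t *addrs,
				size_t n)
{
	for (size_t i = 0; i < n; i++)
		pfns[i] = (uint32_t)(addrs[i] >> BALLOON_PAGE_SHIFT);
	return n * sizeof(pfns[0]);
}
```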
+
+2.5.5.6.1 Memory Statistics
+---------------------------
+
+The stats virtqueue is atypical because communication is driven=20
+by the device (not the driver). The channel becomes active at=20
+driver initialization time when the driver adds an empty buffer=20
+and notifies the device. A request for memory statistics proceeds=20
+as follows:
+
+1. The device pushes the buffer onto the used ring and sends an=20
+  interrupt.
+
+2. The driver pops the used buffer and discards it.
+
+3. The driver collects memory statistics and writes them into a=20
+  new buffer.
+
+4. The driver adds the buffer to the virtqueue and notifies the=20
+  device.
+
+5. The device pops the buffer (retaining it to initiate a=20
+  subsequent request) and consumes the statistics.
+
+Memory Statistics Format
+
+  Each statistic consists of a 16 bit tag and a 64 bit value. Both
+  quantities are represented in the native endian of the guest. All
+  statistics are optional and the driver may choose which ones to
+  supply. To guarantee backwards compatibility, unsupported statistics
+  should be omitted.
+
+=09struct virtio_balloon_stat {
+=09#define VIRTIO_BALLOON_S_SWAP_IN  0
+=09#define VIRTIO_BALLOON_S_SWAP_OUT 1
+=09#define VIRTIO_BALLOON_S_MAJFLT   2
+=09#define VIRTIO_BALLOON_S_MINFLT   3
+=09#define VIRTIO_BALLOON_S_MEMFREE  4
+=09#define VIRTIO_BALLOON_S_MEMTOT   5
+=09=09u16 tag;
+=09=09u64 val;
+=09} __attribute__((packed));
+
+2.5.5.6.2 Memory Statistics Tags
+--------------------------------
+
+  VIRTIO_BALLOON_S_SWAP_IN The amount of memory that has been=20
+  swapped in (in bytes).
+
+  VIRTIO_BALLOON_S_SWAP_OUT The amount of memory that has been=20
+  swapped out to disk (in bytes).
+
+  VIRTIO_BALLOON_S_MAJFLT The number of major page faults that=20
+  have occurred.
+
+  VIRTIO_BALLOON_S_MINFLT The number of minor page faults that=20
+  have occurred.
+
+  VIRTIO_BALLOON_S_MEMFREE The amount of memory not being used=20
+  for any purpose (in bytes).
+
+  VIRTIO_BALLOON_S_MEMTOT The total amount of memory available=20
+  (in bytes).
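+
A driver could assemble the statistics buffer as sketched below. stats_add is a hypothetical helper; the struct repeats the packed definition given above (the packed attribute is a GCC/Clang extension), so each entry occupies 10 bytes. Statistics the guest cannot supply are simply never added.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_BALLOON_S_MEMFREE  4
#define VIRTIO_BALLOON_S_MEMTOT   5

struct virtio_balloon_stat {
	uint16_t tag;
	uint64_t val;
} __attribute__((packed));

/*
 * Append one statistic to buf, which has room for cap entries.
 * Returns the new entry count (unchanged if the buffer is full).
 */
static size_t stats_add(struct virtio_balloon_stat *buf, size_t count,
			size_t cap, uint16_t tag, uint64_t val)
{
	if (count >= cap)
		return count;
	buf[count].tag = tag;
	buf[count].val = val;
	return count + 1;
}
```

The number of bytes to hand to the device is then the entry count multiplied by sizeof(struct virtio_balloon_stat).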
+
+
+2.5.6 SCSI Host Device
+======================
+
+The virtio SCSI host device groups together one or more virtual=20
+logical units (such as disks), and allows communicating with them
+using the SCSI protocol. An instance of the device represents a=20
+SCSI host to which many targets and LUNs are attached.
+
+The virtio SCSI device services two kinds of requests:
+
+• command requests for a logical unit;
+
+• task management functions related to a logical unit, target or
+  command.
+
+The device is also able to send out notifications about added and=20
+removed logical units. Together, these capabilities provide a=20
+SCSI transport protocol that uses virtqueues as the transfer=20
+medium. In the transport protocol, the virtio driver acts as the=20
+initiator, while the virtio SCSI host provides one or more=20
+targets that receive and process the requests.=20
+
+2.5.6.1 Device ID
+-----------------
+  8
+
+2.5.6.2 Virtqueues
+------------------
+   0:controlq; 1:eventq; 2..n:request queues.
+
+2.5.6.3 Feature bits
+--------------------
+
+  VIRTIO_SCSI_F_INOUT (0) A single request can include both=20
+    read-only and write-only data buffers.
+
+  VIRTIO_SCSI_F_HOTPLUG (1) The host should enable=20
+    hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
+
+2.5.6.4 Device configuration layout
+-----------------------------------
+
+  All fields of this configuration are always available. sense_size
+  and cdb_size are writable by the guest.
+
+=09struct virtio_scsi_config {
+=09=09u32 num_queues;
+=09=09u32 seg_max;
+=09=09u32 max_sectors;
+=09=09u32 cmd_per_lun;
+=09=09u32 event_info_size;
+=09=09u32 sense_size;
+=09=09u32 cdb_size;
+=09=09u16 max_channel;
+=09=09u16 max_target;
+=09=09u32 max_lun;
+=09};
+
+  num_queues is the total number of request virtqueues exposed by=20
+    the device. The driver is free to use only one request queue,=20
+    or it can use more to achieve better performance.
+
+  seg_max is the maximum number of segments that can be in a=20
+    command. A bidirectional command can include seg_max input=20
+    segments and seg_max output segments.
+
+  max_sectors is a hint to the guest about the maximum transfer=20
+    size it should use.
+
+  cmd_per_lun is a hint to the guest about the maximum number of=20
+    linked commands it should send to one LUN. The actual value=20
+    to be used is the minimum of cmd_per_lun and the virtqueue=20
+    size.
+
+  event_info_size is the maximum size that the device will fill=20
+    for buffers that the driver places in the eventq. The driver=20
+    should always put buffers of at least this size. It is
+    written by the device depending on the set of negotiated
+    features.
+
+  sense_size is the maximum size of the sense data that the=20
+    device will write. The default value is written by the device=20
+    and will always be 96, but the driver can modify it. It is=20
+    restored to the default when the device is reset.
+
+  cdb_size is the maximum size of the CDB that the driver will=20
+    write. The default value is written by the device and will=20
+    always be 32, but the driver can likewise modify it. It is=20
+    restored to the default when the device is reset.
+
+  max_channel, max_target and max_lun can be used by the driver=20
+    as hints to constrain scanning the logical units on the=20
+    host.
+
+2.5.6.5 Device Initialization
+-----------------------------
+
+The initialization routine should first of all discover the=20
+device's virtqueues.
+
+If the driver uses the eventq, it should then place at least one
+buffer in the eventq.
+
+The driver can immediately issue requests (for example, INQUIRY=20
+or REPORT LUNS) or task management functions (for example, I_T=20
+RESET).=20
+
+2.5.6.6 Device Operation
+------------------------
+
+Device operation consists of operating request queues, the control
+queue and the event queue.
+
+2.5.6.6.1 Device Operation: Request Queues
+------------------------------------------
+
+The driver queues requests to an arbitrary request queue, and=20
+they are used by the device on that same queue. It is the=20
+responsibility of the driver to ensure strict request ordering=20
+for commands placed on different queues, because they will be=20
+consumed with no order constraints.
+
+Requests have the following format:=20
+
+=09struct virtio_scsi_req_cmd {
+=09=09// Read-only part
+=09=09u8 lun[8];
+=09=09u64 id;
+=09=09u8 task_attr;
+=09=09u8 prio;
+=09=09u8 crn;
+=09=09char cdb[cdb_size];
+=09=09char dataout[];
+=09=09// Write-only part
+=09=09u32 sense_len;
+=09=09u32 residual;
+=09=09u16 status_qualifier;
+=09=09u8 status;
+=09=09u8 response;
+=09=09u8 sense[sense_size];
+=09=09char datain[];
+=09};
+
+
+=09/* command-specific response values */
+=09#define VIRTIO_SCSI_S_OK                0
+=09#define VIRTIO_SCSI_S_OVERRUN           1
+=09#define VIRTIO_SCSI_S_ABORTED           2
+=09#define VIRTIO_SCSI_S_BAD_TARGET        3
+=09#define VIRTIO_SCSI_S_RESET             4
+=09#define VIRTIO_SCSI_S_BUSY              5
+=09#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
+=09#define VIRTIO_SCSI_S_TARGET_FAILURE    7
+=09#define VIRTIO_SCSI_S_NEXUS_FAILURE     8
+=09#define VIRTIO_SCSI_S_FAILURE           9
+
+=09/* task_attr */
+=09#define VIRTIO_SCSI_S_SIMPLE            0
+=09#define VIRTIO_SCSI_S_ORDERED           1
+=09#define VIRTIO_SCSI_S_HEAD              2
+=09#define VIRTIO_SCSI_S_ACA               3
+
+The lun field addresses a target and logical unit in the=20
+virtio-scsi device's SCSI domain. The only supported format for=20
+the LUN field is: first byte set to 1, second byte set to target,=20
+third and fourth byte representing a single level LUN structure,=20
+followed by four zero bytes. With this representation, a=20
+virtio-scsi device can serve up to 256 targets and 16384 LUNs per=20
+target.
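+
The LUN format can be illustrated with a small encoder. The text only states that bytes three and four hold a single level LUN structure; setting the 0x40 flag (flat space addressing as defined in SAM) in the third byte is an assumption of this sketch, and virtio_scsi_encode_lun is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill the 8-byte lun field for a target (0..255) and a LUN (0..16383).
 * Returns false if either value is out of the representable range.
 */
static bool virtio_scsi_encode_lun(uint8_t out[8], unsigned target,
				   unsigned lun)
{
	if (target > 255 || lun > 16383)
		return false;
	memset(out, 0, 8);                     /* last four bytes stay zero */
	out[0] = 1;
	out[1] = (uint8_t)target;
	out[2] = (uint8_t)(0x40 | (lun >> 8)); /* 0x40: assumed flat addressing */
	out[3] = (uint8_t)(lun & 0xff);
	return true;
}
```

The 14 bits left for the LUN after the addressing flag are what give the limit of 16384 LUNs per target under this assumption.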
+
+The id field is the command identifier ("tag").
+
+task_attr, prio and crn should be left at zero. task_attr defines
+the task attribute as in the table above, but all task attributes=20
+may be mapped to SIMPLE by the device; crn may also be provided=20
+by clients, but is generally expected to be 0. The maximum CRN=20
+value defined by the protocol is 255, since CRN is stored in an=20
+8-bit integer.
+
+All of these fields are defined in SAM. They are always=20
+read-only, as are the cdb and dataout fields. The cdb_size is
+taken from the configuration space.
+
+sense and subsequent fields are always write-only. The sense_len=20
+field indicates the number of bytes actually written to the sense=20
+buffer. The residual field indicates the residual size,=20
+calculated as "data_length - number_of_transferred_bytes", for=20
+read or write operations. For bidirectional commands, the=20
+number_of_transferred_bytes includes both read and written bytes.=20
+A residual field that is less than the size of datain means that=20
+the dataout field was processed entirely. A residual field that=20
+exceeds the size of datain means that the dataout field was=20
+processed partially and the datain field was not processed at=20
+all.
+
+The status byte is written by the device to be the status code as=20
+defined in SAM.
+
+The response byte is written by the device to be one of the=20
+following:
+
+  VIRTIO_SCSI_S_OK when the request was completed and the status=20
+  byte is filled with a SCSI status code (not necessarily=20
+  "GOOD").
+
+  VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires=20
+  transferring more data than is available in the data buffers.
+
+  VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an=20
+  ABORT TASK or ABORT TASK SET task management function.
+
+  VIRTIO_SCSI_S_BAD_TARGET if the request was never processed=20
+  because the target indicated by the lun field does not exist.
+
+  VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus=20
+  or device reset (including a task management function).
+
+  VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a=20
+  problem in the connection between the host and the target=20
+  (severed link).
+
+  VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a=20
+  failure and the guest should not retry on other paths.
+
+  VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure=20
+  but retrying on other paths might yield a different result.
+
+  VIRTIO_SCSI_S_BUSY if the request failed but retrying on the=20
+  same path should work.
+
+  VIRTIO_SCSI_S_FAILURE for other host or guest error. In=20
+  particular, if neither dataout nor datain is empty, and the=20
+  VIRTIO_SCSI_F_INOUT feature has not been negotiated, the=20
+  request will be immediately returned with a response equal to=20
+  VIRTIO_SCSI_S_FAILURE.=20
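
As a rough sketch of how a driver might act on the retry hints in the
list above, the following maps a response byte to a coarse policy. The
enum and function names are hypothetical, the numeric values are the
ones shown for the controlq in the next section, and treating every
other response as a plain failure is an assumption made for brevity.

```c
#include <stdint.h>

#define VIRTIO_SCSI_S_OK                0
#define VIRTIO_SCSI_S_BUSY              5
#define VIRTIO_SCSI_S_TARGET_FAILURE    7
#define VIRTIO_SCSI_S_NEXUS_FAILURE     8

/* Hypothetical driver-side policy, not defined by the specification. */
enum resp_policy {
	RESP_COMPLETED,         /* look at the SCSI status byte next */
	RESP_RETRY_SAME_PATH,
	RESP_RETRY_OTHER_PATH,
	RESP_FAIL
};

static enum resp_policy classify_response(uint8_t response)
{
	switch (response) {
	case VIRTIO_SCSI_S_OK:
		return RESP_COMPLETED;          /* status may still not be GOOD */
	case VIRTIO_SCSI_S_BUSY:
		return RESP_RETRY_SAME_PATH;    /* retrying on the same path should work */
	case VIRTIO_SCSI_S_NEXUS_FAILURE:
		return RESP_RETRY_OTHER_PATH;   /* another path might succeed */
	case VIRTIO_SCSI_S_TARGET_FAILURE:
		return RESP_FAIL;               /* do not retry on other paths */
	default:
		return RESP_FAIL;               /* assumption: no retry for the rest */
	}
}
```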
+
+2.5.6.6.2 Device Operation: controlq
+------------------------------------
+
+The controlq is used for other SCSI transport operations.=20
+Requests have the following format:
+
+=09struct virtio_scsi_ctrl {
+=09=09u32 type;
+=09...
+=09=09u8 response;
+=09};
+
+=09/* response values valid for all commands */
+=09#define VIRTIO_SCSI_S_OK                       0
+=09#define VIRTIO_SCSI_S_BAD_TARGET               3
+=09#define VIRTIO_SCSI_S_BUSY                     5
+=09#define VIRTIO_SCSI_S_TRANSPORT_FAILURE        6
+=09#define VIRTIO_SCSI_S_TARGET_FAILURE           7
+=09#define VIRTIO_SCSI_S_NEXUS_FAILURE            8
+=09#define VIRTIO_SCSI_S_FAILURE                  9
+=09#define VIRTIO_SCSI_S_INCORRECT_LUN            12
+
+The type identifies the remaining fields.
+
+The following commands are defined:
+
+  Task management function =20
+=09#define VIRTIO_SCSI_T_TMF                      0
+
+=09#define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
+=09#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
+=09#define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
+=09#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
+=09#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
+=09#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
+=09#define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
+=09#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7
+
+=09struct virtio_scsi_ctrl_tmf
+=09{
+=09=09// Read-only part
+=09=09u32 type;
+=09=09u32 subtype;
+=09=09u8 lun[8];
+=09=09u64 id;
+=09=09// Write-only part
+=09=09u8 response;
+=09};
+
+=09/* command-specific response values */
+=09#define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
+=09#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       10
+=09#define VIRTIO_SCSI_S_FUNCTION_REJECTED        11
+
+  The type is VIRTIO_SCSI_T_TMF. All fields except response are
+  filled by the driver. The subtype field must always be
+  specified and identifies the requested task management
+  function.
+
+  Other fields may be irrelevant for the requested TMF; if so,=20
+  they are ignored but they should still be present. The lun=20
+  field is in the same format specified for request queues; the=20
+  single level LUN is ignored when the task management function=20
+  addresses a whole I_T nexus. When relevant, the value of the id=20
+  field is matched against the id values passed on the requestq.
+
+  The outcome of the task management function is written by the=20
+  device in the response field. The command-specific response=20
+  values map 1-to-1 with those defined in SAM.
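
A minimal sketch of filling in a TMF request and checking its outcome,
assuming native byte order and fixed-width C types for the u8/u32/u64
fields. The helpers tmf_prepare and tmf_succeeded and the 0xff
"not yet answered" sentinel are hypothetical and exist only for this
illustration.

```c
#include <stdint.h>
#include <string.h>

#define VIRTIO_SCSI_T_TMF                   0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK        0
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE     0
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED    10
#define VIRTIO_SCSI_S_FUNCTION_REJECTED     11

struct virtio_scsi_ctrl_tmf {
	/* Read-only part (filled by the driver) */
	uint32_t type;
	uint32_t subtype;
	uint8_t  lun[8];
	uint64_t id;
	/* Write-only part (filled by the device) */
	uint8_t  response;
};

/* Hypothetical helper: the driver fills every field except response. */
static void tmf_prepare(struct virtio_scsi_ctrl_tmf *req, uint32_t subtype,
                        const uint8_t lun[8], uint64_t id)
{
	memset(req, 0, sizeof(*req));
	req->type = VIRTIO_SCSI_T_TMF;
	req->subtype = subtype;
	memcpy(req->lun, lun, 8);
	req->id = id;           /* matched against ids passed on the requestq */
	req->response = 0xff;   /* assumed sentinel: device has not answered yet */
}

/* Hypothetical helper: non-zero if the device reported success. */
static int tmf_succeeded(const struct virtio_scsi_ctrl_tmf *req)
{
	return req->response == VIRTIO_SCSI_S_FUNCTION_COMPLETE ||
	       req->response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
}
```

Fields that are irrelevant for a given TMF are still present in the
structure; here they are simply zeroed before the relevant ones are
set.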
+
+  Asynchronous notification query =20
+
+=09#define VIRTIO_SCSI_T_AN_QUERY                    1
+
+=09struct virtio_scsi_ctrl_an {
+=09    // Read-only part
+=09    u32 type;
+=09    u8  lun[8];
+=09    u32 event_requested;
+=09    // Write-only part
+=09    u32 event_actual;
+=09    u8  response;
+=09};
+
+=09#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
+=09#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
+=09#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
+=09#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
+=09#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
+=09#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64
+
+  By sending this command, the driver asks the device which=20
+  events the given LUN can report, as described in paragraphs 6.6=20
+  and A.6 of the SCSI MMC specification. The driver writes the=20
+  events it is interested in into the event_requested; the device=20
+  responds by writing the events that it supports into=20
+  event_actual.
+
+  The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested=20
+  fields are written by the driver. The event_actual and response=20
+  fields are written by the device.
+
+  No command-specific values are defined for the response byte.
+
+  Asynchronous notification subscription =20
+=09#define VIRTIO_SCSI_T_AN_SUBSCRIBE                2
+
+=09struct virtio_scsi_ctrl_an {
+=09=09// Read-only part
+=09=09u32 type;
+=09=09u8  lun[8];
+=09=09u32 event_requested;
+=09=09// Write-only part
+=09=09u32 event_actual;
+=09=09u8  response;
+=09};
+
+  By sending this command, the driver asks the specified LUN to=20
+  report events for its physical interface, again as described in=20
+  the SCSI MMC specification. The driver writes the events it is=20
+  interested in into the event_requested; the device responds by=20
+  writing the events that it supports into event_actual.
+
+  Event types are the same as for the asynchronous notification=20
+  query message.
+
+  The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and=20
+  event_requested fields are written by the driver. The=20
+  event_actual and response fields are written by the device.
+
+  No command-specific values are defined for the response byte.
+
+2.5.6.6.3 Device Operation: eventq
+----------------------------------
+
+The eventq is used by the device to report information on logical=20
+units that are attached to it. The driver should always leave a=20
+few buffers ready in the eventq. In general, the device will not=20
+queue events to cope with an empty eventq, and will end up=20
+dropping events if it finds no buffer ready. However, when=20
+reporting events for many LUNs (e.g. when a whole target=20
+disappears), the device can throttle events to avoid dropping=20
+them. For this reason, placing 10-15 buffers on the event queue=20
+should be enough.
+
+Buffers are placed in the eventq and filled by the device when=20
+interesting events occur. The buffers should be strictly=20
+write-only (device-filled) and the size of the buffers should be=20
+at least the value given in the device's configuration=20
+information.
+
+Buffers returned by the device on the eventq will be referred to=20
+as "events" in the rest of this section. Events have the=20
+following format:=20
+
+=09#define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000
+
+=09struct virtio_scsi_event {
+=09=09// Write-only part
+=09=09u32 event;
+=09=09...
+=09};
+
+If bit 31 is set in the event field, the device failed to report=20
+an event due to missing buffers. In this case, the driver should=20
+poll the logical units for unit attention conditions, and/or do=20
+whatever form of bus scan is appropriate for the guest operating=20
+system.
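
The bit 31 check described above can be separated from the event type
as in the sketch below. The helper names are hypothetical; only the
VIRTIO_SCSI_T_EVENTS_MISSED value comes from this section.

```c
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000u

/* Hypothetical helper: non-zero if the device dropped at least one event. */
static inline int virtio_scsi_events_missed(uint32_t event)
{
	return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* Hypothetical helper: the event type with the "missed" flag masked off. */
static inline uint32_t virtio_scsi_event_type(uint32_t event)
{
	return event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
}
```

A driver would typically test the flag first, schedule a rescan or poll
if it is set, and then dispatch on the masked event type.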
+
+Other data that the device writes to the buffer depends on the=20
+contents of the event field. The following events are defined:
+
+  No event =20
+=09#define VIRTIO_SCSI_T_NO_EVENT         0
+
+  This event is fired in the following cases:=20
+
+  • When the device detects in the eventq a buffer that is
+    shorter than what is indicated in the configuration field, it
+    might use it immediately and put this dummy value in the
+    event field. A well-written driver will never observe this
+    situation.
+
+  • When events are dropped, the device may signal this event as
+    soon as the driver makes a buffer available, in order to
+    request action from the driver. In this case, of course, this
+    event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
+    flag.
+
+  Transport reset =20
+=09#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
+
+=09struct virtio_scsi_event_reset {
+=09=09// Write-only part
+=09=09u32 event;
+=09=09u8  lun[8];
+=09=09u32 reason;
+=09};
+
+=09#define VIRTIO_SCSI_EVT_RESET_HARD         0
+=09#define VIRTIO_SCSI_EVT_RESET_RESCAN       1
+=09#define VIRTIO_SCSI_EVT_RESET_REMOVED      2
+
+  By sending this event, the device signals that a logical unit
+  on a target has been reset, including the case of a new device
+  appearing or disappearing on the bus. The device fills in all
+  fields. The event field is set to
+  VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
+  logical unit in the SCSI host.
+
+  The reason value is one of the three #define values appearing=20
+  above:
+
+  • VIRTIO_SCSI_EVT_RESET_REMOVED ("LUN/target removed") is used
+    if the target or logical unit is no longer able to receive
+    commands.
+
+  • VIRTIO_SCSI_EVT_RESET_HARD ("LUN hard reset") is used if the
+    logical unit has been reset, but is still present.
+
+  • VIRTIO_SCSI_EVT_RESET_RESCAN ("rescan LUN/target") is used if
+    a target or logical unit has just appeared on the device.
+
+  The "removed" and "rescan" events, when sent for LUN 0, may=20
+  apply to the entire target. After receiving them the driver=20
+  should ask the initiator to rescan the target, in order to=20
+  detect the case when an entire target has appeared or=20
+  disappeared. These two events will never be reported unless the=20
+  VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host=20
+  and the guest.
+
+  Events will also be reported via sense codes (this obviously=20
+  does not apply to newly appeared buses or targets, since the=20
+  application has never discovered them):
+
+  • "LUN/target removed" maps to sense key ILLEGAL REQUEST, asc
+    0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
+
+  • "LUN hard reset" maps to sense key UNIT ATTENTION, asc 0x29
+    (POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
+
+  • "rescan LUN/target" maps to sense key UNIT ATTENTION, asc
+    0x3f, ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
+
+  The preferred way to detect transport reset is always to use
+  events, because sense codes are only seen by the driver when it
+  sends a SCSI command to the logical unit or target. However, in
+  case events are dropped, the initiator will still be able to
+  synchronize with the actual state of the controller if the
+  driver asks the initiator to rescan the SCSI bus. During the
+  rescan, the initiator will be able to observe the above sense
+  codes, and it will process them as if the driver had
+  received the equivalent event.
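
The fallback path can be sketched as a small lookup that turns the
sense codes listed above back into the equivalent reset reason. The
function name is hypothetical. The sense key numbers (0x5 for ILLEGAL
REQUEST, 0x6 for UNIT ATTENTION) are the standard SCSI values rather
than something defined in this section, and since no ascq is given for
asc 0x29 the sketch accepts any.

```c
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD         0
#define VIRTIO_SCSI_EVT_RESET_RESCAN       1
#define VIRTIO_SCSI_EVT_RESET_REMOVED      2

#define SENSE_KEY_ILLEGAL_REQUEST          0x5
#define SENSE_KEY_UNIT_ATTENTION           0x6

/* Hypothetical helper: returns the matching reset reason, or -1 if the
 * sense data does not correspond to a transport reset event. */
static int sense_to_reset_reason(uint8_t key, uint8_t asc, uint8_t ascq)
{
	if (key == SENSE_KEY_ILLEGAL_REQUEST && asc == 0x25 && ascq == 0x00)
		return VIRTIO_SCSI_EVT_RESET_REMOVED;   /* LOGICAL UNIT NOT SUPPORTED */
	if (key == SENSE_KEY_UNIT_ATTENTION && asc == 0x29)
		return VIRTIO_SCSI_EVT_RESET_HARD;      /* POWER ON, RESET OR BUS DEVICE RESET */
	if (key == SENSE_KEY_UNIT_ATTENTION && asc == 0x3f && ascq == 0x0e)
		return VIRTIO_SCSI_EVT_RESET_RESCAN;    /* REPORTED LUNS DATA HAS CHANGED */
	return -1;
}
```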
+
+  Asynchronous notification =20
+=09#define VIRTIO_SCSI_T_ASYNC_NOTIFY     2
+
+=09struct virtio_scsi_event_an {
+=09=09// Write-only part
+=09=09u32 event;
+=09=09u8  lun[8];
+=09=09u32 reason;
+=09};
+
+  By sending this event, the device signals that an asynchronous=20
+  event was fired from a physical interface.
+
+  All fields are written by the device. The event field is set to=20
+  VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical=20
+  unit in the SCSI host. The reason field is a subset of the=20
+  events that the driver has subscribed to via the "Asynchronous=20
+  notification subscription" command.
+
+  When dropped events are reported, the driver should poll for=20
+  asynchronous events manually using SCSI commands.
+
+
+2.6 Reserved Feature Bits
+=========================
+
+Currently there are four device-independent feature bits defined:
+
+  VIRTIO_F_NOTIFY_ON_EMPTY (24) Negotiating this feature=20
+  indicates that the driver wants an interrupt if the device runs=20
+  out of available descriptors on a virtqueue, even though=20
+  interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT=20
+  flag or the used_event field. An example of this is the=20
+  networking driver: it doesn't need to know every time a packet=20
+  is transmitted, but it does need to free the transmitted=20
+  packets a finite time after they are transmitted. It can avoid=20
+  using a timer if the device interrupts it when all the packets=20
+  are transmitted.
+
+  VIRTIO_F_ANY_LAYOUT (27) This feature indicates that the device
+  accepts arbitrary descriptor layouts, as described in Section FIXME.
+
+  VIRTIO_F_RING_INDIRECT_DESC (28) Negotiating this feature indicates
+  that the driver can use descriptors with the VRING_DESC_F_INDIRECT
+  flag set, as described in 2.3.3 Indirect Descriptors.
+
+  VIRTIO_F_RING_EVENT_IDX (29) This feature enables the used_event
+  and the avail_event fields. If set, it indicates that the
+  device should ignore the flags field in the available ring
+  structure. Instead, the used_event field in this structure is
+  used by the guest to suppress device interrupts. Further, the
+  driver should ignore the flags field in the used ring
+  structure. Instead, the avail_event field in this structure is
+  used by the device to suppress notifications. If unset, the
+  driver should ignore the used_event field; the device should
+  ignore the avail_event field; the flags field is used.
+
+
+In addition, bit 30 is used by qemu's implementation to check for
+experimental early versions of virtio which did not perform correct
+feature negotiation, and should not be used.
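
As an illustration of how these bit numbers are typically used, the
sketch below treats the feature set as a 32-bit word and intersects
what the device offers with what the driver supports. The helper names
and the 32-bit width are assumptions, and masking out bit 30 is just
one way to honour the note above.

```c
#include <stdint.h>

#define VIRTIO_F_NOTIFY_ON_EMPTY     24
#define VIRTIO_F_ANY_LAYOUT          27
#define VIRTIO_F_RING_INDIRECT_DESC  28
#define VIRTIO_F_RING_EVENT_IDX      29
#define VIRTIO_F_QEMU_BAD_BIT        30   /* hypothetical name for bit 30 */

/* Hypothetical helper: non-zero if feature bit 'bit' is set. */
static inline int virtio_has_feature(uint32_t features, unsigned int bit)
{
	return (features >> bit) & 1u;
}

/* Hypothetical helper: features both sides agree on, never bit 30. */
static inline uint32_t virtio_negotiate(uint32_t offered, uint32_t supported)
{
	return offered & supported & ~(UINT32_C(1) << VIRTIO_F_QEMU_BAD_BIT);
}
```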
+
+2.7 virtio_ring.h
+=================
+
+#ifndef VIRTIO_RING_H
+#define VIRTIO_RING_H
+/* An interface for efficient virtio implementation.
+ *
+ * This header is BSD licensed so anyone can use the definitions
+ * to implement compatible drivers/servers.
+ *
+ * Copyright 2007, 2009, IBM Corporation
+ * Copyright 2011, Red Hat, Inc
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#include <stdint.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers. */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.  */
+#define VRING_AVAIL_F_NO_INTERRUPT      1
+
+/* Support for indirect descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC    28
+
+/* Support for the used_event and avail_event fields */
+#define VIRTIO_RING_F_EVENT_IDX        29
+
+/* Arbitrary descriptor layouts. */
+#define VIRTIO_F_ANY_LAYOUT            27
+
+/* Virtio ring descriptors: 16 bytes.
+ * These can chain together via "next". */
+struct vring_desc {
+        /* Address (guest-physical). */
+        uint64_t addr;
+        /* Length. */
+        uint32_t len;
+        /* The flags as indicated above. */
+        uint16_t flags;
+        /* We chain unused descriptors via this, too */
+        uint16_t next;
+};
+
+struct vring_avail {
+        uint16_t flags;
+        uint16_t idx;
+        uint16_t ring[];
+        /* Only if VIRTIO_RING_F_EVENT_IDX: uint16_t used_event; */
+};
+
+/* u32 is used here for ids for padding reasons. */
+struct vring_used_elem {
+        /* Index of start of used descriptor chain. */
+        uint32_t id;
+        /* Total length of the descriptor chain which was written to. */
+        uint32_t len;
+};
+
+struct vring_used {
+        uint16_t flags;
+        uint16_t idx;
+        struct vring_used_elem ring[];
+        /* Only if VIRTIO_RING_F_EVENT_IDX: uint16_t avail_event; */
+};
+
+struct vring {
+        unsigned int num;
+
+        struct vring_desc *desc;
+        struct vring_avail *avail;
+        struct vring_used *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx; // Only if VIRTIO_RING_F_EVENT_IDX
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx; // Only if VIRTIO_RING_F_EVENT_IDX
+ * };
+ * Note: for virtio PCI, align is 4096.
+ */
+static inline void vring_init(struct vring *vr, unsigned int num, void *p,
+                              unsigned long align)
+{
+        vr->num = num;
+        vr->desc = p;
+        /* Cast to char * so the pointer arithmetic is standard C. */
+        vr->avail = (struct vring_avail *)((char *)p + num*sizeof(struct vring_desc));
+        vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(uint16_t)
+                              + align-1)
+                            & ~(align - 1));
+}
+
+static inline unsigned vring_size(unsigned int num, unsigned long align)
+{
+        return ((sizeof(struct vring_desc)*num + sizeof(uint16_t)*(3+num)
+                 + align - 1) & ~(align - 1))
+                + sizeof(uint16_t)*3 + sizeof(struct vring_used_elem)*num;
+}
+
+static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
+{
+         return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
+}
+
+/* Get location of event indices (only with VIRTIO_RING_F_EVENT_IDX) */
+static inline uint16_t *vring_used_event(struct vring *vr)
+{
+        /* For backwards compat, used event index is at *end* of avail ring. */
+        return &vr->avail->ring[vr->num];
+}
+
+static inline uint16_t *vring_avail_event(struct vring *vr)
+{
+        /* For backwards compat, avail event index is at *end* of used ring. */
+        return (uint16_t *)&vr->used->ring[vr->num];
+}
+#endif /* VIRTIO_RING_H */
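
To make the arithmetic in vring_size and vring_need_event concrete, the
sketch below repeats just those two functions and the two structures
they depend on, so it can be compiled on its own. It is a worked
example rather than part of the header. For num = 256 and align = 4096
the descriptor table and available ring take 4096 + 518 = 4614 bytes,
which rounds up to 8192, and the used ring adds 6 + 2048 bytes, for a
total of 10246.

```c
#include <stdint.h>

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct vring_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Same computation as vring_size() in the header above. */
static inline unsigned vring_size(unsigned int num, unsigned long align)
{
	return ((sizeof(struct vring_desc)*num + sizeof(uint16_t)*(3+num)
		 + align - 1) & ~(align - 1))
		+ sizeof(uint16_t)*3 + sizeof(struct vring_used_elem)*num;
}

/* Same test as vring_need_event() in the header above: true when
 * event_idx lies in the half-open window [old_idx, new_idx), with all
 * arithmetic done modulo 2^16 so that index wrap-around is harmless. */
static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx,
                                   uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

For instance, moving the index from 5 to 6 with an event index of 5
requires a notification, whereas an event index of 10 does not. Moving
from 65535 to 1 with an event index of 0 also requires one, because the
casts to uint16_t make the comparison wrap correctly.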
+
+
+
+2.10 Creating New Device Types
+==============================
+
+Various considerations are necessary when creating a new device=20
+type.
+=20
+2.10.1 How Many Virtqueues?
+---------------------------
+
+It is possible that a very simple device will operate entirely=20
+through its configuration space, but most will need at least one=20
+virtqueue in which it will place requests. A device with both
+input and output (e.g. console and network devices described here)
+needs two queues: one which the driver fills with buffers to
+receive input, and one in which the driver places buffers to
+transmit output.
+
+2.10.2 What Configuration Space Layout?
+---------------------------------------
+
+Configuration space should only be used for initialization-time
+parameters.  It is a limited resource with no synchronization, so for
+most uses it is better to use a virtqueue to update configuration=20
+information (the network device does this for filtering,=20
+otherwise the table in the config space could potentially be very=20
+large).
+
+2.10.3 What Device Number?
+--------------------------
+
+Currently device numbers are assigned quite freely: a simple=20
+request mail to the author of this document or the Linux=20
+virtualization mailing list[9] will be sufficient to secure a unique one.
+
+Meanwhile for experimental drivers, use 65535 and work backwards.
+
+2.10.4 How many MSI-X vectors?  (for PCI)
+-----------------------------------------
+
+With the optional MSI-X capability, devices can speed up
+interrupt processing by removing the need for the guest driver to
+read the ISR Status register (which might be an expensive
+operation), by reducing interrupt sharing between devices and
+between queues within a device, and by allowing interrupts to be
+handled on multiple CPUs. However, some systems impose a limit
+(which might be as low as 256) on the total number of MSI-X
+vectors that can be allocated to all devices. Devices and/or
+device drivers should take this into account, limiting the number
+of vectors used unless the device is expected to cause a high
+volume of interrupts. Devices can control the number of vectors
+used by limiting the MSI-X Table Size or by not presenting the
+MSI-X capability in PCI configuration space. Drivers can control
+this by mapping events to as small a number of vectors as
+possible, or by disabling the MSI-X capability altogether.
+
+2.10.5 Device Improvements
+--------------------------
+
+Any change to configuration space, or new virtqueues, or
+behavioural changes, should be indicated by negotiation of a new
+feature bit. This establishes clarity[11] and avoids future
+expansion problems.
+
+Clusters of functionality which are always implemented together=20
+can use a single bit, but if one feature makes sense without the=20
+others they should not be gratuitously grouped together to=20
+conserve feature bits. We can always extend the spec when the=20
+first person needs more than 24 feature bits for their device.
+
+
+FOOTNOTES:
+==========
+
+[1] This lack of page-sharing implies that the implementation of the=20
+device (e.g. the hypervisor or host) needs full access to the=20
+guest memory. Communication with untrusted parties (i.e.=20
+inter-guest communication) requires copying.
+
+[2] The Linux implementation further separates the PCI virtio code=20
+from the specific virtio drivers: these drivers are shared with=20
+the non-PCI implementations (currently lguest and S/390).
+
+[3] The actual value within this range is ignored
+
+[4] Historically, drivers have used the device before steps 5 and 6.=20
+This is only allowed if the driver does not use any features=20
+which would alter this early use of the device.
+
+[5] ie. once you enable MSI-X on the device, the other fields move.=20
+If you turn it off again, they move back!
+
+[6] The 4096 is based on the x86 page size, but it's also large=20
+enough to ensure that the separate parts of the virtqueue are on=20
+separate cache lines.
+
+[7] These fields are kept here because this is the only part of the=20
+virtqueue written by the device
+
+[8] The Linux drivers do this only for read-only buffers: for=20
+write-only buffers, it is assumed that the driver is merely=20
+trying to keep the receive buffer ring full, and no notification=20
+of this expected condition is necessary.
+
+[9] https://lists.linux-foundation.org/mailman/listinfo/virtualization
+
+[10] It was previously asserted that framing should be independent of
+message contents, yet invariably drivers laid out messages in reliable
+ways and devices assumed it.
+In addition, the specifications for virtio_blk and virtio_scsi require
+intuiting field lengths from frame boundaries.
+
+[11] Even if it does mean documenting design or implementation=20
+mistakes!
+
+
+[13] It was supposed to indicate segmentation offload support, but=20
+upon further investigation it became clear that multiple bits=20
+were required.
+
+[14] ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
+dependent on VIRTIO_NET_F_CSUM; a device which offers the offload
+features must offer the checksum feature, and a driver which
+accepts the offload features must accept the checksum feature.
+Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
+depending on VIRTIO_NET_F_GUEST_CSUM.
+
+[15] This is a common restriction in real, older network cards.
+
+[16] For example, a network packet transported between two guests on
+the same system may not require checksumming at all, nor segmentation,
+if both guests are amenable.
+
+[17] For example, consider a partially checksummed TCP (IPv4) packet.=20
+It will have a 14 byte ethernet header and 20 byte IP header=20
+followed by the TCP header (with the TCP checksum field 16 bytes=20
+into that header). csum_start will be 14+20 = 34 (the TCP
+checksum includes the header), and csum_offset will be 16. The=20
+value in the TCP checksum field should be initialized to the sum=20
+of the TCP pseudo header, so that replacing it by the ones'=20
+complement checksum of the TCP header and body will give the=20
+correct result.
+
+[18] Due to various bugs in implementations, this field is not useful=20
+as a guarantee of the transport header size.
+
+[19] This case is not handled by some older hardware, so is called out=20
+specifically in the protocol.
+
+[20] Note that the header will be two bytes longer for the=20
+VIRTIO_NET_F_MRG_RXBUF case.
+
+[20a] Obviously each one can be split across multiple descriptor=20
+elements.
+
+[21] Since there are no guarantees, it can use a hash filter or
+silently switch to allmulti or promiscuous mode if it is given too
+many addresses.
+
+[22] The SCSI_CMD and SCSI_CMD_OUT types are equivalent, the device=20
+does not distinguish between them.
+
+[23] The FLUSH and FLUSH_OUT types are equivalent, the device does not
+distinguish between them
+
+[25] Because this is high importance and low bandwidth, the current=20
+Linux implementation polls for the buffer to be used, rather than=20
+waiting for an interrupt, simplifying the implementation=20
+significantly. However, for generic serial ports with the=20
+O_NONBLOCK flag set, the polling limitation is relaxed and the=20
+consumed buffers are freed upon the next write or poll call or=20
+when a port is closed or hot-unplugged.
+
+[27] This is historical, and independent of the guest page size
+
+[28] In this case, deflation advice is merely a courtesy
+
+[29] As updates to configuration space are not atomic, this field
+isn't particularly reliable, but can be used to diagnose buggy guests.
--=20
1.8.1.2
 =20
> Reserve device ID 0 (zero) as invalid
> -------------------------------------
>
>                 Key: VIRTIO-7
>                 URL: http://tools.oasis-open.org/issues/browse/VIRTIO-7
>             Project: OASIS Virtual I/O Device (VIRTIO) TC
>          Issue Type: Improvement
>            Reporter: Pawel Moll
>            Assignee: Pawel Moll
>
> Make the virtio device ID 0 reserved (or illegal). This value is illegal
> for PCI devices anyway, and mmio driver can consider such ID as "non
> present".
