<?xml version="1.0" encoding="utf-8"?>
<rss version="0.92">
<channel>
<title>SecuObs.com</title>
<link>http://www.secuobs.com</link>
<description>Internet Security Observatory</description>
<language>fr</language>
<webMaster>webmaster@secuobs.com</webMaster>
 <item><title>Arbitration and Translation, Part 3</title><description>2010-05-07 07:33:46 - A Hole In My Head: This post is the third in a series. You can see the others here: Part 1 and Part 2.

What is an Arbiter?

In the NT PnP subsystem, an arbiter is an interface that a bus driver can expose which is able to intelligently assign PnP resources of a single specific type (memory, I/O ports, DMA channels, interrupts, bus numbers) to its children. In general, an arbiter cannot assign resources that it has not claimed from its parent. The PnP manager itself exposes five arbiters, one for each type listed above. These arbiters are relatively dumb. They give out ranges of numbers, with the only criteria being these:
- Is this range free? If so, you can have it.
- If the range is already claimed, but with the shareable flag, and your claim is marked shareable, you can have it too.
- If any part of the range is already claimed as exclusive, you can't have it.
These arbiters aren't bus-specific, but they don't have to be. They're enough to get started.

Yesterday, I covered the translator interface and what it does. The arbiter interface is similar. Both are about manipulating the resources for child devices and putting them in less domain-specific terms. The difference between a translator and an arbiter is simply that translator interfaces are sufficient when you cannot really change the resources available to a child device, and arbiters are necessary when you can. Translators are, as you might expect, much simpler.

HALMPS

To illustrate the difference, I want to talk about HALMPS. This was a HAL that was shipped as part of Windows NT 3.5 through Windows Server 2003. It might have even shipped in Server 2008; I don't remember when it got pulled from the tree. It ran on machines that conformed to the Intel Multiprocessor Specification, versions 1.1 through 1.4. If you're curious, you can find it here. That spec has been entirely obsoleted by ACPI. MPS was simple where ACPI is very complex. But
MPS can't describe a machine that changes configuration dynamically at run time, while ACPI can. As it turns out, this adds a whole lot of complexity.

MPS describes a system in terms of, among other things, the number of local APICs (which deliver interrupts to processors) and I/O APICs (which collect interrupts from devices). It says which pins on which I/O APICs each PCI device is connected to. This is actually encoded as device-function-IntPin, and HALMPS represents a device's "IRQ" thusly. You can see this in Device Manager on a machine running HALMPS. The assigned IRQ is just these values all run together. This was very confusing to many people, as they might see two devices with the same IRQ in Device Manager, but that just meant that those two devices occupied the same slot on two different buses. They might have been sharing interrupts, or they might not.

The important part of the story here is that the BIOS picked all interrupt-related routing and it was fixed forever at boot. There aren't any decisions to make, except one: the OS gets to pick which of the processors get targeted by a specific I/O APIC input.

When we were gluing PnP onto the side of the NT driver model, during the development of Windows 2000, the existing scheme for choosing a target processor set for a device's interrupts involved the driver calling HalGetInterruptVector. The target IDT entries, the processor set mask and the IRQL for the device all had to be chosen there. Furthermore, if two devices shared interrupts, they had to get the same answer, even if one driver was PnP-aware and one made this obsolete call. So I left the IRQ-to-IDT mapping code in the HAL. If a PnP driver made a resource claim for an interrupt, then that claim would make its way toward the root of the PnP tree (see yesterday's post, link) and it would reach an interrupt translator at the HAL device node. The HAL would see the device's claim, do the math on how the interrupt was routed, including which I/O APIC and which pin on that
I/O APIC, and then make an internal call to HalGetInterruptVector, which would choose a target processor set, an IRQL and a vector. The target processor set (actually the APIC cluster ID) was then encoded in the upper 24 bits of the "Vector" that the device was assigned, and the IDT entry was encoded in the lower 8 bits. This was then presented to the root interrupt arbiter within the PnP manager, where it was claimed.

Just for fun, I fired up a VM running HALMPS and dumped this out in the debugger. You can see the relevant parts here:

0  kd  translator DEVNODE 83373ee0  HTREE ROOT 0  BusNumber Translator Resources     nt IopTranslatorHandlerCm Requirements  nt IopTranslatorHandlerIo Port Translator Resources     nt IopTranslatorHandlerCm Requirements  nt IopTranslatorHandlerIo Memory Translator Resources     nt IopTranslatorHandlerCm Requirements  nt IopTranslatorHandlerIo Dma Translator Resources     nt IopTranslatorHandlerCm Requirements  nt IopTranslatorHandlerIo Interrupt Translator Resources     nt IopTranslatorHandlerCm Requirements  nt IopTranslatorHandlerIo DEVNODE 8336f3c0  PCI_HAL PNP0A03 0  Interrupt Translator Resources     hal HalpIrqTranslateResourcesPci Requirements  hal HalpIrqTranslateRequirementsPci    DEVNODE 833ba008  PCI VEN_8086 DEV_7110 SUBSYS_00000000 REV_01 2 ebb567f 0 38  Interrupt Translator Resources     hal HalIrqTranslateResourcesIsa Requirements  hal HalIrqTranslateResourceRequirementsIsa DEVNODE 833baee0  PCI VEN_8086 DEV_7111 SUBSYS_00000000 REV_01 2 ebb567f 0 39  Interrupt Translator Resources     hal HalIrqTranslateResourcesIsa Requirements  hal HalIrqTranslateResourceRequirementsIsa 0  kd  arbiter  Interrupt Arbiter  RootIRQ  at 808a5620 Allocated ranges  0000000000000000 - 0000000000000000   B   833c0988   Driver PCI_HAL  0000000000000001 - 0000000000000001   B   833c0988   Driver PCI_HAL   some lines omitted for brevity  000000000000002f - 000000000000002f   B   833c0988   Driver PCI_HAL  00000000000000ff - 00000000000000ff   B   
833c0988   Driver PCI_HAL          0000000000000151 - 0000000000000151       83373250   atapi  0000000000000152 - 0000000000000152   B   8336df10  0000000000000161 - 0000000000000161     0000000000000161 - 0000000000000161  CB   833c0988   Driver PCI_HAL  0000000000000161 - 0000000000000161  CB   8336d838   Serial  0000000000000172 - 0000000000000172       8336da80   i8042prt  0000000000000181 - 0000000000000181       8336d5f0   Serial  0000000000000182 - 0000000000000182       833bd1c0   i8042prt  0000000000000192 - 0000000000000192       8336d3a8   fdc  00000000000001a2 - 00000000000001a2       83373030   atapi  00000000000001b1 - 00000000000001b1     00000000000001b1 - 00000000000001b1  CB   833c0988   Driver PCI_HAL  00000000000001b1 - 00000000000001b1  CB   833bdf10  00000000000001b2 - 00000000000001b2   B   833bd408

HALMACPI

Now let's contrast that with HALMACPI. This is the HAL that runs (to this day) on any machine that conforms to the ACPI spec and has more than one processor, which is nearly anything you can go out and buy. The ACPI spec says a few things about interrupts:
- There are a discrete number of I/O APICs and their base addresses are listed in ACPI tables.
- ISAPnP- or ACPI-enumerated devices are attached to I/O APIC inputs and those attachments are described in the ACPI namespace under each device. A device can be moved from one input to another by invoking the _SRS method under the device.
- PCI devices are either directly attached to I/O APIC inputs or they are attached to IRQ steering "link nodes", which themselves can be attached to one of a set of I/O APIC inputs. The set of possible attachments is described under the link node (which is itself sort of a device) in the ACPI namespace. The exact pin that they are attached to can be changed by invoking the link node's _SRS method.

This is entirely different from HALMPS. Now we have a choice about how devices are routed, if the motherboard designer designs the board that way and if the BIOS guy exposes
the functionality. If we want to move one or a group of PCI devices from one IRQ to another, we can.

I put an interrupt arbiter in the ACPI driver, as that was where it was possible, or at least easy, to interact with all the various parts of the ACPI namespace. An arbiter gets requests like: "Here's a set of four devices, each of which has a fairly complex set of possible interrupt assignments. Please find the optimal configuration which satisfies all the requirements." When a device needs I/O ports, memory ranges and interrupts, these requests get made by the PnP manager to each type of arbiter simultaneously. If a fit can be found, all the devices eventually get IRP_MN_START_DEVICE with a resource set that meets their needs.

Note that this problem is NP-complete, so we don't look at every possible solution. There are a bunch of heuristics about which parts of the solution space to look at first and how long to spend looking. In truth, the NT PnP team came to a fairly painful conclusion after a couple of years of tweaking these algorithms. (It was painful mostly because it took so long to fully understand the situation.)

The first major truth is that you can no longer add a truly new bus architecture to a PC because Windows 95 (and now many other OSes) only understood PCI. At the point that a largely-deployed OS that did PnP natively existed, every machine had to expose the interfaces that that OS understood. Thus we have HyperTransport, PCI Express and lots of internal bus architectures that never got widely published, all of which pretend to be PCI at a PnP level so that they work with old OSes which do PnP natively.

The kicker is that all of those, particularly the chipset-internal ones, have deviations from the PCI spec. I've sat in meetings with chipset designers who said that their devices didn't have to be PCI-compliant because they were inside of a chipset. From a hardware guy's perspective, this makes perfect sense. It doesn't have PCI pins, it doesn't have any PCI logic,
so it isn't PCI. But, for various reasons, it does have a PCI configuration space. When I point out to them that there's no way for the OS to differentiate between these "non-PCI PCI devices" and real PCI devices, they shrug and say that's not their problem, since the BIOS sets it all up right anyhow.

And that's the second major truth: the BIOS sets most or all of it up anyhow. So the arbiter interface, and NT PnP in general, have a way of asking how a device was configured by the BIOS. When a device is first discovered, the PnP manager sends IRP_MN_QUERY_RESOURCES. This IRP asks the question "what resources is this device using, right now?" The PCI driver will look at a device's Base Address Registers and its Interrupt Line register and send that claim back in response. The PnP manager then calls into the relevant arbiters with the device's PDO (or a proxy PDO if the driver is an NT4-style non-PnP driver) and claims those ranges unconditionally for the device, with a flag saying that this is a "boot reservation". See the "B" in some lines of the debugger dump above, and you'll see these boot claims.

When the device stack for the device is being built, the PnP manager sends IRP_MN_QUERY_RESOURCE_REQUIREMENTS to ask "what are the set of all possible sets of resources this device could use?" And once the FDO and filters have been loaded, it sends IRP_MN_FILTER_RESOURCE_REQUIREMENTS to ask "what modifications would you like to make to this claim that the bus driver has generated on your behalf?" The resulting claim set is sent to the arbiters.

Now those arbiters know what resources the device booted with, if the device was present in the machine at boot time. So they, for the most part, just choose what the BIOS chose. This is what makes slippery chipsets work just fine. The BIOS is the expert and NT leaves that alone.

Some resource types don't work this way. Most notably, there's no notion of which IRQ a device was connected to at boot time if your machine is running with the APIC
enabled. The BIOS only configures the IRQ routing for the PIC (not APIC) interrupt controller, in preparation for running Windows 98, which never supported APICs. So the ACPI IRQ arbiter, when running on an APIC system, throws away the boot claims.

Note that the boot claim system has some interesting properties. There may be conflicts, and sometimes that's okay. BIOSes tend to make claims for ACPI-enumerated dummy devices like "Motherboard Resources" when there is a device which must claim some I/O ports but which mostly doesn't ever get a driver loaded. The most famous example of this tends to be an SMBus controller. Most machines don't run a driver on it, but the BIOS needs to access it in System Management Mode, so it will claim the ports. Sometimes, people write drivers for them and then those drivers show up as a conflict with a boot claim. This is mostly benign.

Message-Signaled Interrupts

Interrupt arbitration tends to be the most complicated part of the system. Or, at least, it seems that way to me, since I'm still messing around with it almost fifteen years after I first began. Most of the other arbiters haven't changed much in years beginning with a "2".

Devices which can generate Message-Signaled Interrupts don't need to use an I/O APIC input. But they can, usually, also use one, particularly if the OS in question doesn't understand MSI. With MSI, the interrupt is sent by doing a short busmaster burst involving 32 bits of data to a special address. The device need not understand the address nor the data. It just gets told "when you want to trigger this interrupt, send this blob here."

The PCI spec has taken two passes at defining how this should be configured in a device, both of which have proved insufficient for representing the problem at hand. "MSI" was introduced in PCI 2.2 and it involved writing a single address into the device, and a single data value too. If the device wanted to send more than one interrupt, it could vary N low-order bits of the data value, at the OS's
discretion. This meant that the data values were constrained to a naturally aligned range of values, and that range was a power of 2 in length. (See the PCI spec for the scary details.) Given the way Intel defined the special address/data format in the Software Developer's Manual, Volume 3, Chapter 8, Section 11 (http://www.intel.com/products/processor/manuals/), the address determines the target processor set. This means that MSI (as defined in PCI 2.2) can only work if every interrupt targets the same processor or set of processors. You can't choose to send one interrupt message to one processor and one to another.

Thus MSI-X was defined in PCI 3.0. Both still exist, and they've been carried into PCI-X and PCI Express. MSI-X allows each interrupt message to have separate address and data values. It also allows as many as 2048 messages per PCI function.

Given that the processor-set-to-address mapping was fixed by Intel, virtualization and large numbers of cores are forcing another level of indirection through I/O MMUs, called "VT-d" by Intel and "IOMMU" by AMD.

The fundamental problem here is that the PCI spec never should have tried to define message-signaled interrupts at all. They just don't have anything to do with the PCI bus. Every interesting thing about them is external to the PCI bus. (Full disclosure: I didn't always understand this, and I sat on the committee that defined MSI-X.) The only thing the PCI spec allows you to do is to have a defined mechanism for telling the device to target a busmaster transaction to a specific address with specific data when the device needs attention. There's no standard mechanism for telling a PCI NIC to send your network data to a specific address, as that's just part of the definition of the device behavior. You don't want to standardize that because it removes degrees of freedom when you want to do it differently in the future. There shouldn't be one for interrupts, either, on exactly the same grounds. I'll quit ranting now.

What you really need
is a way to say, for example, "my device needs to trigger 36 interrupts: two per core in this 16-core machine, plus four more for various housekeeping tasks." That's not really expressible in the PCI capability structs which define MSI and MSI-X, but it is expressible inside of Windows. Once the PnP manager has assigned IDT vectors, IRQLs, target processors and the lot, you need a way of programming these into the device. This is expressible in the PCI spec, though it's redundant in my mind. Whether the bus driver does it or the function driver does it doesn't matter much.

Mechanically, it works like this:
1. The PnP manager sends IRP_MN_QUERY_RESOURCE_REQUIREMENTS. The PCI driver reads the various capability structs and some registry keys that were set during INF processing (since, as we saw above, the PCI spec can't express everything necessary) and responds to this IRP with some interrupt claims. Typically, there will be three possibilities expressed in the resultant IO Resource Requirements List: lots of message-signaled interrupts, one message-signaled interrupt and, lastly, one line-based interrupt.
2. The PnP manager builds the rest of the device stack and sends IRP_MN_FILTER_RESOURCE_REQUIREMENTS. If the device is trying really hard to squeeze out performance by targeting specific interrupts at specific processors, the FDO (usually NDIS or storport, along with the miniport) will "filter" that claim to affinitize certain interrupts to certain cores, and possibly to cut down the total number of messages in the first claim to some multiple of the number of cores actually installed.
3. The PnP manager passes these sets of claims to the interrupt arbiter in the ACPI driver, which looks at them and tries to satisfy them in the order that they're listed. If there are enough free IDT entries (and the underlying processor and chipset support MSI at all) then the first claim gets satisfied. If not, it goes for the single message claim. If that can't be satisfied, it will back off to
the line-based interrupt, which is usually shared with something else and will almost certainly succeed.
4. The PnP manager translates these resources down to the bus terms. (See yesterday's post.) This involves changing these vector and target processor sets into addresses and data again. These values end up in your interrupt resources in your raw resource list.
5. The PnP manager translates these "up" into processor-relative terms. This populates the translated resource list with Vector, Level and Affinity for each interrupt message.
6. The PnP manager sends IRP_MN_START_DEVICE with both lists. The PCI driver sees the IRP first (since the FDO handles start on the way up, remember) and programs the MSI or MSI-X capability structures, if they exist. The FDO sees the IRP next, and stores the information for calling IoConnectInterruptEx. It may use the raw resources to derive address and data values if it likes.

ACPI IRQ Arbiter Dumps

The ACPI IRQ arbiter handles all this by considering a list of things simultaneously:
- Free IDT entries on all the potential cores
- Free I/O APIC inputs for devices which have some flexibility
- Whether MSI is available in the processor and the chipset
- Whether the device has an MSI request

Since that arbiter is looking across a couple of dimensions simultaneously, dumping it is a little more complicated. The default debugger command "!arbiter" will show you the IRQ claims. "!acpiirqarb" will show you the other state. I'll walk through these dumps below.

This first dump is of the default arbiter in the PnP manager. It says that lots of vectors are reserved for internal use and lots of vectors are assigned to ACPI (across every core) for redistribution to other devices.

0  kd  arbiter 4 DEVNODE fffffa8003c49d90  HTREE ROOT 0  Interrupt Arbiter  RootIRQ  at fffff800014aebc0 Allocated ranges  0000000000000000 - 0000000000000000   B   fffffa8003c48e30  0000000000000001 - 0000000000000001   B   fffffa8003c48e30   lines omitted  000000000000003f - 
000000000000003f   B   fffffa8003c48e30  0000000000000051 - 0000000000000051       fffffa8003c4dbd0   ACPI   lines omitted  00000000000000bd - 00000000000000bd       fffffa8003c4dbd0   ACPI  00000000000000be - 00000000000000be       fffffa8003c4dbd0   ACPI  00000000000000ff - 00000000000000ff   B   fffffa8003c48e30  Possible allocation

This next dump is of the arbiter state of the ACPI IRQ arbiter. It's just the "IRQ" part, as that's what's done in terms that "!arbiter" can interpret.

DEVNODE fffffa8003c394a0  ACPI_HAL PNP0C08 0  Interrupt Arbiter  ACPI_IRQ  at fffff880010fdfc0 Allocated ranges  0000000000000002 - 0000000000000002   B   fffffa80039a05c0  0000000000000008 - 0000000000000008       fffffa80039a0c20  0000000000000009 - 0000000000000009 S     fffffa8003c4dbd0   ACPI  000000000000000d - 000000000000000d   B   fffffa80039a07e0  0000000000000010 - 0000000000000010 S   0000000000000010 - 0000000000000010 S     fffffa800399b060   pciide  0000000000000010 - 0000000000000010 S     fffffa80039b3a20   usbuhci  0000000000000012 - 0000000000000012 S   0000000000000012 - 0000000000000012 S     fffffa80039ab060   pciide  0000000000000012 - 0000000000000012 S     fffffa80039b1060   usbehci  0000000000000012 - 0000000000000012 S     fffffa80039ae060   usbuhci  0000000000000013 - 0000000000000013 S   0000000000000013 - 0000000000000013 S     fffffa80039ac060   pciide  0000000000000013 - 0000000000000013 S     fffffa80039b2a20   usbuhci  0000000000000013 - 0000000000000013 S     fffffa80039afa20   usbuhci  0000000000000015 - 0000000000000015 S     fffffa80039b2060   usbuhci  0000000000000017 - 0000000000000017 S   0000000000000017 - 0000000000000017 S     fffffa80039aea20   usbehci  0000000000000017 - 0000000000000017 S     fffffa80039af060   usbuhci  00000000fffffff9 - 00000000fffffff9       fffffa8003997a20   pci  00000000fffffffa - 00000000fffffffa       fffffa80039b0060   pci  00000000fffffffb - 00000000fffffffb       fffffa80039b1a20   pci  00000000fffffffc - 
00000000fffffffc       fffffa80039b0a20   pci  00000000fffffffd - 00000000fffffffd       fffffa80039b7a20   pci  00000000fffffffe - 00000000fffffffe       fffffa80039b7060   pci  Possible allocation

The large numbers for IRQs are placeholders for MSI assignments, which in this machine are all PCI Express root ports.

"!acpiirqarb" tells us about the other internal arbiter state, including IDT assignments on every core and the state of the ACPI link nodes, which exist but aren't used in APIC mode in this machine. It also details all the I/O APICs in the machine, including the metadata on all the inputs. The "not on bus" claims are interesting. They're the inverse of the IDT entries that got claimed above in the root arbiter. It means, essentially, that ACPI can't give them out because it doesn't own them.

0  kd  acpiirqarb Processor 0  0, 0  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000061 - 0000000000000061 S   0000000000000061 - 0000000000000061 S B   fffffa80039ac060   pciide   A fffff8a00189b230 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039b2a20   usbuhci   A fffff8a0018d8e90 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039afa20   usbuhci   A fffff8a0001986b0 IRQ 13 0000000000000081 - 0000000000000081   D   fffffa80039a05c0   A fffff8a0018f74e0 IRQ 2 00000000000000a0 - 00000000000000a1   D   fffffa80039b7a20   pci   A fffff8a0017eaa10 IRQ fffffffd 00000000000000a2 - 00000000000000a2 S B   fffffa80039b2060   usbuhci   A fffff8a0006cc860 IRQ 15 00000000000000b1 - 00000000000000b1 S B   fffffa8003c4dbd0   ACPI   A fffff8a0001120f0 IRQ 9 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 10 Possible IDT Allocation   Processor 1  0, 1  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000061 - 0000000000000061 S   0000000000000061 - 
0000000000000061 S B   fffffa80039ac060   pciide   A fffff8a00189b230 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039b2a20   usbuhci   A fffff8a0018d8e90 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039afa20   usbuhci   A fffff8a0001986b0 IRQ 13 0000000000000081 - 0000000000000081   D   fffffa80039a05c0   A fffff8a0018f74e0 IRQ 2 00000000000000a0 - 00000000000000a1   D   fffffa80039b7a20   pci   A fffff8a0017eaa10 IRQ fffffffd 00000000000000a2 - 00000000000000a2 S B   fffffa80039b2060   usbuhci   A fffff8a0006cc860 IRQ 15 00000000000000b1 - 00000000000000b1 S B   fffffa8003c4dbd0   ACPI   A fffff8a0001120f0 IRQ 9 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 2  0, 2  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000061 - 0000000000000061 S   0000000000000061 - 0000000000000061 S B   fffffa80039ac060   pciide   A fffff8a00189b230 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039b2a20   usbuhci   A fffff8a0018d8e90 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039afa20   usbuhci   A fffff8a0001986b0 IRQ 13 0000000000000081 - 0000000000000081   D   fffffa80039a05c0   A fffff8a0018f74e0 IRQ 2 00000000000000a0 - 00000000000000a1   D   fffffa80039b7a20   pci   A fffff8a0017eaa10 IRQ fffffffd 00000000000000a2 - 00000000000000a2 S B   fffffa80039b2060   usbuhci   A fffff8a0006cc860 IRQ 15 00000000000000b1 - 00000000000000b1 S B   fffffa8003c4dbd0   ACPI   A fffff8a0001120f0 IRQ 9 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 3  0, 3  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000061 - 0000000000000061 S   0000000000000061 - 0000000000000061 S B   fffffa80039ac060   pciide   A fffff8a00189b230 IRQ 
13 0000000000000061 - 0000000000000061 S B   fffffa80039b2a20   usbuhci   A fffff8a0018d8e90 IRQ 13 0000000000000061 - 0000000000000061 S B   fffffa80039afa20   usbuhci   A fffff8a0001986b0 IRQ 13 0000000000000081 - 0000000000000081   D   fffffa80039a05c0   A fffff8a0018f74e0 IRQ 2 00000000000000a0 - 00000000000000a1   D   fffffa80039b7a20   pci   A fffff8a0017eaa10 IRQ fffffffd 00000000000000a2 - 00000000000000a2 S B   fffffa80039b2060   usbuhci   A fffff8a0006cc860 IRQ 15 00000000000000b1 - 00000000000000b1 S B   fffffa8003c4dbd0   ACPI   A fffff8a0001120f0 IRQ 9 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 4  0, 4  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000051 - 0000000000000051 S   0000000000000051 - 0000000000000051 S B   fffffa80039ab060   pciide   A fffff8a0006eaa10 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039b1060   usbehci   A fffff8a0006d83e0 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039ae060   usbuhci   A fffff8a0006c5a80 IRQ 12 0000000000000090 - 0000000000000091   D   fffffa8003997a20   pci   A fffff8a0006edeb0 IRQ fffffff9 00000000000000a0 - 00000000000000a0   D   fffffa80039b0060   pci   A fffff8a001909750 IRQ fffffffa 00000000000000a1 - 00000000000000a1   D   fffffa80039a0c20   A fffff8a00187f840 IRQ 8 00000000000000b0 - 00000000000000b0   D   fffffa80039b0a20   pci   A fffff8a001890b50 IRQ fffffffc 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 5  0, 5  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000051 - 0000000000000051 S   0000000000000051 - 0000000000000051 S B   fffffa80039ab060   pciide   A fffff8a0006eaa10 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039b1060   
usbehci   A fffff8a0006d83e0 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039ae060   usbuhci   A fffff8a0006c5a80 IRQ 12 0000000000000090 - 0000000000000091   D   fffffa8003997a20   pci   A fffff8a0006edeb0 IRQ fffffff9 00000000000000a0 - 00000000000000a0   D   fffffa80039b0060   pci   A fffff8a001909750 IRQ fffffffa 00000000000000a1 - 00000000000000a1   D   fffffa80039a0c20   A fffff8a00187f840 IRQ 8 00000000000000b0 - 00000000000000b0   D   fffffa80039b0a20   pci   A fffff8a001890b50 IRQ fffffffc 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 6  0, 6  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000051 - 0000000000000051 S   0000000000000051 - 0000000000000051 S B   fffffa80039ab060   pciide   A fffff8a0006eaa10 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039b1060   usbehci   A fffff8a0006d83e0 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039ae060   usbuhci   A fffff8a0006c5a80 IRQ 12 0000000000000090 - 0000000000000091   D   fffffa8003997a20   pci   A fffff8a0006edeb0 IRQ fffffff9 00000000000000a0 - 00000000000000a0   D   fffffa80039b0060   pci   A fffff8a001909750 IRQ fffffffa 00000000000000a1 - 00000000000000a1   D   fffffa80039a0c20   A fffff8a00187f840 IRQ 8 00000000000000b0 - 00000000000000b0   D   fffffa80039b0a20   pci   A fffff8a001890b50 IRQ fffffffc 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 7  0, 7  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000051 - 0000000000000051 S   0000000000000051 - 0000000000000051 S B   fffffa80039ab060   pciide   A fffff8a0006eaa10 IRQ 12 0000000000000051 - 0000000000000051 S B   fffffa80039b1060   usbehci   A fffff8a0006d83e0 IRQ 12 0000000000000051 - 
0000000000000051 S B   fffffa80039ae060   usbuhci   A fffff8a0006c5a80 IRQ 12 0000000000000090 - 0000000000000091   D   fffffa8003997a20   pci   A fffff8a0006edeb0 IRQ fffffff9 00000000000000a0 - 00000000000000a0   D   fffffa80039b0060   pci   A fffff8a001909750 IRQ fffffffa 00000000000000a1 - 00000000000000a1   D   fffffa80039a0c20   A fffff8a00187f840 IRQ 8 00000000000000b0 - 00000000000000b0   D   fffffa80039b0a20   pci   A fffff8a001890b50 IRQ fffffffc 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 8  0, 8  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000071 - 0000000000000071 S   0000000000000071 - 0000000000000071 S B   fffffa800399b060   pciide   A fffff8a000154d20 IRQ 10 0000000000000071 - 0000000000000071 S B   fffffa80039b3a20   usbuhci   A fffff8a0000a0b20 IRQ 10 0000000000000091 - 0000000000000091   D   fffffa80039a07e0   A fffff8a00193a8a0 IRQ d 00000000000000a0 - 00000000000000a0   D   fffffa80039b1a20   pci   A fffff8a00193a870 IRQ fffffffb 00000000000000b0 - 00000000000000b1   D   fffffa80039b7060   pci   A fffff8a00197a3e0 IRQ fffffffe 00000000000000b2 - 00000000000000b2 S   00000000000000b2 - 00000000000000b2 S B   fffffa80039aea20   usbehci   A fffff8a00197a3b0 IRQ 17 00000000000000b2 - 00000000000000b2 S B   fffffa80039af060   usbuhci   A fffff8a0011c5460 IRQ 17 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 9  0, 9  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000071 - 0000000000000071 S   0000000000000071 - 0000000000000071 S B   fffffa800399b060   pciide   A fffff8a000154d20 IRQ 10 0000000000000071 - 0000000000000071 S B   fffffa80039b3a20   usbuhci   A fffff8a0000a0b20 IRQ 10 0000000000000091 - 0000000000000091   D  
 fffffa80039a07e0   A fffff8a00193a8a0 IRQ d 00000000000000a0 - 00000000000000a0   D   fffffa80039b1a20   pci   A fffff8a00193a870 IRQ fffffffb 00000000000000b0 - 00000000000000b1   D   fffffa80039b7060   pci   A fffff8a00197a3e0 IRQ fffffffe 00000000000000b2 - 00000000000000b2 S   00000000000000b2 - 00000000000000b2 S B   fffffa80039aea20   usbehci   A fffff8a00197a3b0 IRQ 17 00000000000000b2 - 00000000000000b2 S B   fffffa80039af060   usbuhci   A fffff8a0011c5460 IRQ 17 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 10  0, 10  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000071 - 0000000000000071 S   0000000000000071 - 0000000000000071 S B   fffffa800399b060   pciide   A fffff8a000154d20 IRQ 10 0000000000000071 - 0000000000000071 S B   fffffa80039b3a20   usbuhci   A fffff8a0000a0b20 IRQ 10 0000000000000091 - 0000000000000091   D   fffffa80039a07e0   A fffff8a00193a8a0 IRQ d 00000000000000a0 - 00000000000000a0   D   fffffa80039b1a20   pci   A fffff8a00193a870 IRQ fffffffb 00000000000000b0 - 00000000000000b1   D   fffffa80039b7060   pci   A fffff8a00197a3e0 IRQ fffffffe 00000000000000b2 - 00000000000000b2 S   00000000000000b2 - 00000000000000b2 S B   fffffa80039aea20   usbehci   A fffff8a00197a3b0 IRQ 17 00000000000000b2 - 00000000000000b2 S B   fffffa80039af060   usbuhci   A fffff8a0011c5460 IRQ 17 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Processor 11  0, 11  Device Object  0000000000000000 Current IDT Allocation  0000000000000000 - 0000000000000050       00000000   A 0000000000000000 IRQ 0 0000000000000071 - 0000000000000071 S   0000000000000071 - 0000000000000071 S B   fffffa800399b060   pciide   A fffff8a000154d20 IRQ 10 0000000000000071 - 0000000000000071 S B   fffffa80039b3a20   usbuhci   A fffff8a0000a0b20 IRQ 10 0000000000000091 - 
0000000000000091   D   fffffa80039a07e0   A fffff8a00193a8a0 IRQ d 00000000000000a0 - 00000000000000a0   D   fffffa80039b1a20   pci   A fffff8a00193a870 IRQ fffffffb 00000000000000b0 - 00000000000000b1   D   fffffa80039b7060   pci   A fffff8a00197a3e0 IRQ fffffffe 00000000000000b2 - 00000000000000b2 S   00000000000000b2 - 00000000000000b2 S B   fffffa80039aea20   usbehci   A fffff8a00197a3b0 IRQ 17 00000000000000b2 - 00000000000000b2 S B   fffffa80039af060   usbuhci   A fffff8a0011c5460 IRQ 17 00000000000000bf - ffffffffffffffff       00000000   A 0000000000000000 IRQ 0 Possible IDT Allocation   Interrupt Controller  Inputs  0x0-0x17  Dev  0000000000000000   00 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   01 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   02 Cur IDT-81 Ref-1 edg hi   Pos IDT-00 Ref-0 edg hi   03 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   04 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   05 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   06 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   07 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   08 Cur IDT-a1 Ref-1 edg hi   Pos IDT-00 Ref-0 edg hi   09 Cur IDT-b1 Ref-1 lev hi   Pos IDT-00 Ref-0 edg hi   0a Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   0b Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   0c Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   0d Cur IDT-91 Ref-1 edg hi   Pos IDT-00 Ref-0 edg hi   0e Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   0f Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   10 Cur IDT-71 Ref-2 lev low  Pos IDT-00 Ref-0 edg hi   11 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   12 Cur IDT-51 Ref-3 lev low  Pos IDT-00 Ref-0 edg hi   13 Cur IDT-61 Ref-3 lev low  Pos IDT-00 Ref-0 edg hi   14 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   15 Cur IDT-a2 Ref-1 lev low  Pos IDT-00 Ref-0 edg hi   16 Cur IDT-00 Ref-0 edg hi   Pos IDT-00 Ref-0 edg hi   17 Cur IDT-b2 Ref-2 lev low  Pos IDT-00 Ref-0 edg hi  Link Node  
LNKA Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKB Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKC Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKD Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKE Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKF Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKG Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000 Link Node LNKH Current IRQ 0x0 - 0 reference(s) Possible IRQ 0x0 - 0 reference(s) Preferred IRQ 0xffffffff - ResourceOverride IO_List 0000000000000000

In conclusion, arbitration is complicated and we keep adjusting it. Windows 7 actually added a little bit of knowledge about VT-d to interrupt arbitration so that we could easily go beyond 64 cores. People have been asking us for years to document the interfaces so that non-Microsoft-employed driver writers could write their own arbiters. This would be most useful for "converged NICs," where a single PCI function exposes a bus driver which in turn exposes a NIC, an RDMA device, an iSCSI initiator and/or an FCoE HBA. These bus drivers jump through many hoops to do second-level interrupt dispatch for their children, which they wouldn't have to do if they could write an interrupt arbiter. It's particularly difficult, though, to
do interrupt arbitration in a distributed manner. I/O port or memory arbitration can be done locally on the bus related to the device. But interrupts are often run as side-band signals straight from one part of the motherboard to another. It's difficult to prove that you can make this code work if it's decentralized. We wrote a simple bus driver that claims resources and doles them out for children. It's called "MF.sys" and it works so long as the resources you need for one child are completely disjoint from the resources you need for another child. This tends not to be the case with converged NICs. Some register or some interrupt gets used for some shared purpose. For now, though, the best answer I can give is that all this information is mostly useful for debugging. - Jake Oshins</description><link>http://www.secuobs.com/revue/news/219821.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/219821.shtml</guid></item>
<item><title>Translation and Windows</title><description>Secuobs.com : 2010-05-07 03:22:22 - A Hole In My Head - Arbitration and Translation, Part 2

Building on yesterday's post, I'm going to try to explain how Windows copes with machines with strange resource translations. I'll use two examples in this post, one related to I/O port resources and one related to interrupts. Just for convenience, I'll duplicate the diagram from my last post, which diagrammed the address space translations in a fairly complex multi-PCI-root machine. [IMAGE] Into such a machine, imagine that there's a NIC plugged into the secondary root PCI bus and a UART plugged into the ISA/LPC bus, probably soldered onto the motherboard. The resulting PnP tree would look like this: [IMAGE] Of course, a fully populated PnP tree would be much more complicated. If you want to see the real thing, in full, look in Device Manager and choose "Show Devices by Connection." (I took flak a few years ago for admitting that internally, we called this "Show as God Intended." I still think of it that way, even though I understand why no user could use it that way.) Alternatively, you can see the same thing in the kernel debugger by typing "!devnode 0 1". For this example, assume the following things are true:
- The UART is not an ISA PnP device. It's enumerated by the ACPI BIOS.
- The ACPI BIOS claims (through the _PRS object under the UART) that the device requires eight consecutive I/O ports, at one of several locations.
- The ACPI BIOS claims that the device can use one of two IRQs, 2 or 5.
- The ACPI BIOS contains a "control method" (labeled _SRS) which allows the ACPI driver to set the resources of the device.
- This device lies under the PCI root bus which is "Bus 0" in the example above. It has a native I/O port address space.
These things will cause the ACPI driver to respond to IRP_MN_QUERY_RESOURCE_REQUIREMENTS for this device with a structure that means this device should be assigned one of three I/O port blocks
which is eight bytes long and it needs one IRQ, which can be either 2 or 5, not shareable, edge triggered. For a full description of how this statement is constructed, see the documentation on IO_RESOURCE_REQUIREMENTS_LIST in the WDK. In short, I/O Resource Requirements lists are the "set of all possible sets of resources that a device could use." For more detail on ACPI, see the spec. As for the NIC, assume the following:
- It is a PCI device, not PCI-X or PCI Express. The upstream bridge is a PCIe to PCI-X bridge, which allows PCI devices to be plugged in.
- It has one PCI Base Address Register (BAR) and that BAR is of type "I/O," implying that it must use the I/O address space. That BAR also implies that the registers of the NIC lie in a block that is 0x100 bytes long.
- It has a "1" in its Interrupt Pin register, implying that it will trigger its INTA signal with level-triggered semantics.
- This device lies under the PCI root "Bus 1" above. It has its I/O port space mapped into memory space.
These things will cause the PCI driver to respond to IRP_MN_QUERY_RESOURCE_REQUIREMENTS with "this device should be assigned one block of I/O ports which is naturally aligned and 0x100 bytes long. It can use any single IRQ, shareable and level-triggered." Upon receiving the response to these IRPs, the PnP manager starts trying to satisfy the requirements. To do this, it works its way toward the root of the PnP tree looking first for bus drivers which expose an "arbiter interface" for each device type. It also queries for a "translator interface." I'll cover arbiters in my next post. Today's is really only about translators. But they're somewhat intertwined, so I'll define arbiters today as "something which knows about a specific resource type and knows the bus-local rules for deciding how these resources are allocated." Allocating I/O ports on a PCI bus is different from allocating them on an ISA bus. Once the PnP manager has searched to the root of the PnP tree, it will have found some interfaces:
[IMAGE] The exact details have changed a little bit over the years and from release to release. I believe that I've accurately represented the state of affairs since Vista. Incidentally, you can see these in the debugger by typing "!translator" and "!arbiter".

Translating from ISA to Interrupt Controller Input Pins

Since the ISA/LPC bridge devnode responded with an interrupt translator interface, the PnP manager needs to translate interrupts from ISA to the parent PCI. To really understand what this means, we need to have a little history lesson. Thirtyish years ago, somebody at IBM decided that they were going to build a "personal computer" which had a single interrupt controller chip called the "8259 Programmable Interrupt Controller (PIC)." It had eight inputs. Each of these inputs was exposed in every expansion slot. The output pins were directly connected to the processor. A few years later, some other guy at IBM designed the "IBM PC/AT." When they built the AT, they used an 80286 processor which had a sixteen-bit expansion bus. They also added a few I/O devices. Since the expansion bus was wider, and since they needed more interrupt controller inputs now, they added a second 8259 to the machine. This second one was chained onto the first one. Its output pin was connected to IRQ 2 on the first one. Interestingly, IRQ 2 was still exposed in the older part of the expansion bus, so they connected that signal to Input 1 on the second PIC. So any old eight-bit device which was triggering the IRQ 2 pin on the bus was actually going to cause IRQ 9 to interrupt the processor. Fast forward twenty-six or -seven years. We still have code to comprehend this, and it's called a "translator interface for interrupts on the ISA devnode." The PnP manager invokes the translator from the ISA devnode and hands it two IO_RESOURCE_REQUIREMENTS, one saying "IRQ 2" and one saying "IRQ 5," both edge-triggered and non-shareable. The ISA devnode modifies the first one to say IRQ 9. It leaves everything else alone.
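
As a rough illustration of the ISA translation just described, here is a minimal Python sketch. Every name in it is hypothetical and chosen only for this example; none of it is a Windows interface, and the real translator operates on IO_RESOURCE_REQUIREMENTS structures inside the kernel rather than on dictionaries.

```python
# Hypothetical model of the ISA interrupt translator described above.
# None of these names exist in Windows; they are illustrative only.

CASCADE_IRQ = 2    # input on the first 8259 that the second 8259 feeds
REROUTED_IRQ = 9   # Input 1 on the second 8259 (8 + 1)


def translate_isa_irq(bus_irq):
    """Map a bus-relative ISA IRQ to the controller input it really reaches."""
    return REROUTED_IRQ if bus_irq == CASCADE_IRQ else bus_irq


def untranslate_isa_irq(controller_irq):
    """Reverse the mapping to get back to bus-relative (raw) terms."""
    return CASCADE_IRQ if controller_irq == REROUTED_IRQ else controller_irq


def translate_requirements(alternatives):
    """Translate the IRQ of each alternative; leave every other field alone."""
    return [dict(alt, irq=translate_isa_irq(alt["irq"])) for alt in alternatives]


# The UART from the example: IRQ 2 or IRQ 5, edge-triggered, not shareable.
uart = [
    {"irq": 2, "trigger": "edge", "shareable": False},
    {"irq": 5, "trigger": "edge", "shareable": False},
]
print(translate_requirements(uart))
```

The reverse function is deliberately simplistic: it assumes, as this example does, that a 9 seen at the controller came from the IRQ 2 pin on the bus.
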
The PnP manager keeps looking toward the root of the tree. The PCI driver really knows very little about interrupts. (This is because the PCI spec is nearly silent on the topic. Don't get me started on how many years I've spent on filling that gap.) So the PCI driver doesn't provide translator or arbiter interfaces for interrupts. The ACPI driver, on the other hand, knows quite a bit about interrupts, as the ACPI spec has quite a bit of text allowing BIOSes to describe the ways that the motherboard designer handled interrupts in a specific machine. So the ACPI driver exposes both interfaces. The PnP manager, at this point, can stop translating interrupts from both devices because it has reached a common parent in the PnP tree which exposes an arbiter for interrupts. The arbiter is then invoked to choose which resources each device will be assigned. (Again, more on that in my next post.)

Translating from I/O Ports - Step 1

For both devices, the PnP manager starts looking for translators and arbiters for the device's I/O port claims. It finds arbiters at the PCI layer, as PCI knows how to sub-allocate I/O port space to its children. Those rules are, thankfully, laid out quite clearly in the PCI spec, and aside from a few chipsets where the chipset designer didn't think that the PCI spec applied to him, we can successfully figure out what configuration will work at that level. Note that no translation has happened yet. We're still talking about I/O ports as viewed on the buses which contain the devices, where the bus cycles will definitely be tagged as "I/O."

Translation after Arbitration

Assume that for this example, the arbiters picked this set of choices:
UART: IRQ 9 and I/O ports 0x2040 through 0x2047
NIC: IRQ 11 and I/O ports 0x2000 through 0x20FF
No, that's not a typo. Their I/O port claims actually seem like they overlap. This is fine, as they're disjoint address spaces on different buses. This can't really happen on most PCs, but it can and does happen on some machines. See my last
post. Now that the PnP manager has a resource assignment, it has to figure out how to present that choice to two separate audiences with two very different sets of needs. The first audience is the bus drivers. Now that we've chosen a resource set for each device, we need to program the devices so that they actually embody those choices. For the PCI device, this involves writing 0x2000 to its I/O BAR. For the LPC-attached UART, this involves executing the _SRS control method in the ACPI namespace underneath the UART device. Both of them need to be in bus-relative terms. The second audience is the functional drivers, for the NIC and the UART. They don't need to see the bus-relative view, as the driver can't really directly generate bus traffic. The FDOs are made up of driver code running on the processor, so they need the processor-relative view of those resource claims. To achieve that, I need to show you something we internally call the "checkmark diagram." To truly understand this diagram, I have to apologize for the fact that, in house, all the PnP trees are drawn on whiteboards with the "root" at the top and the devices are leaves down at the bottom. This corresponds nicely with diagrams of physical machines where the processors and memory are at the top and the I/O devices hang down below like little appendages. The DDK/WDK tech writers convinced us that all public documentation should have the "root" of a "tree" firmly planted in the "ground." Oh well. [IMAGE] I've already described steps 1 through 3. After arbitration, though, the PnP manager has to put these claims back in terms of the I/O bus. The only resource that went through translation on the way to arbitration was the IRQ for the UART. So now the translator interface from the ISA devnode reverses that process and changes that 9 back into a 2. So the resulting "raw resource" assignments are now in bus-relative terms. They're also now in terms of CM Resource Lists. Those are documented in the WDK, too. Again, in short, a CM
Resource List is a single complete set of resources that a device either is using or could be using. The raw resource lists for the devices are:
UART: IRQ 2 and I/O ports 0x2040 through 0x2047
NIC: IRQ 11 and I/O ports 0x2000 through 0x20FF
Lastly, the PnP manager goes back toward the root of the PnP tree, passing the various resource assignments to any translators that may be at each node of the tree, trying to build a different CM Resource List, this time in terms of the processor. The ISA devnode's interrupt translator immediately reverses itself again, and changes that 2 back into a 9. But there's another interrupt translator in the tree, too, at the ACPI level. That translator is actually privy to some internal choices that the interrupt arbiter made, involving the IRQL and IDT entries (and in Windows 7 and later, IOMMU Interrupt Redirection Table entries) that the arbiter chose. So that translator can translate into processor-relative terms. For the root PCI bus which maps its I/O port space into processor memory, ACPI supplies an I/O port translator interface. (It knows to do this based on contents of the ACPI namespace.) Thus the "translated resource lists" for these end up looking like this:
UART: IRQL 11, Vector 0xb3, Affinity (target processor set) 0xF0 and I/O ports 0x2040 through 0x2047
NIC: IRQL 10, Vector 0xa9, Affinity 0x0F and memory range 0x1`00002000 through 0x1`000020FF

Presenting Resources to Drivers

When all of this is complete, there are two CM Resource Lists in the PnP manager for the device. Both get sent as part of IRP_MN_START_DEVICE. As explained in my last post, the driver contract is that the bus driver (or a bus filter like ACPI, sometimes) programs the device using the raw resources. The function driver calls MmMapIoSpace, IoConnectInterrupt, etc., using only the translated resources. My next post will go into detail on what arbiters do. - Jake Oshins</description><link>http://www.secuobs.com/revue/news/219722.shtml</link><guid 
isPermaLink="false">http://www.secuobs.com/revue/news/219722.shtml</guid></item>
<item><title>Arbitration and Translation, Part 1</title><description>Secuobs.com : 2010-05-06 03:43:41 - A Hole In My Head - A while back Jake Oshins answered a question on NTDEV about bus arbitration and afterwards I asked him if he could write a couple of posts about it for the blog. Here is part 1.

History Lesson

In the history of computing, most machines weren't PCs. PCs, and the related "industry standard" server platforms, may constitute a huge portion of the computers that have been sold in the last couple of decades, but even during that time, there have been countless machines, both big and small, which weren't PCs. Windows, at least those variants which are derived from Windows NT (which include Windows XP and everything since), was originally targeted at non-PC machines, specifically those with a MIPS processor and a custom motherboard which was designed in-house at Microsoft. In the fifteen years that followed that machine, NT ran on a whole pile of other machines, many with different processor architectures. My own career path involved working on the port of Windows NT to PowerPC machines. I wrote HALs and worked on device drivers for several RS/6000 workstations and servers which (briefly) ran NT. When I came to Microsoft from IBM, the NT team was just getting into the meat of the PnP problem. The Windows 95 team had already done quite a bit to understand PnP, but their problem space was really strongly constrained. Win95 only ran on PCs, and only those with a single processor and a single root PCI bus. Very quickly, I got sucked into the discussion about how to apply PnP concepts to machines which were not PCs, and also how to extend the driver model in ways that would continue to make it possible to have one driver which ran on any machine, PC or not. If the processor target wasn't x86, you'd need to recompile it. But the code itself wouldn't need changing. If the processor target was x86, even if the machine wasn't strictly a PC, your driver would just
run. In order to talk about non-PC bus architectures, I want to briefly cover PC buses, for contrast. PCs have two address spaces, I/O and memory. You use different instructions to access each. I/O uses "IN, OUT, INS, and OUTS." That's it. Memory uses just about any other instruction, at least any that can involve a pointer. I/O has no way of indirecting it, like virtual memory indirects memory. That's all I'll say about those here. If you want more detail, there have been hundreds of good explanations for this. My favorite comes from MindShare's ISA System Architecture, although that's partly because that one existed back when I didn't fully understand the problem space. Perhaps there are better ones now. In the early PC days, the processor bus and the I/O bus weren't really separate. There were distinctions, but those weren't strongly delineated until PCI came along, in the early '90s. PCI was successful and enduring because, in no small part, it was defined entirely without reference to a specific processor or processor architecture. The PCI spec has almost completely avoided talking about anything that happens outside of the PCI bus. This means, however, that any specific implementation has to have something which bridges the PCI spec to the processor bus. (I'm saying "processor bus" loosely here to mean any system of interconnecting processors, memory and the non-cache-coherent I/O domains. This sometimes gets referred to as a "North Bridge," too.) The processor bus then gets mapped onto the I/O subsystem, specifically one or more root PCI buses. The following diagram shows a machine that has two root PCI buses (which is not at all typical this year, but was very typical of PC servers a decade ago). The specific addresses could change from motherboard to motherboard and were reported to the OS by the BIOS. [IMAGE: multi root PCI] You'll notice that processor I/O space is pretty limited. It's even more limited when you look at the PCI to PCI bridge specification, which says that down-stream
PCI buses must allocate chunks of I/O address space on 4K boundaries. This means that there are only a few possible "slots" to allocate from and a relatively small number of PCI buses can allocate I/O address space at all.

Attempts to expand I/O Space

Today, this lack of I/O space problem is mostly handled by creating devices which only use memory space (or memory-mapped I/O space as it's sometimes called). But in the past, and in some current very-high-end machines, multiple PCI I/O spaces are mapped into a machine by mapping them into processor memory space rather than processor I/O space. I've debugged many a machine that had a memory map like the following: [IMAGE] In this machine, you need to use memory instructions, complete with virtual address mappings, if you want to manipulate the registers of your device, as long as that device is on Root PCI Bus 1 or one of its children. If your device is plugged into Root PCI Bus 0, then you use I/O instructions. While that's a little bit hard to code for (more on that later), it's nice because each PCI bus has its full 16K of I/O address space. In theory, the secondary root PCI buses can have even more than 16K of space. The PCI spec allows for 32 bits of I/O space and devices are required to decode 32-bit addresses of I/O. Since it's all just mapped into processor memory space, which is large, you can have a really large I/O space. In practice, though, many devices didn't follow the spec and the one machine I've seen that depended on this capability had a very, very short list of compatible adapters.

Non-Intel Processors

If you've ever written code for a processor that Intel didn't have a hand in designing, you've probably noticed that the concept of I/O address spaces is pretty rare elsewhere. (Now please don't write to me telling me about some machine that you worked on early in your career. I've heard those stories. I'll even bore you with my own as penance for sending me yours.) Let's just stop the discussion by pointing out
that MIPS, Alpha and PowerPC never had any notion of I/O address space and Itanic has an I/O space, but only if you look at it from certain angles. And those are the set of non-x86 processors that NT has historically run on. Chipset designers who deal with non-PC processors and non-PC chipsets often do something really similar to what was just described above where the north bridge translates I/O to memory, except that not even PCI Bus 0 has any native I/O space mapping. All the root PCI buses map their I/O spaces into processor memory space.

Windows NT Driver Contract

About now, you're probably itching to challenge my statement (above) where I said you could write a driver which runs just fine regardless of which sort of processor address space your device shows up in. Interestingly, I've been working on HALs and drivers within Microsoft (and at IBM before that) for about 16 years now and I always knew that I understood the contract. I also knew that few drivers not shipped with NT followed the contract. What I didn't know was that, even though the "rules" are more or less described in the old DDK docs, very few people outside of Microsoft had internalized those rules, and in fact one major driver consulting and teaching outfit (who shall remain nameless, but whose initials are "OSR") was actually teaching a different contract. After much discussion about this a few years ago, and from my own experience, I believe that it was essentially an unreasonable contract, in that it was untestable if you didn't own a big-iron machine with weird translations or a non-PC machine running a minority processor. I'll lay out the contract here, though, for the sake of completeness.
1. There are "raw" resources and "translated" resources. Raw resources are in terms of the I/O bus which contains the device. Translated resources are in terms of the processor. Every resource claim has both forms.
2. Bus drivers take raw resources and program the bus, the device or both so that the device registers
show up at that set of addresses.
3. Function drivers take the translated resources and use them in the driver code, as the code runs on the processor. Function drivers must ignore the raw resource list. Even if the function driver was written by a guy who is absolutely certain that his device appears in I/O space, because it is a PCI device with one Base Address Register of type I/O, the driver must still look at the resource type in the translated resources.
4. If your device registers are in I/O space from the point of view of the processor, your translated resources will be presented as CmResourceTypePort. If your translated resources are of this type, you must use "port" functions to access your device. These functions have names that start with READ_PORT_ and WRITE_PORT_.
5. If your device registers are in memory space from the point of view of the processor, your translated resources will be presented as CmResourceTypeMemory. If they are of this type, you must first call MmMapIoSpace to get a virtual address for that physical address. Then you use "memory" functions, with names that start with READ_REGISTER_ and WRITE_REGISTER_. When your device gets stopped, you call MmUnmapIoSpace to release the virtual address space that you allocated above.
This contract works. (No, really, I'm certain. I've written a lot of code that uses it.) But it's not an easy contract to code to, and I'll lay out the issues:
- The "PORT" functions and the "REGISTER" functions are not truly symmetric. The forms that take a string and transfer it do different things. The PORT functions assume the register is a FIFO. The REGISTER functions assume it's a region of memory space that's being referred to. So you pretty much have to ignore the string forms of these and code your own with a loop.
- All access to your device either has an "if port then, else memory" structure to it, or you create a function table that accesses the device, with variant port/memory forms.
- The ever-so-popular driver structure where
you define your registers in a C-style struct and then call MmMapIoSpace and lay your struct over top of your device memory just doesn't work in any machine that translates device memory to processor I/O. (Yes, I've even seen one of those.)
In the end, most driver writers outside of the NT team either ignore the contract because they are unaware of it, or ignore it because they have no way to test their driver in non-PC machines. Imagine telling your boss that you have functions which deal with I/O mapped into processor memory in your driver but you've never seen them run. So he can either ship untested code or pony up and buy you an HP Superdome Itanic, fully populated with 256 processors, just to test on.</description><link>http://www.secuobs.com/revue/news/219309.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/219309.shtml</guid></item>
<item><title>WDK v7.1 is now available</title><description>Secuobs.com : 2010-03-01 13:00:43 - A Hole In My Head - A refresh of the WDK is now available on Connect. You can download the v7.1 WDK following the directions on WHDC. The change list for the WDK can be found here; I am copying it here as well.

WDK Version 7.1.0 Changes and Issues

This section contains information about the changes to the WDK for the 7.1.0 refresh release. Windows XP x64 is now supported as an installation platform.

Debugger Changes

The Debugging Tools for Windows have been updated in this release of the WDK. The following changes were made:
- Debugger version changed to 6.12.2.633. See the release notes in the debugger package for more information.
- Updated ndiskd.dll
- Miscellaneous bugfixes in UMDF and KMDF debugger extensions
KMDF:
- Ability to print more than 50 requests on a queue
- !wdfdevice correctly displays FileObjectClass Name
- Display if a request is cancelled if it is on the driver notified list
UMDF:
- Correctly display a !umirps cancel callback
- Dump a device's Cleanup and Close callbacks
- Display the file object associated with a request

Redistributable Change

- Update to offreg.dll

Build Environment Changes

- Updated MSVCRT.lib to fix driver crashes in Vista
- Updated ws2_32.lib
- Added ntddump.h
- Added NPIV.mof
- Added headers for Vista 7ip
- Fixed annotations on I/O routines
- Added wudftrace.ctl containing all UMDF trace GUIDs

Sample Changes and Issues

- NDIS: Xframeii bugs fixed; added NetVmini sample
- Sensor skeleton sample: memory leak issues fixed
- KMDF Toaster sample: bug fixed
- WDM Event sample: bug fixed
- Port I/O sample driver: There is a syntax issue in the file src\general\portio\sys\genport.inx that prevents the driver from being successfully installed on pre-Windows 7 systems. The workaround is to replace all occurrences of "PORTIO_Device" with "PortIO_Inst".
- Update to Sensor Adapter Test Suite</description><link>http://www.secuobs.com/revue/news/196553.shtml</link><guid 
isPermaLink="false">http://www.secuobs.com/revue/news/196553.shtml</guid></item>
<item><title>What is IRQL?</title><description>Secuobs.com : 2010-02-03 03:26:45 - A Hole In My Head - Jake Oshins wanted to write about IRQLs and I am gladly letting him use my blog as a platform. Here it is.

I've found myself explaining IRQL a lot lately, sometimes to people who want to know because they're trying to write Windows drivers and sometimes to people who are accustomed to Linux or some other variant of Unix and they want to know why something like IRQL is required within Windows when those systems so clearly get by without it. Penny Orwick covered this topic before, in the following two papers, with a lot of help from me and some others:
http://www.microsoft.com/whdc/driver/kernel/irql.mspx
http://www.microsoft.com/whdc/driver/kernel/locks.mspx
I'll try to do it a little more briefly here. Computers have many things within them that can interrupt a processor. These include timers, I/O devices, other processors, internal processor performance counters, etc. All processors have an instruction for disabling interrupts, somehow, but that instruction ("cli" in x64 processors) isn't selective about which interrupts it disables. The people who built DEC's VMS operating system also helped design the processors that DEC used, and many of them came to Microsoft and designed Windows NT, which was the basis for modern versions of Windows, including Windows XP and Windows 7. These guys wanted a way to disable (very quickly) just some of the interrupts in the system. They considered it useful to hold off interrupts from some sources while servicing interrupts from other sources. They also realized that, just as you must acquire locks in the same order everywhere in your code to avoid deadlocks, you must also service interrupts with the same relative priority every time. It doesn't work if the clock interrupts are sometimes more important than the IDE controller's interrupts and sometimes they aren't. Interrupts are frequently called "Interrupt ReQuests" and the priority of a specific
IRQ is its Level. These letters, all run together, are IRQL. So if you lay out all the interrupt sources in the system and create a priority for each one, or sometimes a priority for each group, you can start to do interesting things. Consider a spinlock. Spinlocks (at least in the traditional sense) are implemented by having a processor spin in a tight loop trying to atomically modify a variable. The cache coherency hardware guarantees that only one processor can do that at a time, so lock acquisition goes only to the processor that succeeds. Other processors keep spinning until they succeed. The processor that "owns" the lock needs to release the lock as soon as possible, as the other "waiting" processors are burning up processor time waiting to acquire the lock. So you really don't want to interrupt that processor and schedule some other thread for execution, causing all the waiters to spin until the owning thread is rescheduled. In this situation, some operating systems encourage the owner of the spinlock to disable all interrupts so that the code can't be interrupted. (Note, too, that interrupts really need to be disabled before trying to acquire the lock, or the thread might be interrupted between acquiring the lock and disabling interrupts.) The designers of VMS and NT decided that they didn't want to disable all interrupts just because some code somewhere acquired a spinlock. Some things shouldn't wait. TLB flushes are a good example. So if only some interrupts are disabled while a spinlock is held, then you can still briefly interrupt the code that owns the lock for much more important tasks. Perhaps even more importantly, you can interrupt the processors which are spinning, waiting to acquire a spinlock, for these important tasks, causing them to do something useful instead of just spinning. Note that this means that every spinlock has an associated IRQL, and you have to use that IRQL consistently, or the machine will deadlock. In NT, by default, every spinlock has the 
same IRQL, called DISPATCH_LEVEL. DISPATCH_LEVEL means, essentially, that the interrupts which can cause a thread to stop running are disabled. (More about that later.) Here's a table of all IRQLs, as defined in the Windows NT header files (easily seen in the WDK):
IRQL | x86 IRQL value | AMD64 IRQL value | IA64 IRQL value | Description
PASSIVE_LEVEL | 0 | 0 | 0 | User threads and most kernel-mode operations
APC_LEVEL | 1 | 1 | 1 | Asynchronous procedure calls and page faults
DISPATCH_LEVEL | 2 | 2 | 2 | Thread scheduler and deferred procedure calls (DPCs)
CMC_LEVEL | N/A | N/A | 3 | Correctable machine-check level (IA64 platforms only)
Device interrupt levels (DIRQL) | 3-26 | 3-11 | 4-11 | Device interrupts
PC_LEVEL | N/A | N/A | 12 | Performance counter (IA64 platforms only)
PROFILE_LEVEL | 27 | 15 | 15 | Profiling timer for releases earlier than Windows 2000
SYNCH_LEVEL | 27 | 13 | 13 | Synchronization of code and instruction streams across processors
CLOCK_LEVEL | N/A | 13 | 13 | Clock timer
CLOCK2_LEVEL | 28 | N/A | N/A | Clock timer for x86 hardware
IPI_LEVEL | 29 | 14 | 14 | Interprocessor interrupt for enforcing cache consistency
POWER_LEVEL | 30 | 14 | 15 | Power failure
HIGH_LEVEL | 31 | 15 | 15 | Machine checks and catastrophic errors; profiling timer for Windows XP and later releases
For driver writers, the only IRQLs that are usually interesting are 0 through 2 and DIRQL. It's worth mentioning, though, that the NT kernel itself internally has spinlocks at DISPATCH_LEVEL and all the levels above that. So, now for a tour of interesting IRQLs. PASSIVE_LEVEL: This is the level at which threads run. In fact, if you look at the specific definition of "thread" in NT, it pretty much only covers code that runs in the context of a specific process, at PASSIVE_LEVEL or APC_LEVEL. Deferred Procedure Calls (DPCs) are not threads, in that sense. Any interrupt can occur at PASSIVE_LEVEL. User-mode code executes at PASSIVE_LEVEL. APC_LEVEL: Windows NT has an interesting mechanism for getting into a certain thread context. You can queue an interrupt to a thread, so that your function will run 
on that thread's stack, with that thread's address space, with that thread's local storage. This is useful for I/O completion. When I/O completes, you queue an APC back to the requesting thread, which does the last part of I/O completion in the initiator's address space. It's a neat way to solve a bunch of problems. If you want to disable interrupts to your thread, you raise to APC_LEVEL. At least that was the original design. APCs and the rules around them have grown much more complicated over the years. At this point, the best that you can say is that if you care to disable APCs, call KeEnterCriticalRegion (http://msdn.microsoft.com/en-us/library/ms801955.aspx) or KeEnterGuardedRegion (http://msdn.microsoft.com/en-us/library/ms801643.aspx). Your code generally won't need to run at APC_LEVEL at all, unless you use Fast Mutexes (http://msdn.microsoft.com/en-us/library/aa490219.aspx). Fast Mutexes are somewhat faster than Mutexes (http://msdn.microsoft.com/en-us/library/aa490228.aspx) or other dispatcher objects because, among other things, they hold off APCs by raising to APC_LEVEL. APC interrupts, by the way, are sent by a processor, either to itself or to another processor. No external device is involved. DISPATCH_LEVEL: Windows NT doesn't have a "scheduler" in the sense that most Unix variants do. There is no process that decides which other processes should run. Each processor "dispatches" itself by looking at runnable threads and deciding which one to run next. This is a scheduler, of sorts, but not the same thing that many people coming from Linux will imagine. The dispatcher is interrupt driven, in that it won't allow a thread to run longer than its quantum before scheduling another thread. But the scheduling clock doesn't generate dispatcher interrupts directly. The clock interrupt fires at CLOCK_LEVEL, somewhat more frequently than the thread scheduling quantum. Various housekeeping tasks happen as a result of the clock interrupt, and one of them is that a dispatcher interrupt is generated by the 
processor to itself. (Actually, this internal self-interrupt is often optimized away, but the architectural result is the same as if an interrupt were generated.) If your code raises IRQL to DISPATCH_LEVEL, you have disabled the dispatcher on that processor, and only on that processor. This means that your thread will not be pre-empted by another thread and it will not be moved to another processor until you lower IRQL. Since, as noted above, I/O completion depends on code running at APC_LEVEL, and since APC_LEVEL code won't run while the processor is at DISPATCH_LEVEL, page faults can't be resolved at DISPATCH_LEVEL. So code that holds a DISPATCH_LEVEL lock (like a spinlock) can't reference memory which might be paged out. Furthermore, most of the locking primitives that the NT kernel provides are what are called "dispatcher objects" (http://msdn.microsoft.com/en-us/library/aa490210.aspx). You can wait on dispatcher objects until they are signaled and, while your code is waiting, the processor is free to get other work done on behalf of other threads. This is nice because, in contrast with the spinlock, which consumes the processor doing no useful work while it's waiting, dispatcher objects allow the dispatcher to find other work until the reason for waiting can be satisfied. What this means to you, though, is that you can't wait on a dispatcher object at DISPATCH_LEVEL. You've already disabled the dispatcher. Your only choice at DISPATCH_LEVEL is a spinlock. DIRQL: "DIRQL" is the shorthand that many people (internal to Microsoft and external) use when they mean "the IRQL that the PnP manager assigned to my device's interrupt, and the associated interrupt spinlock and interrupt service routine." When a bus driver requests an interrupt for a device (as when the PCI driver finds the Interrupt Pin register set to some non-zero value, or when it discovers an MSI-X table) it tells the PnP manager two things. First, it says that the device needs to register an ISR or a set of ISRs. Next 
it says something about how the device is attached to any interrupt controllers present in the machine. The PnP manager picks a processor to attach the interrupt to and picks the IRQL for that interrupt. Sometimes that choice is constrained by the way the wires are laid out on the motherboard, sometimes not. That topic is too big for this post. (I might go into it later. I wrote the code.) As you can see from the table above, there is more than one DIRQL. Unless your device generates more than one interrupt, you don't really have to care. Just pass along the values that you were given. Your interrupt spinlock's IRQL is that which was assigned to you. The only thing you have to know about it is that acquiring that lock means that you've pre-empted everything happening at lower IRQL. You haven't pre-empted things like TLB updates, though, as those still come in at higher IRQL. If your device does generate more than one interrupt, and if you need one spinlock that is used for both interrupt sources, you need to register your interrupt service routines with the highest of your DIRQLs as the SynchronizeIrql, which will avoid deadlocks by guaranteeing that all your interrupt-related code runs at the highest necessary IRQL. In summary, IRQL is a concept that was intended to allow spinlocks to be sorted into more-important and less-important buckets, so that some interrupts can occur while other interrupts are disabled. Most people agree that this is fairly complex to work with. Whether you believe this was a necessary addition to the driver model is the source of a debate that's been raging on the 'net since before Windows NT actually existed. - Jake Oshins</description><link>http://www.secuobs.com/revue/news/187970.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/187970.shtml</guid></item>
<item><title>one of the books that started it all</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - During my sophomore year at Cal Poly, I decided that I wanted to learn about threads, synchronization techniques and other topics associated with modern operating systems. Windows 95 had made its debut (yes, it is not a modern OS, but I didn't know that at the time) and I had heard about Windows NT, but had never seen or used it. Since I had Win95 on my machine at home, I decided that I would go to the bookstore and buy a book on threading. Well, needless to say, there was not a book that was dedicated to threading, but I found a couple of books on Windows programming that included threading as topics. It came down to 2 books: Jeffrey Richter's "Advanced Windows for Win95 and NT" (with Napoleon on the cover) and some other book. I obviously picked Richter's book ;), although now it is called "Windows via C/C++". This book was, and still is, awesome. I have both the original and new editions. Great detail, very easy to read, and it met my needs perfectly (unlike "Inside OLE" by Kraig Brockschmidt, which was full of information, but I just could not stay awake reading it :) ). IMHO, it successfully started me down the road to becoming a well-rounded developer and I think it helped in getting my foot in the door at Microsoft. I have even had the pleasure of letting Jeff know personally what a great book it is and how it shaped me for years to come.</description><link>http://www.secuobs.com/revue/news/104820.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104820.shtml</guid></item>
<item><title>Returning failure from DriverEntry</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - One thing that is easily overlooked about implementing DriverEntry is that upon returning a failure status (!NT_SUCCESS), DriverUnload is not called. I mentioned this anecdotally in a previous post, but it is worth expanding on. I was bitten by this oversight when I was working on the Bluetooth stack. Driver Verifier correctly identified that my driver had leaked pool. The code looked something like this:
// Globals
UNICODE_STRING gRegistryPath = { 0 };
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    NTSTATUS status;
    DriverObject-&gt;DriverUnload = DriverUnload;
    gRegistryPath.Length = RegistryPath-&gt;Length;
    gRegistryPath.MaximumLength = RegistryPath-&gt;MaximumLength;
    gRegistryPath.Buffer = (PWCHAR) ExAllocatePoolWithTag(PagedPool, gRegistryPath.MaximumLength, tag);
    if (gRegistryPath.Buffer == NULL) { return STATUS_INSUFFICIENT_RESOURCES; }
    RtlCopyMemory(gRegistryPath.Buffer, RegistryPath-&gt;Buffer, gRegistryPath.Length);
    status = RegisterWithPortDriver(DriverObject, ...);
    if (!NT_SUCCESS(status)) { return status; } // &lt;== leak right here
    // ... other init ...
    return status;
}
void DriverUnload(PDRIVER_OBJECT DriverObject)
{
    ExFreePool(gRegistryPath.Buffer);
    RtlZeroMemory(&amp;gRegistryPath, sizeof(gRegistryPath));
}
While many WDM drivers do very little outside of initializing the dispatch table and other fields in their DriverObject, a miniport driver or a KMDF driver must register with their port driver (like ScsiPortInitialize) or framework (WdfDriverCreate), and this registration can introduce failure in DriverEntry just like in my code sample above. What to do? In a WDM driver you have to be very careful and manage this manually. Either you have a common error exit path out of DriverEntry which performs the cleanup (or manually calls your DriverUnload routine), or you clean up at each possible point of error. This pattern is very easy to get wrong and is not very maintainable; it is quite easy to add a new allocation and forget to clean it up later. In a KMDF driver things are a bit easier to manage if you follow a particular pattern. While EvtDriverUnload has the same problems as the WDM DriverUnload, the EvtObjectCleanup routine registered on the WDFDRIVER is called in both scenarios. To re-emphasize, the EvtObjectCleanup registered on WDFDRIVER will be called either when DriverEntry returns !NT_SUCCESS or when your driver is gracefully unloaded later. This means that if you put all of your cleanup in the cleanup routine, your DriverEntry implementation becomes much simpler. The one caveat is that the call to WdfDriverCreate must come before any allocations in your driver or any state-changing APIs. WPP_INIT_TRACING is one such state-changing API, where you must undo its effects by calling WPP_CLEANUP. Quite a few WDK samples show this pattern (although, surprisingly to me, not all). Let us look at the nonpnp sample, %wdk%\src\kmdf\nonpnp\sys\nonpnp.c:
NTSTATUS
DriverEntry(
    IN OUT PDRIVER_OBJECT   DriverObject,
    IN PUNICODE_STRING      RegistryPath
    )
{
    NTSTATUS                       status;
    WDF_DRIVER_CONFIG              config;
    WDFDRIVER                      hDriver;
    PWDFDEVICE_INIT                pInit = NULL;
    WDF_OBJECT_ATTRIBUTES          attributes;
    WDF_DRIVER_CONFIG_INIT(&amp;config, WDF_NO_EVENT_CALLBACK);
    // Tell the framework that this is non-pnp driver so that it doesn't set the default AddDevice routine
    config.DriverInitFlags |= WdfDriverInitNonPnpDriver;
    // NonPnp driver must explicitly register an unload routine for the driver to be unloaded
    config.EvtDriverUnload = NonPnpEvtDriverUnload;
    // Register a cleanup callback so that we can call WPP_CLEANUP when
    // the framework driver object is deleted during driver unload
    WDF_OBJECT_ATTRIBUTES_INIT(&amp;attributes);
    attributes.EvtCleanupCallback = NonPnpEvtDriverContextCleanup;
    status = WdfDriverCreate(DriverObject, RegistryPath, &amp;attributes, &amp;config, &amp;hDriver);
    if (!NT_SUCCESS(status)) {
        KdPrint(("NonPnp: WdfDriverCreate failed with status 0x%x", status));
        return status;
    }
    // Since we are calling WPP_CLEANUP in the DriverContextCleanup
    // callback we should initialize WPP Tracing after WDFDRIVER
    // object is created to ensure that we cleanup WPP properly
    // if we return failure status from DriverEntry. This
    // eliminates the need to call WPP_CLEANUP in every path
    // of DriverEntry
    WPP_INIT_TRACING(DriverObject, RegistryPath);
    return status;
}
VOID NonPnpEvtDriverContextCleanup(WDFDRIVER Driver)
{
    WPP_CLEANUP(WdfDriverWdmGetDriverObject(Driver));
}
The comments show what is going on. Hopefully the code is self-explanatory. Incidentally, another pattern you can use for global memory allocations is to allocate the memory with WdfMemoryCreate without specifying a parent object. The WDFDRIVER will be the parent object by default, and since all child objects are destroyed when the parent is destroyed, all of your allocations will be destroyed after EvtDriverUnload has been called, when the WDFDRIVER is destroyed in the unload path.</description><link>http://www.secuobs.com/revue/news/104819.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104819.shtml</guid></item>
<item><title>Once not disableable, forever not disableable</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - One interesting quirk about the PNP_DEVICE_NOT_DISABLEABLE state is that once it has been set and the PnP manager has processed it, the state is sticky. By sticky I mean that even if you attempt to clear this bit on a subsequent IRP_MN_QUERY_PNP_DEVICE_STATE IRP, the PnP manager ignores your changes to this state. This state remains stuck until any of the following occur:
1. The machine is rebooted and the device is reenumerated.
2. The device or any device in its ancestry is surprise removed.
3. The device or any device in its ancestry is ejected.</description><link>http://www.secuobs.com/revue/news/104818.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104818.shtml</guid></item>
<item><title>Inconceivableable</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - I have no idea who created the name for PNP_DEVICE_NOT_DISABLEABLE, but I probably have the same reaction as you: "Seriously? That is what they named it?" I mean, come on, I think it could have at least been named PNP_DEVICE_CANNOT_BE_DISABLED. I am sure you can think of some better names too. If so, please leave a comment with your suggestions. While we had a chance to rectify this in KMDF in the WDF_DEVICE_STATE structure, we chose to keep the field name (NotDisableable) similar to the WDM name to avoid confusion. For any readers who have not encountered this bit, it is a part of PNP_DEVICE_STATE. You set this bit in the IRP_MJ_PNP/IRP_MN_QUERY_PNP_DEVICE_STATE IRP after calling IoInvalidateDeviceState.</description><link>http://www.secuobs.com/revue/news/104817.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104817.shtml</guid></item>
<item><title>Using KeAcquireSpinLockAtDpcLevel is only a perf gain if you know you are at DISPATCH_LEVEL</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - Well, that is certainly a long title ;) First, let us look at an approximate implementation of KeAcquireSpinLock and KeRaiseIrql (and yes, I know that KeRaiseIrql is really a #define to KfRaiseIrql, but it is the same thing that happens in the end):
VOID KeAcquireSpinLock(PKSPIN_LOCK SpinLock, PKIRQL PreviousIrql)
{
    KeRaiseIrql(DISPATCH_LEVEL, PreviousIrql);
    [spin on the lock until it has been acquired]
}
VOID KeRaiseIrql(KIRQL NewIrql, PKIRQL OldIrql)
{
    *OldIrql = KeGetCurrentIrql();
    [raise IRQL to NewIrql]
}
What I want to emphasize is that, as a part of acquiring the spin lock, KeAcquireSpinLock will retrieve the current IRQL so that it knows what to restore the IRQL to when the lock is released. Retrieving the current IRQL is a relatively expensive operation. Enter KeAcquireSpinLockAtDpcLevel. KeAcquireSpinLockAtDpcLevel does away with the IRQL change and just implements [spin on the lock until it has been acquired], but it does this with a large caveat: you must be running at DISPATCH_LEVEL (in reality it requires IRQL &gt;= DISPATCH_LEVEL, but that is another discussion for another day). It requires DISPATCH_LEVEL or higher so that you do not deadlock. Another caveat: to effectively use KeAcquireSpinLockAtDpcLevel you must know 100% that you are at DISPATCH_LEVEL. Naively, one could think that the following code optimizes for both cases:
if (KeGetCurrentIrql() &lt; DISPATCH_LEVEL) {
    KeAcquireSpinLock(&amp;lock, ...);
}
else {
    KeAcquireSpinLockAtDpcLevel(...);
}
But the problem here is that when the current IRQL is &lt; DISPATCH_LEVEL, the current IRQL is being retrieved twice (once in your code, once in KeAcquireSpinLock), so this relatively expensive operation is being performed twice when it should be performed only once. What all of this boils down to is that you must know with 100% certainty that the current IRQL is DISPATCH_LEVEL before you can use KeAcquireSpinLockAtDpcLevel effectively. Here are a couple of contexts where you can know for certain that the IRQL is DISPATCH_LEVEL:
* In a DPC (for an ISR, for a timer, your own).
* After you have acquired a spin lock. In the case where you have nested locks being acquired (first A, then B), after you have called KeAcquireSpinLock(&amp;A) you can then call KeAcquireSpinLockAtDpcLevel(&amp;B).
Notice that a completion routine is not guaranteed to be called at IRQL == DISPATCH_LEVEL. While you may see that it is called at dispatch, it is not something that you can rely on 100% of the time. For instance, the lower driver could complete the IRP at passive level in an error condition.</description><link>http://www.secuobs.com/revue/news/104816.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104816.shtml</guid></item>
<item><title>EvtDevicePreprocessWdmIrp is not entirely free</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - One of the WDM escapes in KMDF is EvtDeviceWdmIrpPreprocess (or EvtDevicePreprocessWdmIrp in the API in which you register it), which you can register for by calling WdfDeviceInitAssignWdmIrpPreprocessCallback. This function allows you to process a WDM PIRP before KMDF sees it and potentially processes it. From a KMDF adoption point of view, this functionality was a very strong requirement. Without it, there would not be a defined way for a KMDF driver to support IRPs that KMDF did not natively support. For instance, the KMDF serial example registers a preprocess routine for IRP_MJ_FLUSH_BUFFERS, from ddkroot\src\kmdf\serial\pnp.c:
//
// Since framework queues doesn't handle IRP_MJ_FLUSH_BUFFERS,
// IRP_MJ_QUERY_INFORMATION and IRP_MJ_SET_INFORMATION requests,
// we will register a preprocess callback to handle them
//
status = WdfDeviceInitAssignWdmIrpPreprocessCallback(
    DeviceInit,
    SerialFlush,
    IRP_MJ_FLUSH_BUFFERS,
    NULL, // pointer minor function table
    0); // number of entries in the table
You should use this functionality in your driver only if this is the last resort you have; it does not come for free. When you register for any preprocess routine, KMDF will increase the StackSize in the underlying WDM PDEVICE_OBJECT. This is called out in the documentation for "Preprocessing and Postprocessing IRPs" in KMDF. By increasing the stack size, all PIRPs that are sent to your device will have an extra IO stack location. This means that the PIRP is a bit larger than it would otherwise be; larger not only for the IRP_MJ code that you registered for, but for each and every PIRP that is sent to your device regardless of IRP_MJ code. Additionally, if your StackSize goes over an internal threshold in the IO manager, the way the PIRP is allocated could be different (for instance, devices with a StackSize of &gt; N might always be allocated by calling ExAllocatePool each time). Mind you, the internal threshold is a bit high, so if you have a root enumerated device or your initial StackSize is low, this is not something that you will see. So while registering a preprocess routine is a very useful thing to be able to do, you should consider your options carefully and only register a preprocess routine if you have no other option.</description><link>http://www.secuobs.com/revue/news/104815.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104815.shtml</guid></item>
<item><title>Debugger commands (.step_filter) that make my life easier</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - This is a pretty cool (and somewhat obscure) debugger command. It allows you to tell the debugger what functions to skip if you are using the trace command ('t'). I think of the trace command as the 'step into' command, but that is just me. Let's say we have the following simple application:
#include &lt;stdio.h&gt;
struct Foo {
    Foo() : m_value(0) { }
    int Increment() { return ++m_value; }
    static void Print(int i) { printf("%d", i); }
    int m_value;
};
int _cdecl main(int argc, char *argv[])
{
    Foo f;
    Foo::Print(f.Increment());
    return 0;
}
If I were to run the program under the debugger and use the 't' command for each line, it would step into every function. I typically use 't' instead of 'p' because I usually want to step into a function at some point in time and I tend to press 'p' one too many times ;) Here is an example of the debugger session:
0:000&gt; g test!main
   13: {
0:000&gt; t
   14:     Foo f;
0:000&gt; t
    4:     Foo() : m_value(0) { }
0:000&gt; t
   15:     Foo::Print(f.Increment());
0:000&gt; t
    6:     int Increment() { return ++m_value; } [1]
0:000&gt; t
   15:     Foo::Print(f.Increment());
0:000&gt; t
    7:     static void Print(int i) { printf("%d", i); } [2]
0:000&gt; gu
   16:     return 0;
0:000&gt; t
   17: }
0:000&gt; t
test!__mainCRTStartup+0x102:
Let's look at the statement Foo::Print(f.Increment());. When using the trace command, it will first step into Foo::Increment [1] before stepping into Foo::Print [2]. But let's say that I never want to step into Foo::Increment because I know that it is a simple function that I do not want to debug. I can tell the debugger to ignore trace commands into this function with the .step_filter command. The command takes a semicolon-delimited list of fully qualified symbol names (which can include wildcards, so you can filter out entire modules) to ignore. Let's see the debugger session again with this command:
0:000&gt; g test!main
   13: {
0:000&gt; .step_filter "test!Foo::Increment"
Filter out code symbols matching:
test!Foo::Increment
0:000&gt; t
   14:     Foo f;
0:000&gt; t
    4:     Foo() : m_value(0) { }
0:000&gt; t
   15:     Foo::Print(f.Increment());
0:000&gt; t
    7:     static void Print(int i) { printf("%d", i); }
0:000&gt; gu
   16:     return 0;
0:000&gt; t
   17: }
0:000&gt; t
test!__mainCRTStartup+0x102:
You will see now that when I trace into Foo::Print(f.Increment());, the f.Increment() call is executed but not traced into ("ignored" is not the right word because it has run, I just didn't see it line by line) and I step directly into Foo::Print. I think this is a pretty powerful debugger command; it can save you a lot of time if you are always accidentally stepping into the wrong function like I always do ;)</description><link>http://www.secuobs.com/revue/news/104814.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104814.shtml</guid></item>
<item><title>The WDF 1.7 coinstallers are now available</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - After a long wait (thank you for your patience), the WDF 1.7 coinstallers are now up on the Connect site. To get the bits:
1. Go to http://connect.microsoft.com.
2. Log in using your Passport account.
3. Navigate to the WDF page (I don't know where it lives in the connection directory, sigh).
4. Choose Downloads on the left.
5. The package is dated 4/17/2008; you may be able to get it directly from here once you have logged on.
Enjoy, and let the signing and shipping of v1.7 WDF drivers begin.</description><link>http://www.secuobs.com/revue/news/104813.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104813.shtml</guid></item>
<item><title>What should you change in a sample before you ship it?</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - I was going to write about how to do this, but the awesome folks at WHDC got to it before I did. I did get to review it before it was published, so I did have some influence on what is in the tip ;) So on this one my job is easy: just go read the tip.</description><link>http://www.secuobs.com/revue/news/104812.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104812.shtml</guid></item>
<item><title>How do I cancel an IRP that another thread may be completing at the same time?</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - Let's say that you allocated a PIRP and sent it down your device stack. You free the PIRP in the completion routine and then return STATUS_MORE_PROCESSING_REQUIRED. To make life more fun, you decide that you want to be able to cancel the sent IRP after you have sent it, so you try to do it simply, like this:
typedef struct _DEVICE_EXTENSION {
    KSPIN_LOCK SentIrpLock;
    PIRP SentIrp;
} DEVICE_EXTENSION;
Sending thread:
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
devext-&gt;SentIrp = Irp;
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
Canceling thread:
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
if (devext-&gt;SentIrp != NULL) {
    IoCancelIrp(devext-&gt;SentIrp);
}
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
Completion routine:
PIRP irp;
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
irp = devext-&gt;SentIrp;
devext-&gt;SentIrp = NULL;
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
IoFreeIrp(irp);
return STATUS_MORE_PROCESSING_REQUIRED;
And it then deadlocks ;) If the call to IoCancelIrp causes the IRP to be completed in the calling context (e.g. the one which has acquired the lock), the completion routine will run and try to acquire the lock (SentIrpLock) on the same thread which holds it. So, life is not that simple and you have to do something more. The basic solution is that you need extra state to track who is touching the PIRP and who can free it. Walter Oney's book has a solution (IIRC, it is in the self-initiated I/O section, but I do not have the book handy), but IMHO it is a bit complicated. KMDF has a solution to this problem which I like much more (imagine that ;)). You need an extra LONG, call it CompletionCount, per PIRP that you want to be able to cancel.
1. You initialize CompletionCount to 1 before sending the PIRP down the stack and storing it in devext.
2. Whenever there is a thread that wants to cancel the PIRP, it tries to interlocked increment CompletionCount if and only if the current CompletionCount value is &gt; 0. For this you need to roll your own InterlockedIncrementWithFloor, which is fortunately not that hard, and I have already shown you how to do that: http://blogs.msdn.com/doronh/archive/2006/12/06/creating-your-own-interlockedxxx-operation.aspx
3. After the canceling thread has called InterlockedIncrementWithFloor and IoCancelIrp, it calls InterlockedDecrement.
4. Whoever wants to complete the PIRP, like the completion routine, interlock decrements CompletionCount.
If the returned value from InterlockedDecrement is zero, the caller can complete (free) the PIRP. If not, somebody else is trying to touch the PIRP and you must leave the PIRP alone. So here is the revised code:
typedef struct _DEVICE_EXTENSION {
    KSPIN_LOCK SentIrpLock;
    PIRP SentIrp;
    LONG CompletionCount;
} DEVICE_EXTENSION;
Sending thread:
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
devext-&gt;CompletionCount = 1;
devext-&gt;SentIrp = Irp;
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
Canceling thread:
PIRP irp = NULL;
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
if (devext-&gt;SentIrp != NULL &amp;&amp;
    MyInterlockedIncrementWithFloor(&amp;devext-&gt;CompletionCount, 0) &gt; 0) {
    irp = devext-&gt;SentIrp;
}
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
if (irp != NULL) {
    IoCancelIrp(irp);
    if (InterlockedDecrement(&amp;devext-&gt;CompletionCount) == 0) {
        IoFreeIrp(irp);
    }
}
Completion routine:
PIRP irp = NULL;
KeAcquireSpinLock(&amp;devext-&gt;SentIrpLock, ...);
irp = devext-&gt;SentIrp;
devext-&gt;SentIrp = NULL;
KeReleaseSpinLock(&amp;devext-&gt;SentIrpLock, ...);
if (InterlockedDecrement(&amp;devext-&gt;CompletionCount) == 0) {
    IoFreeIrp(irp);
}
return STATUS_MORE_PROCESSING_REQUIRED;
The beauty of this solution is that if you add more actors (let's say a timer for an async timeout), all you have to do is bump the CompletionCount to account for them, so that they can asynchronously run down if you cannot cancel them.</description><link>http://www.secuobs.com/revue/news/104811.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104811.shtml</guid></item>
<item><title>Dude, what happened to Doron?</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head - So, I have not written anything in over 6 months and yet I have posted on NTDEV and public newsgroups. What gives? Well, the short answer is that I have been short on time these past few months. I have had only enough extracurricular time to read NTDEV and occasionally the newsgroups. Why? Well, for starters we had a second baby this past June and that took the majority of my time ;) Combine that with parental leave and a project at work taking all of my time for nearly all of last year, and it led to a very poor showing on the blog for 2008. I promise to change that this year. Have a good new year and hopefully better quality drivers. d</description><link>http://www.secuobs.com/revue/news/104810.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104810.shtml</guid></item>
<item><title>MSDN link on how to set up a user or kernel debugger</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head -    This has got to be one of the top FAQs out there: how do I set up a kernel debugger? I just stumbled across a link on MSDN which gives instructions not only on how to set up a kernel debugger on all transports (serial, 1394, usb2), but also on how to set up a user mode debugger or how to attach to a virtual machine. Pretty cool. This used to be an internal web page at Microsoft and, I think, a topic in debugger.chm; it is great that it is now public. The MSDN topic is Starting the Debugger. Add it to your bookmarks and maybe we can create room in the FAQ for another question ;)</description><link>http://www.secuobs.com/revue/news/104809.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104809.shtml</guid></item>
<item><title>Great WinHEC presentation on device interfaces compared to device classes</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head -    This is a repeat of a post I made to NTDEV, but I wanted to make sure I reached as many people as possible. I just read this deck, http://download.microsoft.com/download/5/E/6/5E66B27B-988B-4F50-AF3A-C2FF1E62180F/CON-T615_WH08.pptx, which was presented at WinHEC this past year. It is by far the best explanation of device interfaces and device classes that I have seen in 10 years. It is worthwhile reading for both the new and the experienced driver developer. I was asked to enter a name and password to download the file; I hit cancel and it downloaded just fine. I don’t know if this was an IE8/Win7 quirk or not, so YMMV.</description><link>http://www.secuobs.com/revue/news/104808.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104808.shtml</guid></item>
<item><title>WDFREQUESTs are not for sharing</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head -    FYI: this is a bit of a long post, but I wanted to be thorough and illustrative and give some insight into how the framework works and into potential designs that could have been chosen, but were not, for the sake of simplicity and performance.

A common misconception about a WDFREQUEST handle is the assumption that the WDFREQUEST handle value “follows” (e.g. stays the same as) the PIRP everywhere that the PIRP goes. Basically, the idea is that everywhere that the PIRP is sent or presented, the same WDFREQUEST handle will be used. The reality is that the same WDFREQUEST handle value will only be used while the PIRP is owned by the WDFDEVICE for which it was created. This means that if you send the WDFREQUEST to another driver with a call to WdfRequestSend (e.g. transfer ownership to the driver you are sending it to), the driver which receives the PIRP will have a different handle for the WDFREQUEST.

This means that the WDFREQUEST cannot be used to send extra data or buffers to the driver which will receive the sent request. For instance, this pattern does not work. First, in the sending driver (let's say the request's handle value is 0xA) you allocate a request context, format the request and send it to the WDFIOTARGET:

typedef struct _EXTRA_BUFFER {
    PVOID Data;
    ULONG DataLength;
} EXTRA_BUFFER, *PEXTRA_BUFFER;

PEXTRA_BUFFER pExtra = NULL;
WDF_OBJECT_ATTRIBUTES woa;
NTSTATUS status;

WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&amp;woa, EXTRA_BUFFER);
status = WdfObjectAllocateContext(request, &amp;woa, (PVOID*) &amp;pExtra);
// ... initialize pExtra ...
// ... format request as an internal IOCTL ...
if (WdfRequestSend(request, ...) == FALSE) {
    WdfRequestComplete(request, WdfRequestGetStatus(request));
}

And then the receiving driver's WDFQUEUE dispatch routine would have a new WDFREQUEST handle value (let’s say 0xAA) and not the sender’s WDFREQUEST (0xA):

VOID EvtInternalDeviceIoControl(
    __in WDFQUEUE Queue,
    __in WDFREQUEST Request,
    __in size_t OutputBufferLength,
    __in size_t InputBufferLength,
    __in ULONG IoControlCode
    )
{
    ...
}

Think about how KMDF would have been implemented if the same WDFREQUEST was presented to the lower driver. One of two things would have had to occur:
* The first option would have been for the WDFREQUEST handle value to be smuggled somewhere in the PIRP (with the most likely candidate being DriverContext), but that would have been unreliable and prone to compatibility issues with non-KMDF drivers in the stack.
* The second option would have been to maintain a global mapping in KMDF from PIRP pointer value to WDFREQUEST handle. This would have been very expensive to maintain because we would have had to look up the mapping every time a WDFQUEUE-handled PIRP arrived in the framework. It would have grown even more expensive because this proposed map would have been guarded by a global lock, which meant that all KMDF drivers would have been acquiring and releasing this one lock, making it very contentious and a huge performance problem.

Let’s take a step back and look at why there is a new WDFREQUEST handle value when you send a request to another device. Here is what happens when a PIRP arrives in a KMDF driver:
1. Call the potentially registered WDM preprocess routine.
2. If it is a PIRP that will be processed by a WDFQUEUE (a read/write/IOCTL/internal IOCTL), allocate a WDFREQUEST for the PIRP. If it is not one of these types, pass it to the appropriate pipeline in the framework.
3. Call the potentially registered in-process callback.
4. Pass the WDFREQUEST to the WDFQUEUE handler so that it will be presented on the appropriate WDFQUEUE.
5. The WDFQUEUE calls the right IO event callback that you registered when the WDFQUEUE was created.

Step #2 is the key here. The WDFREQUEST is always allocated (from a lookaside, if that matters). It means that there is no 1:1 correspondence between a PIRP and a WDFREQUEST value. Let’s say we wanted to have a 1:1 correspondence for a single WDFDEVICE though (and not across multiple WDFDEVICEs as in the first example). We would need a WDFDEVICE-based mapping of active PIRPs to WDFREQUEST handles to see if the PIRP is an active PIRP in the WDFDEVICE. As with the previously proposed global mapping, such a WDFDEVICE-wide mapping would also be prohibitively expensive. While the lock contention would move from a global scope to a WDFDEVICE scope, thus reducing some contention, such a lock would still be a performance hot spot since all PIRPs arriving into the driver would be contending on this lock.

When you look at how KMDF allocates a WDFREQUEST it becomes clear that the WDFREQUEST loosely maps back to the current stack location in the PIRP, not the PIRP itself. Think about the case of sending the WDFREQUEST to another driver. The sending driver has its own stack location and its own WDFREQUEST. The receiving driver has its own stack location and WDFREQUEST as well. Just to reinforce this idea, let’s consider a final example. Let’s say:
* Your driver received a WDFREQUEST formatted for IOCTL ‘A’ in your dispatch routine.
* In processing the request, your driver formatted the next stack location for IOCTL ‘B’.
* The request is sent to the top of your stack, which means your driver will eventually see the PIRP in its dispatch routine as IOCTL ‘B’.

When the PIRP enters your driver as IOCTL ‘B’, it will have a new WDFREQUEST handle. This is completely by design. The first stack location (IOCTL ‘A’) has one WDFREQUEST, the subsequent stack location (IOCTL ‘B’) has another WDFREQUEST.

In conclusion, WDFREQUEST handles are local to a specific WDFDEVICE. In fact, they are local to a specific stack location. The context off of a WDFREQUEST can only be used by your driver for that particular WDFDEVICE and is not meant to be shared across WDFDEVICEs.</description><link>http://www.secuobs.com/revue/news/104807.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104807.shtml</guid></item>
<item><title>WDFREQUESTs are for sharing in KMDF v1.9</title><description>Secuobs.com : 2009-06-02 14:41:57 - A Hole In My Head -    In my last post I described why a WDFREQUEST is unique to a particular WDFDEVICE. There is one particular programming pattern where this is not the behavior you want. This pattern is when you have each PDO accepting IO requests which it then forwards on to the parent WDFDEVICE for processing. One great in-box example of this is usbhub.sys. Each usbhub PDO receives URBs which are then forwarded to the parent FDO, and the FDO is where all IO processing occurs.

If you wanted to apply this pattern to a KMDF driver written for v1.7 or earlier and take advantage of WDFQUEUEs, you had to send the requests from the PDO to the FDO with WdfRequestSend so that they were re-presented to the FDO. The easiest way to do this is to create a WDFIOTARGET for the FDO itself and then have each PDO send IO to that WDFIOTARGET, as shown in the following 3 code snippets.

EvtDriverDeviceAdd for the FDO

NTSTATUS EvtDriverDeviceAdd(WDFDRIVER Driver, PWDFDEVICE_INIT DeviceInit)
{
    WDFDEVICE device;
    WDF_OBJECT_ATTRIBUTES woa;
    NTSTATUS status;

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&amp;woa, FDO_EXTENSION);
    // initialize the DeviceInit
    status = WdfDeviceCreate(&amp;DeviceInit, &amp;woa, &amp;device);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    PFDO_EXTENSION pFdoExt = GetFdoExt(device);
    status = WdfIoTargetCreate(device, WDF_NO_OBJECT_ATTRIBUTES, &amp;pFdoExt-&gt;SelfTarget);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    // open the WDFIOTARGET to point to our own PDEVICE_OBJECT
    WDF_IO_TARGET_OPEN_PARAMS openParams;
    WDF_IO_TARGET_OPEN_PARAMS_INIT_EXISTING_DEVICE(&amp;openParams, WdfDeviceWdmGetDeviceObject(device));
    status = WdfIoTargetOpen(pFdoExt-&gt;SelfTarget, &amp;openParams);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    return status;
}

EvtChildListCreateDevice for the PDO

NTSTATUS EvtChildListCreateDevice(
    WDFCHILDLIST ChildList,
    PWDF_CHILD_IDENTIFICATION_DESCRIPTION_HEADER IdentificationDescription,
    PWDFDEVICE_INIT ChildInit)
{
    WDFDEVICE pdo;
    NTSTATUS status;
    WDF_OBJECT_ATTRIBUTES woa;

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&amp;woa, PDO_EXTENSION);
    status = WdfDeviceCreate(&amp;ChildInit, &amp;woa, &amp;pdo);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    // remember the parent's self-target so the PDO can send IO to it
    PPDO_EXTENSION pPdoExt = GetPdoExt(pdo);
    PFDO_EXTENSION pFdoExt = GetFdoExt(WdfPdoGetParent(pdo));
    pPdoExt-&gt;ParentTarget = pFdoExt-&gt;SelfTarget;
    return status;
}

EvtIoDefault for the PDO

VOID EvtIoDefault(WDFQUEUE Queue, WDFREQUEST Request)
{
    // ... extract Request type ...
    if (RequestShouldBeSentToParent) {   // placeholder for your own check
        WdfRequestFormatRequestUsingCurrentType(Request);
        WdfRequestSend(Request, GetPdoExt(WdfIoQueueGetDevice(Queue))-&gt;ParentTarget, WDF_NO_SEND_OPTIONS);
    }
}

As you can see, this is a bit cumbersome. While it works, it is not ideal. The KMDF team addressed this issue in v1.9 by adding 2 new DDIs that must be used together:
1. WdfPdoInitAllowForwardingRequestToParent, which tells KMDF that you will be forwarding IO from a PDO to the parent FDO. Internally this sets up some bookkeeping and, more importantly, sets the PDO’s PDEVICE_OBJECT’s StackSize to the FDO’s PDEVICE_OBJECT StackSize + 1 so that there will be enough stack locations in the underlying PIRP for both the parent and child stacks.
2. WdfRequestForwardToParentDeviceIoQueue, which removes the need for the custom WDFIOTARGET and WdfRequestSend and directly presents the PDO’s WDFREQUEST to the FDO’s WDFQUEUE.

Let’s now rewrite the 3 code snippets to make use of these new DDIs, with the new code marked by NEW comments and the code that is no longer needed removed.

NEW EvtDriverDeviceAdd for the FDO

NTSTATUS EvtDriverDeviceAdd(WDFDRIVER Driver, PWDFDEVICE_INIT DeviceInit)
{
    WDFDEVICE device;
    WDF_OBJECT_ATTRIBUTES woa;
    WDF_IO_QUEUE_CONFIG queueConfig;   // NEW
    NTSTATUS status;

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&amp;woa, FDO_EXTENSION);
    // initialize the DeviceInit
    status = WdfDeviceCreate(&amp;DeviceInit, &amp;woa, &amp;device);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    PFDO_EXTENSION pFdoExt = GetFdoExt(device);
    // NEW: initialize a WDF_IO_QUEUE_CONFIG to your needs, then create the
    // queue that will receive the requests forwarded by the child PDOs
    status = WdfIoQueueCreate(device, &amp;queueConfig, WDF_NO_OBJECT_ATTRIBUTES, &amp;pFdoExt-&gt;ChildProcessingQueue);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    return status;
}

NEW EvtChildListCreateDevice for the PDO

NTSTATUS EvtChildListCreateDevice(
    WDFCHILDLIST ChildList,
    PWDF_CHILD_IDENTIFICATION_DESCRIPTION_HEADER IdentificationDescription,
    PWDFDEVICE_INIT ChildInit)
{
    WDFDEVICE pdo;
    NTSTATUS status;
    WDF_OBJECT_ATTRIBUTES woa;

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&amp;woa, PDO_EXTENSION);
    WdfPdoInitAllowForwardingRequestToParent(ChildInit);   // NEW
    status = WdfDeviceCreate(&amp;ChildInit, &amp;woa, &amp;pdo);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    return status;
}

NEW EvtIoDefault for the PDO

VOID EvtIoDefault(WDFQUEUE Queue, WDFREQUEST Request)
{
    // ... extract Request type ...
    if (RequestShouldBeSentToParent) {   // placeholder for your own check
        // NEW: forward the request directly to the parent FDO's queue
        WDF_REQUEST_FORWARD_OPTIONS options;
        NTSTATUS status;

        WDF_REQUEST_FORWARD_OPTIONS_INIT(&amp;options);
        options.Flags = WDF_REQUEST_FORWARD_OPTION_SEND_AND_FORGET;
        status = WdfRequestForwardToParentDeviceIoQueue(
            Request,
            GetFdoExt(WdfPdoGetParent(WdfIoQueueGetDevice(Queue)))-&gt;ChildProcessingQueue,
            &amp;options);
        if (!NT_SUCCESS(status)) {
            // the forward failed, so this driver still owns the request
            WdfRequestComplete(Request, status);
        }
    }
}</description><link>http://www.secuobs.com/revue/news/104806.shtml</link><guid isPermaLink="false">http://www.secuobs.com/revue/news/104806.shtml</guid></item>
</channel>
</rss>
 
