When there is nothing left to do, a CPU goes into an idle state and waits until it is needed again. These idle modes, which go by names like “C states,” vary in the amount of power saved, but also in the amount of ancillary information which may be lost and the amount of time required to get back into a fully-functional mode.
The intel_idle driver performs the actual idle-state power management. It registers with the existing CPU idle subsystem and is extended for the Merrifield CPU (Silvermont). It also introduces new platform idle states, C7x, which are deeper than the traditional C6 state. The overall idea is that the CPU C7x states are extended to devices and the rest of the platform, hence putting the platform into S0ix states.
On the Intel MID platform (Merrifield), the PMU driver communicates with the intel_idle driver, platform device drivers, PCI drivers, and the PMU firmware (NC: Punit, SC: SCU) to coordinate platform power state transitions.
The PMU driver provides a platform-specific callback to the Intel_idle driver so that long periods of idleness can be extended to the entire platform.
- soc_s0ix_idle() -> enter_s0ix_state() // intel_idle driver, defined at /drivers/idle/intel_idle.c
- mid_s0ix_enter() // PMU driver defined at /arch/x86/platform/intel-mid/intel_soc_pmu.c
I could find the cpuidle_state->enter() callback associated with soc_s0ix_idle() for the C6 state on the Medfield platform, and the call stack looks as shown above. However, I have not figured out the equivalent association between intel_idle and the PMU driver on the Merrifield platform. I am curious whether this statement also applies to Merrifield.
Based on a hint of idleness, the PMU driver extends CPU idleness to the rest of the platform via standard Linux PM_QoS, Runtime PM, and PCI PM calls.
Once the CPU and devices are all idle, the PMU driver programs the North and South Complex PMUs to implement the required power transition for S0ix to emulate C7-x states.
Here is a very good reference for understanding the cpuidle governor and subsystem, worth reading before digging into how the intel_idle driver fits into the current cpuidle infrastructure: http://lwn.net/Articles/384146/
The following is a code trace of the intel_idle driver, located at “/drivers/idle/intel_idle.c”, beginning with the comment in the intel_idle driver.
- Driver init
  - intel_idle_probe() matches the current CPU against the array of x86_cpu_ids and assigns the corresponding default cpuidle C-state table
  - intel_idle_cpuidle_driver_init() checks the actual C-state table and updates its states when needed (e.g., target_residency)
  - The intel_idle driver is registered with the cpuidle subsystem through cpuidle_register_driver()
  - intel_idle_cpu_init() allocates, initializes, and registers a cpuidle_device for each CPU
  - cpu_hotplug_notifier is registered to learn about CPUs going up/down
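The probe step above boils down to a table lookup: walk a sentinel-terminated list of CPU IDs and return the default C-state table attached to the matching entry. The user-space sketch below only illustrates that idea. The struct layouts, the names `cpu_id_entry`, `idle_ids`, and `probe_table()`, the model numbers, and the latency values are all hypothetical placeholders and are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry of a C-state table. */
struct idle_state {
    const char *name;
    unsigned int exit_latency_us;
    unsigned int target_residency_us;
};

/* Hypothetical stand-in for a CPU ID entry: family/model plus the
 * default C-state table the driver would select for that CPU. */
struct cpu_id_entry {
    unsigned int family;
    unsigned int model;
    const struct idle_state *table;
};

/* Placeholder tables; the numbers are illustrative, not hardware data. */
static const struct idle_state table_a[] = {
    { "C1", 1, 1 },
    { "C6", 100, 500 },
    { NULL, 0, 0 },
};

static const struct idle_state table_b[] = {
    { "C1", 1, 1 },
    { "C6", 150, 600 },
    { "C7", 1000, 4000 },
    { NULL, 0, 0 },
};

/* Sentinel-terminated ID list; the model numbers are made up. */
static const struct cpu_id_entry idle_ids[] = {
    { 6, 0x01, table_a },
    { 6, 0x02, table_b },
    { 0, 0, NULL },
};

/* Return the default C-state table for the given CPU, or NULL when the
 * CPU is not in the list (the driver would then refuse to load). */
const struct idle_state *probe_table(const struct cpu_id_entry *ids,
                                     unsigned int family,
                                     unsigned int model)
{
    assert(ids != NULL);
    for (; ids->table != NULL; ids++) {
        if (ids->family == family && ids->model == model)
            return ids->table;
    }
    return NULL;
}
```

With these placeholder tables, `probe_table(idle_ids, 6, 0x02)` returns `table_b`, while an unlisted model returns `NULL`.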
- Default cpuidle C-states for Merrifield
  - The “flags” field describes the characteristics of this sleep state
    - CPUIDLE_FLAG_TIME_VALID should be set if it is possible to accurately measure the amount of time spent in this particular idle state.
    - CPUIDLE_FLAG_TLB_FLUSHED is set to indicate that the hardware flushes the TLB for this state.
  - MWAIT takes an 8-bit “hint” in EAX “suggesting” the C-state (top nibble) and sub-state (bottom nibble). 0x00 means “MWAIT(C1)”, 0x10 means “MWAIT(C2)”, etc.
  - “exit_latency”, in microseconds, says how long it takes to get back to a fully functional state.
  - “target_residency”, in microseconds, is the minimum amount of time the processor should spend in this state to make the transition worth the effort.
  - The “enter()” function will be called when the current governor decides to put the CPU into the given state
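The MWAIT hint encoding described above can be made concrete with a few lines of C. This is a minimal sketch that follows the rule stated in the list (top nibble selects the C-state, with 0x0 meaning C1; bottom nibble selects the sub-state). The helper names `mwait_hint_cstate()`, `mwait_hint_substate()`, and `mwait_make_hint()` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* C-state requested by an MWAIT hint: the top nibble plus one, so that
 * 0x00 maps to C1 and 0x10 maps to C2, as described in the text.
 * A top nibble of 0xF is not given any special treatment here. */
unsigned int mwait_hint_cstate(unsigned int hint)
{
    return ((hint >> 4) & 0xFu) + 1u;
}

/* Sub-state requested by an MWAIT hint: the bottom nibble. */
unsigned int mwait_hint_substate(unsigned int hint)
{
    return hint & 0xFu;
}

/* Build a hint from a C-state number (1..15) and a sub-state (0..15). */
unsigned int mwait_make_hint(unsigned int cstate, unsigned int substate)
{
    assert(cstate >= 1u && cstate <= 15u);
    assert(substate <= 0xFu);
    return ((cstate - 1u) << 4) | substate;
}
```

For example, a hint of 0x52 decodes to C6 with sub-state 2, and `mwait_make_hint(2, 0)` yields 0x10.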
The actual C-state transition is performed by intel_idle(). As we saw, this is done through the “enter()” function associated with each state. The decision as to which idle state makes sense in a given situation is very much a policy issue, implemented by the cpuidle “governors”.
A call to enter() is a request from the current governor to put the CPU associated with dev into the given state. Note that enter() is free to choose a different state if there is a good reason to do so, but it should store the actual state used in the device’s last_state field.
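To make the policy side more tangible, here is a simplified sketch of how a governor might use exit_latency and target_residency to pick a state before calling enter(). It is not the algorithm of any real cpuidle governor: the function `select_idle_state()`, the struct `idle_state_desc`, the `demo_states` table, and all of its numbers are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down description of one idle state. */
struct idle_state_desc {
    const char *name;
    unsigned int exit_latency_us;     /* time to become functional again */
    unsigned int target_residency_us; /* minimum stay that pays off */
};

/* Illustrative table, ordered from shallowest to deepest state. */
static const struct idle_state_desc demo_states[] = {
    { "C1", 1, 1 },
    { "C6", 140, 560 },
    { "C7", 1200, 4000 },
};

/* Pick the deepest state whose target residency fits the predicted idle
 * period and whose exit latency respects the latency limit. Index 0 is
 * the fallback when no deeper state qualifies. */
int select_idle_state(const struct idle_state_desc *states, int count,
                      unsigned int predicted_idle_us,
                      unsigned int latency_limit_us)
{
    int best = 0;

    assert(states != NULL && count > 0);
    for (int i = 1; i < count; i++) {
        if (states[i].target_residency_us > predicted_idle_us)
            continue; /* would not stay long enough to be worth it */
        if (states[i].exit_latency_us > latency_limit_us)
            continue; /* waking up would take too long */
        best = i;
    }
    return best;
}
```

With this table, a predicted idle period of 1000 µs and a latency limit of 2000 µs selects index 1 (C6). Raising the prediction to 10000 µs with a 5000 µs limit selects index 2 (C7).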