Paris Blog

A blog of notes on software development

Android Vibrator on MSM8909 Platform

QPNP (Qualcomm Plug N Play) vibrator is a peripheral on Qualcomm PMICs. The PMIC is connected to the MSM8909 via the SPMI bus. Its driver uses the timed-output framework to interface with userspace.

1. MSM8909 Vibrator driver implementation and DTS file

The vibrator is connected to the VIB_DRV_N line of the QPNP PMIC, and the PMIC vibrator driver can program the drive voltage from 1.2 V to 3.1 V in 100 mV steps through the VIB_DRV_N pin.

DTSI documentation

kernel/arch/arm/boot/dts/qcom/msm-pm8909.dtsi
            pm8909_vib: qcom,vibrator@c000 {
                    compatible = "qcom,qpnp-vibrator";
                    reg = <0xc000 0x100>;
                    label = "vibrator";
                    status = "disabled";
            };

Documentation/devicetree/bindings/platform/msm/qpnp-vibrator.txt

Required Properties:
- status: default status is set to "disabled". Must be "okay"  
- compatible: must be "qcom,qpnp-vibrator"  
- label: name which describes the device  
- reg: address of device  
Optional Properties:
- qcom,vib-timeout-ms: timeout of vibrator, in ms.  Default 15000 ms  
- qcom,vib-vtg-level-mV: voltage level, in mV.  Default 3100 mV  
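
As a rough illustration of how the driver consumes these properties, here is a minimal sketch of qpnp_vib_parse_dt() applying the documented defaults (the field names and hard-coded defaults are assumptions for illustration, not the exact driver source):

static int qpnp_vib_parse_dt(struct qpnp_vib *vib)
{
        struct spmi_device *spmi = vib->spmi;
        u32 temp_val;
        int rc;

        /* fall back to the documented default when a property is absent */
        vib->timeout = 15000;                   /* qcom,vib-timeout-ms */
        rc = of_property_read_u32(spmi->dev.of_node,
                        "qcom,vib-timeout-ms", &temp_val);
        if (!rc)
                vib->timeout = temp_val;
        else if (rc != -EINVAL)
                return rc;

        vib->vtg_level = 3100;                  /* qcom,vib-vtg-level-mV */
        rc = of_property_read_u32(spmi->dev.of_node,
                        "qcom,vib-vtg-level-mV", &temp_val);
        if (!rc)
                vib->vtg_level = temp_val;
        else if (rc != -EINVAL)
                return rc;

        return 0;
}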

vibrator driver source

The QPNP vibrator operates in two modes: manual and PWM (Pulse Width Modulation). PWM is supported through one of the DTEST lines connected to the vibrator. Manual mode is used on the MSM8909 platform.

qpnp_vibrator_probe() under kernel/drivers/platform/msm/qpnp-vibrator.c
static int qpnp_vibrator_probe(struct spmi_device *spmi)
{
        struct qpnp_vib *vib;
        struct resource *vib_resource;
        int rc;

        vib = devm_kzalloc(&spmi->dev, sizeof(*vib), GFP_KERNEL);
        if (!vib)
                return -ENOMEM;

        vib->spmi = spmi;

        vib_resource = spmi_get_resource(spmi, 0, IORESOURCE_MEM, 0);
        if (!vib_resource) {
                dev_err(&spmi->dev, "Unable to get vibrator base address\n");
                return -EINVAL;
        }
        vib->base = vib_resource->start;

        rc = qpnp_vib_parse_dt(vib);
        if (rc) {
                dev_err(&spmi->dev, "DT parsing failed\n");
                return rc;
        }

        rc = qpnp_vibrator_config(vib);
        if (rc) {
                dev_err(&spmi->dev, "vib config failed\n");
                return rc;
        }

        mutex_init(&vib->lock);
        INIT_WORK(&vib->work, qpnp_vib_update);

        hrtimer_init(&vib->vib_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        vib->vib_timer.function = qpnp_vib_timer_func;

        vib->timed_dev.name = "vibrator";
        vib->timed_dev.get_time = qpnp_vib_get_time;
        vib->timed_dev.enable = qpnp_vib_enable;

        dev_set_drvdata(&spmi->dev, vib);

        rc = timed_output_dev_register(&vib->timed_dev);
        if (rc < 0)
                return rc;

        return rc;
}

Uses timed output framework

The .enable and .get_time callbacks of struct timed_output_dev are hooked up to qpnp_vib_enable() and qpnp_vib_get_time().
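
For reference, the callback structure is declared in drivers/staging/android/timed_output.h roughly as follows:

struct timed_output_dev {
        const char      *name;

        /* enable the output and set the timer */
        void (*enable)(struct timed_output_dev *sdev, int timeout);

        /* returns the current number of milliseconds remaining on the timer */
        int (*get_time)(struct timed_output_dev *sdev);

        /* private data */
        struct device   *dev;
        int             index;
        int             state;
};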

Implementation of qpnp_vib_enable() and qpnp_vib_get_time()
static void qpnp_vib_enable(struct timed_output_dev *dev, int value)
{
        struct qpnp_vib *vib = container_of(dev, struct qpnp_vib,
                                         timed_dev);

        mutex_lock(&vib->lock);
        hrtimer_cancel(&vib->vib_timer);

        if (value == 0)
                vib->state = 0;
        else {
                value = (value > vib->timeout ?
                                 vib->timeout : value);
                vib->state = 1;
                hrtimer_start(&vib->vib_timer,
                              ktime_set(value / 1000, (value % 1000) * 1000000),
                              HRTIMER_MODE_REL);
        }
        mutex_unlock(&vib->lock);
        schedule_work(&vib->work);
}

static int qpnp_vib_get_time(struct timed_output_dev *dev)
{
        struct qpnp_vib *vib = container_of(dev, struct qpnp_vib,
                                                         timed_dev);

        if (hrtimer_active(&vib->vib_timer)) {
                ktime_t r = hrtimer_get_remaining(&vib->vib_timer);
                return (int)ktime_to_us(r);
        } else
                return 0;
}

The workqueue is scheduled to run qpnp_vib_update(), and the vibrator is turned off by qpnp_vibrator_suspend() in the suspend callback.

Both qpnp_vib_update() and qpnp_vibrator_suspend() call qpnp_vib_set() to turn the vibrator on or off.
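
The work function itself is tiny; a sketch consistent with the probe code above (vib->state is set by qpnp_vib_enable()):

static void qpnp_vib_update(struct work_struct *work)
{
        struct qpnp_vib *vib = container_of(work, struct qpnp_vib, work);

        qpnp_vib_set(vib, vib->state);
}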

Implementation of qpnp_vib_set() to control vibrator via PWM or Manual Mode
static int qpnp_vib_set(struct qpnp_vib *vib, int on)
{
        int rc;
        u8 val;

        if (on) {
                if (vib->mode != QPNP_VIB_MANUAL)
                        pwm_enable(vib->pwm_info.pwm_dev);
                else {
                        val = vib->reg_en_ctl;
                        val |= QPNP_VIB_EN;
                        rc = qpnp_vib_write_u8(vib, &val,
                                        QPNP_VIB_EN_CTL(vib->base));
                        if (rc < 0)
                                return rc;
                        vib->reg_en_ctl = val;
                }
        } else {
                if (vib->mode != QPNP_VIB_MANUAL)
                        pwm_disable(vib->pwm_info.pwm_dev);
                else {
                        val = vib->reg_en_ctl;
                        val &= ~QPNP_VIB_EN;
                        rc = qpnp_vib_write_u8(vib, &val,
                                        QPNP_VIB_EN_CTL(vib->base));
                        if (rc < 0)
                                return rc;
                        vib->reg_en_ctl = val;
                }
        }

        return 0;
}

2. Timed output class driver

Timed output is a class driver that allows changing a state and restoring it automatically after a specified timeout. It exposes a userspace interface under /sys/class/timed_output/vibrator/enable, which is used by the vibrator HAL code.
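
For example, writing a timeout in milliseconds to the enable attribute turns the vibrator on, and writing 0 turns it off:

[~]# echo 500 > /sys/class/timed_output/vibrator/enable
[~]# echo 0 > /sys/class/timed_output/vibrator/enable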

kernel/drivers/staging/android/timed_output.c
static ssize_t enable_show(struct device *dev, struct device_attribute *attr,
                char *buf)
{
        struct timed_output_dev *tdev = dev_get_drvdata(dev);
        int remaining = tdev->get_time(tdev);

        return sprintf(buf, "%d\n", remaining);
}

static ssize_t enable_store(
                struct device *dev, struct device_attribute *attr,
                const char *buf, size_t size)
{
        struct timed_output_dev *tdev = dev_get_drvdata(dev);
        int value;

        if (sscanf(buf, "%d", &value) != 1)
                return -EINVAL;

        tdev->enable(tdev, value);

        return size;
}

int timed_output_dev_register(struct timed_output_dev *tdev)
{
        int ret;

        if (!tdev || !tdev->name || !tdev->enable || !tdev->get_time)
                return -EINVAL;

        ret = create_timed_output_class();
        if (ret < 0)
                return ret;

        tdev->index = atomic_inc_return(&device_count);
        tdev->dev = device_create(timed_output_class, NULL,
                MKDEV(0, tdev->index), NULL, tdev->name);
        if (IS_ERR(tdev->dev))
                return PTR_ERR(tdev->dev);

        ret = device_create_file(tdev->dev, &dev_attr_enable);
        if (ret < 0)
                goto err_create_file;

        dev_set_drvdata(tdev->dev, tdev);
        tdev->state = 0;
        return 0;

err_create_file:
        device_destroy(timed_output_class, MKDEV(0, tdev->index));
        pr_err("failed to register driver %s\n",
                        tdev->name);

        return ret;
}
EXPORT_SYMBOL_GPL(timed_output_dev_register);

3. Android vibrator HAL

The HAL interfaces with the Linux device driver by calling vibrator_on() with the specified timeout in milliseconds.

hardware/libhardware_legacy/vibrator/vibrator.c
#define THE_DEVICE "/sys/class/timed_output/vibrator/enable"
static int sendit(int timeout_ms)
{
    int nwr, ret, fd;
    char value[20];

#ifdef QEMU_HARDWARE
    if (qemu_check()) {
        return qemu_control_command( "vibrator:%d", timeout_ms );
    }
#endif

    fd = open(THE_DEVICE, O_RDWR);
    if(fd < 0)
        return errno;

    nwr = sprintf(value, "%d\n", timeout_ms);
    ret = write(fd, value, nwr);

    close(fd);

    return (ret == nwr) ? 0 : -1;
}

int vibrator_on(int timeout_ms)
{
    /* constant on, up to maximum allowed time */
    return sendit(timeout_ms);
}

int vibrator_off()
{
    return sendit(0);
}

4. Native layer through JNI between HAL and vibrator service

Control from the vibrator service reaches the HAL via vibratorOn(), vibratorOff(), and vibratorExists().

frameworks/base/services/core/jni/com_android_server_VibratorService.cpp
static void vibratorOn(JNIEnv *env, jobject clazz, jlong timeout_ms)
{
    // ALOGI("vibratorOn\n");
    vibrator_on(timeout_ms);
}

static void vibratorOff(JNIEnv *env, jobject clazz)
{
    // ALOGI("vibratorOff\n");
    vibrator_off();
}

static JNINativeMethod method_table[] = {
    { "vibratorExists", "()Z", (void*)vibratorExists },
    { "vibratorOn", "(J)V", (void*)vibratorOn },
    { "vibratorOff", "()V", (void*)vibratorOff }
};

int register_android_server_VibratorService(JNIEnv *env)
{
    return jniRegisterNativeMethods(env, "com/android/server/VibratorService",
            method_table, NELEM(method_table));
}

5. Java Vibrator Service

At the application layer, an application must first get access to the vibrator service before controlling the vibrator. Control then goes into the vibrator service (framework layer) through the Binder interface. For example, an app obtains a Vibrator object, and the service starts the vibration via startVibrationLocked(vib).
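
A minimal application-side sketch using the public Vibrator API:

// requires <uses-permission android:name="android.permission.VIBRATE"/>
Vibrator vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
vibrator.vibrate(500);                               // one-shot, 500 ms
vibrator.vibrate(new long[]{0, 200, 100, 200}, -1);  // off/on pattern, no repeat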

Inside startVibrationLocked():

If mTimeout != 0, the service calls the JNI function vibratorOn() directly; otherwise, it runs the code that handles a rhythmic vibration pattern, which in turn calls vibratorOn() through VibrateThread().

frameworks/base/services/core/java/com/android/server/VibratorService.java
// Lock held on mVibrations
private void startVibrationLocked(final Vibration vib) {
    try {
        if (mLowPowerMode
                && vib.mUsageHint != AudioAttributes.USAGE_NOTIFICATION_RINGTONE) {
            return;
        }

        int mode = mAppOpsService.checkAudioOperation(AppOpsManager.OP_VIBRATE,
                vib.mUsageHint, vib.mUid, vib.mOpPkg);
        if (mode == AppOpsManager.MODE_ALLOWED) {
            mode = mAppOpsService.startOperation(AppOpsManager.getToken(mAppOpsService),
                AppOpsManager.OP_VIBRATE, vib.mUid, vib.mOpPkg);
        }
        if (mode != AppOpsManager.MODE_ALLOWED) {
            if (mode == AppOpsManager.MODE_ERRORED) {
                Slog.w(TAG, "Would be an error: vibrate from uid " + vib.mUid);
            }
            mH.post(mVibrationRunnable);
            return;
        }
    } catch (RemoteException e) {
    }
    if (vib.mTimeout != 0) {
        doVibratorOn(vib.mTimeout, vib.mUid, vib.mUsageHint);
        mH.postDelayed(mVibrationRunnable, vib.mTimeout);
    } else {
        // mThread better be null here. doCancelVibrate should always be
        // called before startNextVibrationLocked or startVibrationLocked.
        mThread = new VibrateThread(vib);
        mThread.start();
    }
}

private void doVibratorOn(long millis, int uid, int usageHint) {
    synchronized (mInputDeviceVibrators) {
        if (DEBUG) {
            Slog.d(TAG, "Turning vibrator on for " + millis + " ms.");
        }
        try {
            mBatteryStatsService.noteVibratorOn(uid, millis);
            mCurVibUid = uid;
        } catch (RemoteException e) {
        }
        final int vibratorCount = mInputDeviceVibrators.size();
        if (vibratorCount != 0) {
            final AudioAttributes attributes = new AudioAttributes.Builder().setUsage(usageHint)
                    .build();
            for (int i = 0; i < vibratorCount; i++) {
                mInputDeviceVibrators.get(i).vibrate(millis, attributes);
            }
        } else {
            vibratorOn(millis);
        }
    }
}

Notes on Debugging the Kernel Using Ftrace

This page has my notes on debugging the kernel using Ftrace.

Ftrace is a tracing utility built directly into the Linux kernel. Ftrace was introduced in kernel 2.6.27 by Steven Rostedt and Ingo Molnar. It comes with its own ring buffer for storing trace data, and uses the GCC profiling mechanism. Documentation can be found in the Linux kernel source tree at Documentation/trace/ftrace.txt.

Setting up Ftrace:

When Ftrace is configured, it will create its own directory called tracing within the debugfs file system.

[~]# cd /sys/kernel/debug/tracing
[tracing]#

For the purpose of debugging, the kernel configuration parameters that should be enabled are:
* Kernel Function Tracer (FUNCTION_TRACER)
* Kernel Function Graph Tracer (FUNCTION_GRAPH_TRACER)
* Enable/disable ftrace dynamically (DYNAMIC_FTRACE)

The function tracer uses gcc's -pg option to have every function in the kernel call a special function, mcount(). During compilation the mcount() call sites are recorded, and when CONFIG_DYNAMIC_FTRACE is set that list is used at boot time to convert those call sites to NOPs. When the function or function graph tracer is enabled, the list is used again to convert the call sites back to trace calls.
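
With DYNAMIC_FTRACE you can also restrict tracing to selected functions through set_ftrace_filter, which accepts glob patterns:

[tracing]# echo 'hrtimer_*' > set_ftrace_filter
[tracing]# echo function > current_tracer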

To find out which tracers are available, simply cat the available_tracers file in the tracing directory:

root@K015:/ # cat /d/tracing/available_tracers
blk function_graph wakeup_rt wakeup preemptirqsoff preemptoff irqsoff function nop

To enable the function tracer, just echo “function” into the current_tracer file.

root@K015:/d/tracing # echo function > current_tracer
root@K015:/d/tracing # cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 205147/172709617   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
        Binder_D-4038  [003] ....    98.468143: cgroup_tasks_write <-cgroup_file_write
        Binder_D-4038  [003] ....    98.468143: attach_task_by_pid <-cgroup_tasks_write
        Binder_D-4038  [003] ....    98.468143: cgroup_lock_live_group <-attach_task_by_pid
        Binder_D-4038  [003] ....    98.468144: mutex_lock <-cgroup_lock_live_group
        Binder_D-4038  [003] ....    98.468144: __might_sleep <-mutex_lock
        Binder_D-4038  [003] ....    98.468144: __mutex_lock_slowpath <-mutex_lock
        Binder_D-4038  [003] ....    98.468144: add_preempt_count <-__mutex_lock_slowpath
        Binder_D-4038  [003] ...1    98.468145: sub_preempt_count <-__mutex_lock_slowpath
        Binder_D-4038  [003] ....    98.468145: __rcu_read_lock <-attach_task_by_pid

A header is printed with the name of the tracer that produced the trace. In this case the tracer is "function". Then it shows the number of events in the buffer as well as the total number of entries that were written. The difference is the number of entries that were lost due to the buffer filling up (172709617 - 205147 = 172504470 events lost). #P is the number of online CPUs (#P:4).

The header explains the content of the events: the task name "Binder_D", the task PID "4038", the CPU it was running on "003", the latency format (explained below), the timestamp in seconds.microseconds format, the function name that was traced "cgroup_tasks_write", and the parent function that called it "cgroup_file_write". The timestamp is the time at which the function was entered.

irqs-off: 'd' interrupts are disabled. '.' otherwise.
    Note: If the architecture does not support a way to
      read the irq flags variable, an 'X' will always
      be printed here.

need-resched:
'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
'n' only TIF_NEED_RESCHED is set,
'p' only PREEMPT_NEED_RESCHED is set,
'.' otherwise.

hardirq/softirq:
'H' - hard irq occurred inside a softirq.
'h' - hard irq is running
's' - soft irq is running
'.' - normal context.

preempt-depth: The level of preempt_disabled

The above is mostly meaningful for kernel developers.

Tips


Using trace_printk()

If you are debugging a high-volume area such as the timer interrupt, the scheduler, or the network, printk() can bog the system down or even create a livelock. It is also quite common to see a bug "disappear" when adding a few printk()s. This is due to the sheer overhead that printk() introduces.

Ftrace introduces a new form of printk() called trace_printk().

For example, the code below adds entry/exit checkpoints to measure how long the system stays in standby mode.

arch/x86/platform/intel-mid/intel_soc_pmu.c
static int mid_suspend_enter(suspend_state_t state)
{
        int ret;

        if (state != PM_SUSPEND_MEM)
                return -EINVAL;
...<SKIP>...
        trace_printk("s3_entry\n");
        ret = standby_enter();
        trace_printk("s3_exit %d\n", ret);
...<SKIP>...
        return ret;
}

trace_printk() output will appear in any tracer, even the function and function graph tracers.

# tracer: nop
#
# entries-in-buffer/entries-written: 4/4   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
    kworker/u8:5-1178  [000] d...   318.726581: mid_suspend_enter: s3_entry
    kworker/u8:5-1178  [000] d...   377.664365: mid_suspend_enter: s3_exit 0
    kworker/u8:5-1178  [000] d...   378.514933: mid_suspend_enter: s3_entry
    kworker/u8:5-1178  [000] d...   569.010863: mid_suspend_enter: s3_exit 0

Tracing a specific process

To trace a specific process, or set of processes, the file set_ftrace_pid lets you specify which processes to trace.

[tracing]# echo $$ > set_ftrace_pid

The above will set the function tracer to only trace the bash shell that executed the echo command.

Clear the set_ftrace_pid file if you want to go back to generic function tracing:

[tracing]# echo -1 > set_ftrace_pid

Capturing Ftrace on kernel oops or panic

You can capture the function calls leading up to a panic by placing the following on the kernel command line:

ftrace=function ftrace_dump_on_oops

Alternatively, echoing "1" into /proc/sys/kernel/ftrace_dump_on_oops enables Ftrace to dump the entire trace buffer to the console in ASCII format on an oops or panic.

Find latencies on kernel startup

Use the following on the kernel command line:

tracing_thresh=10000 ftrace=function_graph

this traces all functions taking longer than 10000 microseconds (10 ms).
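
The same threshold can also be set at run time through the tracing_thresh file (units are microseconds):

[tracing]# echo 10000 > tracing_thresh
[tracing]# echo function_graph > current_tracer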

Android RenderScript Notes

This page has my notes from an initial study of Android RenderScript.

RenderScript is Android's framework for 3D graphics rendering and compute-intensive workloads using heterogeneous computing. Its design goals are:
1. Portability: a general-purpose compute API across different system computing hardware
2. High performance
3. Developer friendly: a compute API similar to CUDA, OpenCL or GLSL, and a familiar language based on C99

There are three major components in RenderScript:
1. Offline compiler (llvm-rs-cc): Convert script files into portable bitcode and reflected Java layer
2. Online JIT compiler (libbcc): Translate portable bitcode to appropriate machine code (CPU/GPU/DSP/…)
3. Runtime library support (libRS): Manage scripts from Dalvik layer and also provide basic support libraries (math functions, etc.)
A good introduction to Android RenderScript on LLVM is given by Shih-Wei Liao, and a good comparison and analysis of the different Android programming models is given by the Kandroid S/W Fundamentals Study Group at the 12th Kandroid Conference, 2013.

Clang is the offline frontend compiler that creates LLVM bitcode and the reflected Java layer. The portable bitcode is supplied as a resource within the .apk container and is compiled before use on the device. The offline compiler, llvm-rs-cc, performs machine-independent optimizations on the host before emitting portable bitcode, so that the online JIT on Android devices can be lightweight.
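
As a sketch, a trivial script fed to llvm-rs-cc could look like the following (the file name invert.rs and the package name are made up for illustration); the compiler emits portable bitcode plus a reflected ScriptC_invert Java class:

#pragma version(1)
#pragma rs java_package_name(com.example.rs)

/* per-element kernel: invert the RGB channels of one pixel */
void root(const uchar4 *v_in, uchar4 *v_out) {
        v_out->r = 255 - v_in->r;
        v_out->g = 255 - v_in->g;
        v_out->b = 255 - v_in->b;
        v_out->a = v_in->a;
}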

The online JIT compiler, libbcc, performs target-specific optimizations and code generation, and links dynamically against vendor-specific runtime library functions (lib*.bc). For example, RenderScript's runtime (libclcore.bc) comes with vector operations; the Xoom's libclcore has different CPU/GPU support (VFP3-16) than the Nexus S's (NEON).

Intel_mid: CPUIDLE: Intel_idle Processor Driver and New C-States

When there is nothing left to do, the CPUs will go into the idle state to wait until it is needed again. These idle modes, which go by names like “C states,” vary in the amount of power saved, but also in the amount of ancillary information which may be lost and the amount of time required to get back into a fully-functional mode.

The intel_idle driver performs idle-state power management; it registers with the existing cpuidle subsystem and extends the driver for the Merrifield CPU (Silvermont). It also introduces new platform idle states (C7x) deeper than the traditional C6 state. The overall idea is that the CPU C7x states are extended to devices and the rest of the platform, hence putting the platform into S0ix states.

On the Intel MID platform (Merrifield), the PMU driver communicates with the intel_idle driver, platform device drivers, PCI drivers, and the PMU firmware (NC: Punit, SC: SCU) to coordinate platform power state transitions.

  1. The PMU driver provides a platform-specific callback to the Intel_idle driver so that long periods of idleness can be extended to the entire platform.

    • soc_s0ix_idle() -> enter_s0ix_state() // intel_idle driver, defined at drivers/idle/intel_idle.c
    • mid_s0ix_enter() // PMU driver, defined at arch/x86/platform/intel-mid/intel_soc_pmu.c

    I could find the cpuidle_state .enter() callback associated with soc_s0ix_idle() for the C6 state on the Medfield platform, and the call stack looks as shown above. However, I have not figured out the corresponding association between intel_idle and the PMU driver on the Merrifield platform, and I am curious whether this statement is also applicable there.

  2. Based on the hint of idleness, the PMU driver extends CPU idleness to the rest of the platform via standard Linux PM_QoS, Runtime PM, and PCI PM calls.

  3. Once the CPU and devices are all idle, the PMU driver programs the North and South Complex PMUs to implement the required power transition for S0ix to emulate the C7x states.

Here is a very good reference for understanding the cpuidle governors and subsystem, http://lwn.net/Articles/384146/, before digging into how the intel_idle driver fits into the current cpuidle infrastructure.

Below is a code trace of the intel_idle driver, located at drivers/idle/intel_idle.c. A comment in the driver explains its intent:

/*
 * intel_idle is a cpuidle driver that loads on specific Intel processors
 * in lieu of the legacy ACPI processor_idle driver.  The intent is to
 * make Linux more efficient on these processors, as intel_idle knows
 * more than ACPI, as well as make Linux more immune to ACPI BIOS bugs.
 */
  • Driver init
    • intel_idle_probe() matches the current CPU against the array of x86_cpu_ids and assigns the corresponding default cpuidle C-state table
    • intel_idle_cpuidle_driver_init() checks the real C-state table and updates its states when needed (e.g., target_residency)
    • the intel_idle driver is registered with the cpuidle subsystem through cpuidle_register_driver()
    • intel_idle_cpu_init() allocates, initializes, and registers a cpuidle_device for each CPU
    • a cpu_hotplug_notifier is registered to know about CPUs going up/down
intel_idle_init()
static int __init intel_idle_init(void)
{
        int retval, i;

        /* Do not load intel_idle at all for now if idle= is passed */
        if (boot_option_idle_override != IDLE_NO_OVERRIDE)
                return -ENODEV;

        retval = intel_idle_probe();
        if (retval)
                return retval;

        intel_idle_cpuidle_driver_init();
        retval = cpuidle_register_driver(&intel_idle_driver);
        if (retval) {
                struct cpuidle_driver *drv = cpuidle_get_driver();
                printk(KERN_DEBUG PREFIX "intel_idle yielding to %s",
                        drv ? drv->name : "none");
                return retval;
        }

        intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
        if (intel_idle_cpuidle_devices == NULL)
                return -ENOMEM;

        for_each_online_cpu(i) {
                retval = intel_idle_cpu_init(i);
                if (retval) {
                        cpuidle_unregister_driver(&intel_idle_driver);
                        return retval;
                }

                if (platform_is(INTEL_ATOM_BYT)) {
                        /* Disable automatic core C6 demotion by PUNIT */
                        if (wrmsr_on_cpu(i, CLPU_CR_C6_POLICY_CONFIG,
                                        DISABLE_CORE_C6_DEMOTION, 0x0))
                                pr_err("Error to disable core C6 demotion");
                        /* Disable automatic module C6 demotion by PUNIT */
                        if (wrmsr_on_cpu(i, CLPU_MD_C6_POLICY_CONFIG,
                                        DISABLE_MODULE_C6_DEMOTION, 0x0))
                                pr_err("Error to disable module C6 demotion");
                }
        }
        register_cpu_notifier(&cpu_hotplug_notifier);

        return 0;
}

  • Default cpuidle C-states for Merrifield
    • The "flags" field describes the characteristics of the sleep state
      • CPUIDLE_FLAG_TIME_VALID should be set if it is possible to accurately measure the amount of time spent in this particular idle state.
      • CPUIDLE_FLAG_TLB_FLUSHED is set to indicate that the HW flushes the TLB for this state.
      • MWAIT takes an 8-bit "hint" in EAX "suggesting" the C-state (top nibble) and sub-state (bottom nibble). 0x00 means "MWAIT(C1)", 0x10 means "MWAIT(C2)", etc.
    • "exit_latency", in microseconds, says how long it takes to get back to a fully functional state.
    • "target_residency", in microseconds, is the minimum amount of time the processor should spend in this state to make the transition worth the effort.
    • The "enter()" function will be called when the current governor decides to put the CPU into the given state
Merrifield CPUidle C states
#if defined(CONFIG_REMOVEME_INTEL_ATOM_MRFLD_POWER)
static struct cpuidle_state mrfld_cstates[CPUIDLE_STATE_MAX] = {
        { /* MWAIT C1 */
                .name = "C1-ATM",
                .desc = "MWAIT 0x00",
                .flags = MWAIT2flg(0x00) | CPUIDLE_FLAG_TIME_VALID,
                .exit_latency = 1,
                .target_residency = 4,
                .enter = &intel_idle },
        { /* MWAIT C4 */
                .name = "C4-ATM",
                .desc = "MWAIT 0x30",
                .flags = MWAIT2flg(0x30) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                .exit_latency = 100,
                .target_residency = 400,
                .enter = &intel_idle },
        { /* MWAIT C6 */
                .name = "C6-ATM",
                .desc = "MWAIT 0x52",
                .flags = MWAIT2flg(0x52) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                .exit_latency = 140,
                .target_residency = 560,
                .enter = &intel_idle },
        { /* MWAIT C7-S0i1 */
                .name = "S0i1-ATM",
                .desc = "MWAIT 0x60",
                .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                .exit_latency = 1200,
                .target_residency = 4000,
                .enter = &intel_idle },
        { /* MWAIT C9-S0i3 */
                .name = "S0i3-ATM",
                .desc = "MWAIT 0x64",
                .flags = MWAIT2flg(0x64) | CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
                .exit_latency = 10000,
                .target_residency = 20000,
                .enter = &intel_idle },
        {
                .enter = NULL }
};
#else
#define mrfld_cstates atom_cstates
#endif

The actual C-state transition is performed by intel_idle(). As we saw, this is done through the "enter()" function associated with each state. The decision as to which idle state makes sense in a given situation is very much a policy issue, implemented by the cpuidle "governors".

A call to enter() is a request from the current governor to put the CPU associated with dev into the given state. Note that enter() is free to choose a different state if there is a good reason to do so, but it should store the actual state used in the device’s last_state field.

intel_idle
/**                                                                                                                                 
 * intel_idle
 * @dev: cpuidle_device
 * @drv: cpuidle driver
 * @index: index of cpuidle state
 *
 * Must be called under local_irq_disable().
 */
static int intel_idle(struct cpuidle_device *dev,
                struct cpuidle_driver *drv, int index)
{
        unsigned long ecx = 1; /* break on interrupt flag */
        struct cpuidle_state *state = &drv->states[index];
        unsigned long eax = flg2MWAIT(state->flags);
        unsigned int cstate;
        int cpu = smp_processor_id();

#if (defined(CONFIG_REMOVEME_INTEL_ATOM_MRFLD_POWER) && \
        defined(CONFIG_PM_DEBUG))
        {
                /* Get Cstate based on ignore table from PMU driver */
                unsigned int ncstate;
                cstate =
                (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
                ncstate = pmu_get_new_cstate(cstate, &index);
                eax     = flg2MWAIT(drv->states[index].flags);
        }
#endif
        cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;

        /*
         * leave_mm() to avoid costly and often unnecessary wakeups
         * for flushing the user TLB's associated with the active mm.
         */
        if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED)
                leave_mm(cpu);

        if (!(lapic_timer_reliable_states & (1 << (cstate))))
                clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);

        if (!need_resched()) {

                __monitor((void *)&current_thread_info()->flags, 0, 0);
                smp_mb();
                if (!need_resched())
                        __mwait(eax, ecx);
#if defined(CONFIG_REMOVEME_INTEL_ATOM_MDFLD_POWER) || \
        defined(CONFIG_REMOVEME_INTEL_ATOM_CLV_POWER)
                if (!need_resched() && is_irq_pending() == 0)
                        __get_cpu_var(update_buckets) = 0;
#endif
        }

        if (!(lapic_timer_reliable_states & (1 << (cstate))))
                clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);

        return index;
}

Intel_mid: Android Power Management Flow for Suspend/resume Using S0i3 Implementation

The main PMU driver (arch/x86/platform/intel-mid/intel_soc_pmu.c) hooks into the Linux PM suspend/resume flow as follows. The S0ix states are low-power active idle states that the platform can be transitioned into.
  • Register the PMU driver as a PCI device. The PMU driver registers mid_suspend_ops via suspend_set_ops().

mid_pci_register_init
/**
 * mid_pci_register_init - register the PMU driver as PCI device
 */
static struct pci_driver driver = {
        .name = PMU_DRV_NAME,
        .id_table = mid_pm_ids,
        .probe = mid_pmu_probe,
        .remove = mid_pmu_remove,
        .shutdown = mid_pmu_shutdown
};

static int __init mid_pci_register_init(void)
{
        int ret;

        mid_pmu_cxt = kzalloc(sizeof(struct mid_pmu_dev), GFP_KERNEL);

        if (mid_pmu_cxt == NULL)
                return -ENOMEM;

        mid_pmu_cxt->s3_restrict_qos =
                kzalloc(sizeof(struct pm_qos_request), GFP_KERNEL);
        if (mid_pmu_cxt->s3_restrict_qos) {
                pm_qos_add_request(mid_pmu_cxt->s3_restrict_qos,
                         PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE);
        } else {
                return -ENOMEM;
        }

        init_nc_device_states();

        mid_pmu_cxt->nc_restrict_qos =
                kzalloc(sizeof(struct pm_qos_request), GFP_KERNEL);
        if (mid_pmu_cxt->nc_restrict_qos == NULL)
                return -ENOMEM;

        /* initialize the semaphores */
        sema_init(&mid_pmu_cxt->scu_ready_sem, 1);

        /* registering PCI device */
        ret = pci_register_driver(&driver);
        suspend_set_ops(&mid_suspend_ops);

        return ret;
}
  • mid_suspend_enter() performs the required S3 transition via standby_enter()
    check_nc_sc_status(), hooked by mrfld_nc_sc_status_check() (defined at arch/x86/platform/intel-mid/intel_soc_mrfld.c), checks the North Complex (NC) and South Complex (SC) device status. It returns true if all NC and SC devices are in D0i3.
mid_suspend_enter()
static const struct platform_suspend_ops mid_suspend_ops = {
        .begin = mid_suspend_begin,
        .valid = mid_suspend_valid,
        .prepare = mid_suspend_prepare,
        .prepare_late = mid_suspend_prepare_late,
        .enter = mid_suspend_enter,
        .end = mid_suspend_end,
};

static int mid_suspend_enter(suspend_state_t state)
{
        int ret;

        if (state != PM_SUSPEND_MEM)
                return -EINVAL;

        /* one last check before entering standby */
        if (pmu_ops->check_nc_sc_status) {
                if (!(pmu_ops->check_nc_sc_status())) {
                        trace_printk("Device d0ix status check failed! Aborting Standby entry!\n");
                        WARN_ON(1);
                }
        }

        trace_printk("s3_entry\n");
        ret = standby_enter();
        trace_printk("s3_exit %d\n", ret);
        if (ret != 0)
                dev_dbg(&mid_pmu_cxt->pmu_dev->dev,
                                "Failed to enter S3 status: %d\n", ret);

        return ret;
}
  • standby_enter() performs the required S3 transition
    • mid_state_to_sys_state() maps power states to the driver's internal indexes.
    • mid_s0ix_enter() performs the required S3 state
    • __mwait(mid_pmu_cxt->s3_hint, 1) is issued with a hint to enter S0i3 (S3 emulation using S0i3; I observed that MRFLD_S3_HINT with 0x64 is the same as MID_S0I3_STATE with 0x64). When both cores issue an MWAIT C7, it is a hint provided by the idle driver to enter an S0ix state.
    • This triggers S0i3 entry, but the decision and policy are made by the SCU FW.
standby_enter()
static int standby_enter(void)
{
        u32 temp = 0;
        int s3_state = mid_state_to_sys_state(MID_S3_STATE);

        if (mid_s0ix_enter(MID_S3_STATE) != MID_S3_STATE) {
                pmu_set_s0ix_complete();
                return -EINVAL;
        }

        /* time stamp for end of s3 entry */
        time_stamp_for_sleep_state_latency(s3_state, false, true);

        __monitor((void *) &temp, 0, 0);
        smp_mb();
        __mwait(mid_pmu_cxt->s3_hint, 1);

        /* time stamp for start of s3 exit */
        time_stamp_for_sleep_state_latency(s3_state, true, false);

        pmu_set_s0ix_complete();

        /*set wkc to appropriate value suitable for s0ix*/
        writel(mid_pmu_cxt->ss_config->wake_state.wake_enable[0],
                       &mid_pmu_cxt->pmu_reg->pm_wkc[0]);
        writel(mid_pmu_cxt->ss_config->wake_state.wake_enable[1],
                       &mid_pmu_cxt->pmu_reg->pm_wkc[1]);

        mid_pmu_cxt->camera_off = 0;
        mid_pmu_cxt->display_off = 0;

        if (platform_is(INTEL_ATOM_MRFLD))
                up(&mid_pmu_cxt->scu_ready_sem);

        return 0;
}
  • mid_s0ix_enter()
    • pmu_prepare_wake() masks wakeups from the AONT timers for S3. If S3 is aborted for any reason, we don't want to leave the AONT timers masked until the next suspend; otherwise, if S0ix happens next, no timer could wake the SoC from S0ix and we might miss kicking the kernel watchdog.
    • enter() is hooked by mrfld_pmu_enter() (defined at arch/x86/platform/intel-mid/intel_soc_mrfld.c). Compared to the Medfield and Clovertrail platforms, PM_CMD does not need to be sent to the SCU.
mid_s0ix_enter()
int mid_s0ix_enter(int s0ix_state)
{
        int ret = 0;

        if (unlikely(!pmu_ops || !pmu_ops->enter))
                goto ret;

        /* check if we can acquire scu_ready_sem
         * if we are not able to then do a c6 */
        if (down_trylock(&mid_pmu_cxt->scu_ready_sem))
                goto ret;

        /* If PMU is busy, we'll retry on next C6 */
        if (unlikely(_pmu_read_status(PMU_BUSY_STATUS))) {
                up(&mid_pmu_cxt->scu_ready_sem);
                pr_debug("mid_pmu_cxt->scu_read_sem is up\n");
                goto ret;
        }

        pmu_prepare_wake(s0ix_state);

        /* no need to proceed if schedule pending */
        if (unlikely(need_resched())) {
                pmu_stat_clear();
                up(&mid_pmu_cxt->scu_ready_sem);
                goto ret;
        }

        /* entry function for pmu driver ops */
        if (pmu_ops->enter(s0ix_state))
                ret = s0ix_state;

ret:
        return ret;
}

Intel_mid: Introduction to Android Watchdogs and Kernel Watchdogs

Watchdogs monitor software and hardware and prevent the whole system from hanging. After looking into the Android BSP running on the Intel MID (Merrifield) platform, I will classify the watchdogs into the following types. The first two belong to native Android support, and the last three are specific to the Intel MID platform.

1. Android framework’s Java* watchdog

Deals with cases when a lock of any of the following services is held for more than a minute, or when the ServerThread is busy:

— ThermalManagerService
— PowerManagerService
— WindowManagerService
— MountService
— NetworkManagementService
— ActivityManagerService

If one of the above services hangs for one minute, the Java watchdog restarts Android's framework by killing the SystemServer.

2. Device-critical services

Critical services are declared as "critical" in the corresponding rc files (e.g., ueventd, servicemanager). If a critical service exits or crashes more than four times in four minutes, the device will reboot into recovery mode. This feature is handled by the init process; see the rc fragment below.
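
For example, servicemanager is declared critical in AOSP's init.rc along these lines:

service servicemanager /system/bin/servicemanager
    class core
    user system
    group system
    critical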

3. Kernel watchdog leads to COLD_RESET

The kernel watchdog prevents the operating system from hanging. The System Control Unit (SCU firmware) resets the platform when the kernel cannot schedule the watchdog daemon (/usr/bin/ia_watchdogd).

The driver located at drivers/watchdog/intel_scu_watchdog_evo.c provides a /dev/watchdog device to access the kernel watchdog and ioctls to configure the timer. Since the SCU provides the functionality, all accesses to watchdog features are routed to the SCU via an IPC (see more IPC registrations at arch/x86/platform/intel-mid/intel_mid_scu.c).

  • Init code installs the driver
watchdog_rpmsg_init()
static struct rpmsg_driver watchdog_rpmsg = {
        .drv.name       = KBUILD_MODNAME,
        .drv.owner      = THIS_MODULE,
        .id_table       = watchdog_rpmsg_id_table,
        .probe          = watchdog_rpmsg_probe,
        .callback       = watchdog_rpmsg_cb,
        .remove         = watchdog_rpmsg_remove,
};

static int __init watchdog_rpmsg_init(void)
{
        if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_TANGIER)
                return register_rpmsg_driver(&watchdog_rpmsg);
        else {
                pr_err("%s: watchdog driver: bad platform\n", __func__);
                return -ENODEV;
        }
}

#ifdef MODULE
module_init(watchdog_rpmsg_init);
#else
rootfs_initcall(watchdog_rpmsg_init);
#endif
  • Create /dev/watchdog only if the disable_kernel_watchdog module parameter is not set. The init code gets the timer's configuration, registers a reboot notifier, registers a dump handler on IRQ #15, and adds sysfs/debugfs entries.
watchdog_rpmsg_probe()->intel_scu_watchdog_init()
/* Init code */
static int intel_scu_watchdog_init(void)
{
        int ret = 0;

        watchdog_device.normal_wd_action   = SCU_COLD_RESET_ON_TIMEOUT;
        watchdog_device.reboot_wd_action   = SCU_COLD_RESET_ON_TIMEOUT;
        watchdog_device.shutdown_wd_action = SCU_COLD_OFF_ON_TIMEOUT;

#ifdef CONFIG_DEBUG_FS
        watchdog_device.panic_reboot_notifier = false;
#endif /* CONFIG_DEBUG_FS */

        /* Initially, we are not in shutdown mode */
        watchdog_device.shutdown_flag = false;

        /* Check timeouts boot parameter */
        if (check_timeouts(pre_timeout, timeout)) {
                pr_err("%s: Invalid timeouts\n", __func__);
                return -EINVAL;
        }

        /* Reboot notifier */
        watchdog_device.reboot_notifier.notifier_call = reboot_notifier;
        watchdog_device.reboot_notifier.priority = 1;
        ret = register_reboot_notifier(&watchdog_device.reboot_notifier);
        if (ret) {
                pr_crit("cannot register reboot notifier %d\n", ret);
                goto error_stop_timer;
        }

        /* Do not publish the watchdog device when disable (TO BE REMOVED) */
        if (!disable_kernel_watchdog) {
                watchdog_device.miscdev.minor = WATCHDOG_MINOR;
                watchdog_device.miscdev.name = "watchdog";
                watchdog_device.miscdev.fops = &intel_scu_fops;

                ret = misc_register(&watchdog_device.miscdev);
                if (ret) {
                        pr_crit("Cannot register miscdev %d err =%d\n",
                                WATCHDOG_MINOR, ret);
                        goto error_reboot_notifier;
                }
        }

        /* MSI #15 handler to dump registers */
        handle_mrfl_dev_ioapic(EXT_TIMER0_MSI);
        ret = request_irq((unsigned int)EXT_TIMER0_MSI,
                watchdog_warning_interrupt,
                IRQF_SHARED|IRQF_NO_SUSPEND, "watchdog",
                &watchdog_device);
        if (ret) {
                pr_err("error requesting warning irq %d\n",
                       EXT_TIMER0_MSI);
                pr_err("error value returned is %d\n", ret);
                goto error_misc_register;
        }

#ifdef CONFIG_INTEL_SCU_SOFT_LOCKUP
        init_timer(&softlock_timer);
#endif

        if (disable_kernel_watchdog) {
                pr_err("%s: Disable kernel watchdog\n", __func__);

                /* Make sure timer is stopped */
                ret = watchdog_stop();
                if (ret != 0)
                        pr_debug("cant disable timer\n");
        }

#ifdef CONFIG_DEBUG_FS
        ret = create_debugfs_entries();
        if (ret) {
                pr_err("%s: Error creating debugfs entries\n", __func__);
                goto error_debugfs_entry;
        }
#endif

        watchdog_device.started = false;

        ret = create_watchdog_sysfs_files();
        if (ret) {
                pr_err("%s: Error creating debugfs entries\n", __func__);
                goto error_sysfs_entry;
        }

        return ret;

error_sysfs_entry:
        /* Nothing special to do */
#ifdef CONFIG_DEBUG_FS
error_debugfs_entry:
        /* Remove entries done by create function */
#endif

error_misc_register:
        misc_deregister(&watchdog_device.miscdev);

error_reboot_notifier:
        unregister_reboot_notifier(&watchdog_device.reboot_notifier);

error_stop_timer:
        watchdog_stop();

        return ret;
}
  • The interrupt handler tied to the pre-timeout dumps kernel backtraces and then panics the kernel.
watchdog_warning_interrupt
/* warning interrupt handler */
static irqreturn_t watchdog_warning_interrupt(int irq, void *dev_id)
{
        pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT!\n", __func__);

        /* Let's reset the platform after dumping some data */
        trigger_all_cpu_backtrace();
        panic("Kernel Watchdog");

        /* This code should not be reached */
        return IRQ_HANDLED;
}
  • When a power transition happens, reboot_notifier is called to re-configure the watchdog timeouts and its default behavior. COLD_RESET is set for reboot, and COLD_OFF is set for power halt and power off. In case the reboot or shutdown procedure gets stuck, the platform can still execute the reset or power-off separately.
reboot_notifier
/* Reboot notifier */
static int reboot_notifier(struct notifier_block *this,
                           unsigned long code,
                           void *another_unused)
{
        int ret;

        if (code == SYS_RESTART || code == SYS_HALT || code == SYS_POWER_OFF) {
                pr_warn("Reboot notifier\n");

                if (watchdog_set_appropriate_timeouts())
                        pr_crit("reboot notifier cant set time\n");

                switch (code) {
                case SYS_RESTART:
                        ret = watchdog_set_reset_type(
                                watchdog_device.reboot_wd_action);
                        break;

                case SYS_HALT:
                case SYS_POWER_OFF:
                        ret = watchdog_set_reset_type(
                                watchdog_device.shutdown_wd_action);
                        break;
                }
                if (ret)
                        pr_err("%s: could not set reset type\n", __func__);

#ifdef CONFIG_DEBUG_FS
                /* debugfs entry to generate a BUG during
                any shutdown/reboot call */
                if (watchdog_device.panic_reboot_notifier)
                        BUG();
#endif
                /* Don't do instant reset on close */
                reset_on_release = false;

                /* Kick once again */
                if (disable_kernel_watchdog == false) {
                        ret = watchdog_keepalive();
                        if (ret)
                                pr_warn("%s: no keep alive\n", __func__);

                        /* Don't allow any more keep-alives */
                        watchdog_device.shutdown_flag = true;
                }
        }
        return NOTIFY_DONE;
}

4. Userspace watchdog daemon

The source code is located in the hardware/ia_watchdog/watchdog_daemon folder, and the target location is /usr/bin/ia_watchdogd. This daemon is declared as a one-shot service in the rc file (init.watchdog.rc) and performs the following steps (a sketch follows the list):

— open the watchdog device /dev/watchdog.
— configure the pre_timeout with 75 seconds and the timeout with 90 seconds.
— loop forever; in the loop, kick the watchdog device (by writing 'R' to /dev/watchdog) every 60 seconds.
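
A minimal sketch of that loop, assuming the standard Linux watchdog ioctls (an illustration, not the actual ia_watchdogd source):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

int main(void)
{
        int timeout = 90, pretimeout = 75;
        int fd = open("/dev/watchdog", O_RDWR);

        if (fd < 0)
                return 1;

        /* configure the timer before entering the kick loop */
        ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
        ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

        for (;;) {
                write(fd, "R", 1);      /* keep-alive kick */
                sleep(60);
        }
}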

5. SCU watchdog leads to PLATFORM_RESET (deep reset)

This prevents the platform from getting stuck in the SCU by issuing a PLATFORM_RESET when the interface between the SCU and the PMIC is broken.

Intel_mid: Overview of Simple Firmware Interface

I have started to develop the Android kernel on an x86 MID platform, Merrifield. I am a newbie at porting the kernel to an Atom-based platform but am familiar with the Linux kernel and device drivers. I noticed that the Intel Android BSP introduces the Simple Firmware Interface (SFI), a method for platform firmware to export static tables (I2C, SPI, GPIO) to the operating system.

Intel's newer Atom processors have supported SFI since the "Moorestown" SoC, and the SFI implementation was merged into upstream kernel 2.6.32 (http://lwn.net/Articles/340476).

The link below describes SFI and explains how SFI relates to ACPI and UEFI: https://simplefirmware.org/faq

In addition, the patch set below refactors the existing code and implements a flexible way to support multiple boards and devices: https://lkml.org/lkml/2013/10/10/81

arch/x86/platform/intel-mid/intel_mid_sfi.c contains the SFI parsing implementation, and it helped me understand how get_gpio_by_name() works.
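
A hypothetical driver snippet using it (the "vib_en" name is made up; real names come from the firmware's SFI GPIO table):

/* look up a GPIO number by the name the firmware exported, then claim it */
int gpio = get_gpio_by_name("vib_en");

if (gpio >= 0)
        gpio_request(gpio, "vibrator-enable");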

Android Ram-console Upstreaming

While dealing with a board bring-up powered by Tegra5, I became aware of this upstream change for the RAM console (aka /proc/last_kmsg). A working group, Android Upstreaming, from the Linaro foundation merged ram_console into the pstore framework (http://lwn.net/Articles/497881/).

The Android Upstreaming team's mission is to reduce and eventually eliminate the differences between the upstream kernel and the Android kernel. The team works closely with Google and upstream kernel developers to find ways to implement required Android features in a way that meets the needs of both communities.

There are currently two competing debug facilities for storing kernel messages in persistent storage: the generic pstore and Google's persistent_ram by Colin Cross. Not so long ago, as noticed by Greg KH on the ARM Linux list (https://lkml.org/lkml/2012/3/8/252), it was decided to fix this situation. There is a blueprint registered by Linaro Linux describing those debug facilities at https://blueprints.launchpad.net/linux-linaro/+spec/android-ram-console

To follow up the Android Upstreaming work, I removed the legacy driver support for ram console and persistent RAM and switched to the new pstore framework. In other words, we now look into /sys/fs/pstore/console-ramoops the way we used to analyze /proc/last_kmsg for the dying moments across a system reboot.

My work was merged into the ASUS internal development branch jb-mr2-t50-k3.10, which is not open yet. We can refer to https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10 for further reference.