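1. What are lmk and oom adj

(The code in this article is based on Android 14.)

When memory runs low, Android frees memory by throttling or killing processes that are not essential, so that the system keeps running at an acceptable level of performance. The component responsible for killing processes when memory is low is lmk.

lmk stands for Low Memory Killer. Early Android versions monitored system memory pressure with an in-kernel LMK driver. As of kernel 4.12 that driver was removed from the upstream kernel, and the userspace daemon lmkd took over memory monitoring and process killing.

In this article, oom adj refers to oom_score_adj (Out-of-Memory Score Adjustment). It can be regarded as a measure of process priority: every process has an oom_score_adj, and the value changes dynamically. Once the system reaches a certain level of memory pressure, lmk uses oom_score_adj to choose which processes to kill.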

2. oom adj

2.1 How to check a process's adj value

# 1. adb shell
# 2. cd /proc/<pid>, where <pid> is the ID of the process you are interested in
# 3. ls; the /proc/<pid> directory contains the following three files
oom_adj         oom_score       oom_score_adj
# 4. Use cat to print the value
cat oom_score_adj
-1000

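What do oom_adj, oom_score and oom_score_adj each represent, and how do they differ?

  • oom_adj is the older parameter for expressing process priority, with a range of -17 to 15. Newer Linux kernels have gradually replaced it with oom_score_adj, because its narrow range cannot express the fine-grained OOM priorities that modern, complex systems need.

  • oom_score is a composite score that reflects how likely the process is to be picked by the OOM killer in the current system state. The kernel computes it mainly from the process's memory usage and then shifts it by oom_score_adj. The higher the oom_score, the more likely the process is to be killed.

  • oom_score_adj is the parameter used to adjust a process's oom_score. It is an integer in the range -1000 to 1000. The higher the oom_score_adj, the more likely the process is to be killed.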
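The relationship between the legacy oom_adj scale and oom_score_adj can be made concrete with a small sketch. It assumes the Linux kernel's legacy conversion, in which a value written to oom_adj is rescaled linearly onto the oom_score_adj range and the two extremes are pinned to -1000 and 1000. The function name and constants are illustrative only; they are not kernel or Android API.

```cpp
#include <cassert>

// Illustrative constants, assumed to mirror the legacy oom_adj scale.
constexpr int kOomDisable = -17;       // oom_adj value that exempts a process from the OOM killer
constexpr int kOomAdjustMax = 15;      // largest legacy oom_adj value
constexpr int kOomScoreAdjMax = 1000;  // largest oom_score_adj value

// Hypothetical helper: rescale a legacy oom_adj (-17..15) to oom_score_adj (-1000..1000).
int oom_adj_to_score_adj(int oom_adj) {
    assert(oom_adj >= kOomDisable && oom_adj <= kOomAdjustMax);
    if (oom_adj == kOomAdjustMax) {
        return kOomScoreAdjMax;  // pin the top so the maximum stays reachable
    }
    // Linear scaling; integer division truncates toward zero.
    return oom_adj * kOomScoreAdjMax / -kOomDisable;
}
```

Under this scheme neighbouring oom_adj values land roughly 59 points apart on the oom_score_adj scale, which illustrates why the older parameter is too coarse for fine-grained tuning.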

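2.2 adj values

oom_score_adj ranges from -1000 to 1000. Some predefined values are declared in ProcessList, and the comments on them explain their meaning clearly, so they are reproduced below.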

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    // OOM adjustments for processes in various states:

    // Uninitialized value for any major or minor adj fields
    public static final int INVALID_ADJ = -10000;

    // Adjustment used in certain places where we don't know it yet.
    // (Generally this is something that is going to be cached, but we
    // don't know the exact value in the cached range to assign yet.)
    public static final int UNKNOWN_ADJ = 1001;

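    // This is a process only hosting activities that are not visible,
    // so it can be killed without any disruption.
    public static final int CACHED_APP_MAX_ADJ = 999;
    public static final int CACHED_APP_MIN_ADJ = 900;

    // This is the oom_adj level that we allow to die first. This cannot be equal to
    // CACHED_APP_MAX_ADJ unless processes are actively being assigned an oom_score_adj of
    // CACHED_APP_MAX_ADJ.
    public static final int CACHED_APP_LMK_FIRST_ADJ = 950;

    // Number of levels we have available for different service connection group importance
    // levels.
    static final int CACHED_APP_IMPORTANCE_LEVELS = 5;

    // The B list of SERVICE_ADJ -- these are the old and decrepit services that
    // aren't as important or interesting as the ones in the A list.
    public static final int SERVICE_B_ADJ = 800;

    // This is the process of the previous application that the user was in.
    // This process is kept above other things, because it is very common to
    // switch back to the previous app. This is important both for recent
    // task switch (toggling between the two top recent apps) as well as normal
    // UI flow such as clicking on a link in the e-mail app to view it in
    // the browser, and then pressing back to return to e-mail.
    public static final int PREVIOUS_APP_ADJ = 700;

    // This is a process holding the home application -- we want to try
    // avoiding killing it, even if it would normally be in the background,
    // because the user interacts with it so much.
    public static final int HOME_APP_ADJ = 600;

    // This is a process holding an application service -- killing it will not
    // have much of an impact as far as the user is concerned.
    public static final int SERVICE_ADJ = 500;

    // This is a process with a heavy-weight application. It is in the
    // background, but we want to try to avoid killing it. Value set in
    // system/rootdir/init.rc on startup.
    public static final int HEAVY_WEIGHT_APP_ADJ = 400;

    // This is a process currently hosting a backup operation. Killing it
    // is not entirely fatal but is generally a bad idea.
    public static final int BACKUP_APP_ADJ = 300;

    // This is a process bound by the system (or other app) that is more important than
    // services, but not so perceptible that killing it affects the user immediately.
    public static final int PERCEPTIBLE_LOW_APP_ADJ = 250;

    // This is a process hosting services that are not perceptible to the user, but the
    // client (system) binding to it requested to treat it as if it were perceptible
    // and to avoid killing it if possible.
    public static final int PERCEPTIBLE_MEDIUM_APP_ADJ = 225;

    // This is a process only hosting components that are perceptible to the
    // user, and we really want to avoid killing them, but they are not
    // immediately visible. An example is background music playback.
    public static final int PERCEPTIBLE_APP_ADJ = 200;

    // This is a process only hosting activities that are visible to the
    // user, so we'd prefer they don't disappear.
    public static final int VISIBLE_APP_ADJ = 100;
    static final int VISIBLE_APP_LAYER_MAX = PERCEPTIBLE_APP_ADJ - VISIBLE_APP_ADJ - 1;

    // This is a process that was recently TOP and moved to FGS. Continue to treat it almost
    // like a foreground app for a while.
    public static final int PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ = 50;

    // This is the process running the current foreground app. We'd really
    // rather not kill it!
    public static final int FOREGROUND_APP_ADJ = 0;

    // This is a process that the system or a persistent process has bound to,
    // and indicated it is important.
    public static final int PERSISTENT_SERVICE_ADJ = -700;

    // This is a system persistent process, such as telephony. Definitely
    // don't want to kill it, but doing so is not completely fatal.
    public static final int PERSISTENT_PROC_ADJ = -800;

    // The system process runs at the default adjustment.
    public static final int SYSTEM_ADJ = -900;

    // Special code for native processes that are not being managed by the system (so
    // don't have an oom adj assigned by the system).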
    public static final int NATIVE_ADJ = -1000;
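When reading a raw value from /proc/<pid>/oom_score_adj, it helps to map it back to one of the levels above. The following is a minimal sketch of such a lookup. The table values are copied from the ProcessList constants shown above, while the helper name and the "nearest level at or below" bucketing rule are assumptions made for illustration; Android itself does not necessarily report adj this way.

```cpp
#include <cassert>
#include <string>

struct AdjLevel {
    int adj;
    const char* name;
};

// Predefined levels from ProcessList, ordered from highest to lowest adj.
static const AdjLevel kAdjLevels[] = {
    {900, "CACHED_APP_MIN_ADJ"},
    {800, "SERVICE_B_ADJ"},
    {700, "PREVIOUS_APP_ADJ"},
    {600, "HOME_APP_ADJ"},
    {500, "SERVICE_ADJ"},
    {400, "HEAVY_WEIGHT_APP_ADJ"},
    {300, "BACKUP_APP_ADJ"},
    {250, "PERCEPTIBLE_LOW_APP_ADJ"},
    {225, "PERCEPTIBLE_MEDIUM_APP_ADJ"},
    {200, "PERCEPTIBLE_APP_ADJ"},
    {100, "VISIBLE_APP_ADJ"},
    {50, "PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ"},
    {0, "FOREGROUND_APP_ADJ"},
    {-700, "PERSISTENT_SERVICE_ADJ"},
    {-800, "PERSISTENT_PROC_ADJ"},
    {-900, "SYSTEM_ADJ"},
    {-1000, "NATIVE_ADJ"},
};

// Hypothetical helper: name of the closest predefined level at or below adj.
std::string describe_adj(int adj) {
    assert(adj >= -1000 && adj <= 1000);
    for (const AdjLevel& level : kAdjLevels) {
        if (adj >= level.adj) {
            return level.name;
        }
    }
    return "NATIVE_ADJ";  // only reached if the range check above is compiled out
}
```

For instance, a cached process with adj 950 is reported as CACHED_APP_MIN_ADJ, because the whole 900 to 999 range belongs to cached apps.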

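3. How the system updates a process's adj value

3.1 ProcessList

ProcessList is mainly responsible for talking to the lmkd process: it sets up the socket connection, provides the setOomAdj method for updating a process's adj value, and so on. It also contains other process-related methods, such as starting processes and maintaining the LRU process list, but this section focuses on the parts related to lowmemorykiller.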

// release/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    public ActivityManagerService(Context systemContext, ActivityTaskManagerService atm) {
        ......
        mProcessList = mInjector.getProcessList(this);
        mProcessList.init(this, activeUids, mPlatformCompat);
        mAppProfiler = new AppProfiler(this, BackgroundThread.getHandler().getLooper(),
                    new LowMemDetector(this));
        mPhantomProcessList = new PhantomProcessList(this);
        mOomAdjuster = new OomAdjuster(this, mProcessList, activeUids);
        ......
    }

ProcessList and OomAdjuster are both initialized in the ActivityManagerService constructor.

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    ProcessList() {
        MemInfoReader minfo = new MemInfoReader();
        minfo.readMemInfo();
        mTotalMemMb = minfo.getTotalSize()/(1024*1024);
        updateOomLevels(0, 0, false);
    }

The ProcessList constructor calls updateOomLevels, whose main job is to fill in the mOomMinFree member array. First, look at the four arrays in ProcessList:

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    // These are the various interesting memory levels that we will give to
    // the OOM killer.  Note that the OOM killer only supports 6 slots, so we
    // can't give it a different value for every possible kind of process.
    private final int[] mOomAdj = new int[] {
            FOREGROUND_APP_ADJ, VISIBLE_APP_ADJ, PERCEPTIBLE_APP_ADJ,
            PERCEPTIBLE_LOW_APP_ADJ, CACHED_APP_MIN_ADJ, CACHED_APP_LMK_FIRST_ADJ
    };
    // These are the low-end OOM level limits.  This is appropriate for an
    // HVGA or smaller phone with less than 512MB.  Values are in KB.
    private final int[] mOomMinFreeLow = new int[] {
            12288, 18432, 24576,
            36864, 43008, 49152
    };
    // These are the high-end OOM level limits.  This is appropriate for a
    // 1280x800 or larger screen with around 1GB RAM.  Values are in KB.
    private final int[] mOomMinFreeHigh = new int[] {
            73728, 92160, 110592,
            129024, 147456, 184320
    };
    // The actual OOM killer memory levels we are using.
    private final int[] mOomMinFree = new int[mOomAdj.length];

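mOomMinFree and mOomAdj are the arrays that actually take effect. mOomMinFreeLow and mOomMinFreeHigh are only reference values, for devices with less than 512MB of RAM and devices with around 1GB of RAM respectively, and they feed into the computation of mOomMinFree. The elements of mOomMinFree and mOomAdj correspond one to one: each pair holds a free-memory threshold and the adj value associated with it.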
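How a pair of arrays like these is meant to be consulted can be sketched as follows. This is a simplified illustration of a minfree-style policy, assuming that both free memory and file cache must fall below a level's threshold before that level's adj becomes the kill floor. The function name, the page-based inputs and the sentinel value are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoKill = 1001;  // hypothetical sentinel: above any valid oom_score_adj

// Returns the lowest oom_score_adj that is eligible for killing,
// or kNoKill if no threshold has been crossed.
// minfree_pages is sorted ascending and pairs up with adj by index.
int min_score_adj_for(const std::vector<int>& minfree_pages,
                      const std::vector<int>& adj,
                      long free_pages, long file_pages) {
    assert(minfree_pages.size() == adj.size());
    for (std::size_t i = 0; i < minfree_pages.size(); ++i) {
        if (free_pages < minfree_pages[i] && file_pages < minfree_pages[i]) {
            return adj[i];  // first (most severe) level whose threshold is crossed
        }
    }
    return kNoKill;
}
```

With thresholds derived from mOomMinFreeHigh and the mOomAdj values, a mild shortage would only make cached processes (adj 950 and above) eligible, while a severe shortage would lower the floor all the way to adj 0.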

    private void updateOomLevels(int displayWidth, int displayHeight, boolean write) {
        ......

        if (write) {
            ByteBuffer buf = ByteBuffer.allocate(4 * (2 * mOomAdj.length + 1));
            buf.putInt(LMK_TARGET);
            for (int i = 0; i < mOomAdj.length; i++) {
                buf.putInt((mOomMinFree[i] * 1024)/PAGE_SIZE);
                buf.putInt(mOomAdj[i]);
            }

            writeLmkd(buf, null);
            ......
        }
    }

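The exact algorithm in updateOomLevels is skipped here. Its main inputs are total system RAM, screen size, mOomMinFreeLow and mOomMinFreeHigh.

Besides the ProcessList constructor, updateOomLevels is also called from applyDisplaySize, this time with write set to true. In that case it ends up calling writeLmkd, which sends mOomMinFree and mOomAdj to the lmkd process over the socket.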
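The layout of the LMK_TARGET message built in updateOomLevels can be sketched outside of Java as well. The sketch below assumes a 4096-byte page and big-endian integers (the default byte order of Java's ByteBuffer, which lmkd is assumed to decode as network byte order). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kLmkTarget = 0;         // LMK_TARGET command code
constexpr int64_t kPageSizeBytes = 4096;  // assumed page size

// Append a 32-bit value in big-endian byte order.
void put_int_be(std::vector<uint8_t>& out, int32_t value) {
    const uint32_t v = static_cast<uint32_t>(value);
    out.push_back(static_cast<uint8_t>(v >> 24));
    out.push_back(static_cast<uint8_t>(v >> 16));
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v));
}

// Hypothetical helper: command code followed by (minfree in pages, adj) pairs.
std::vector<uint8_t> pack_lmk_target(const std::vector<int32_t>& minfree_kb,
                                     const std::vector<int32_t>& adj) {
    assert(minfree_kb.size() == adj.size());
    std::vector<uint8_t> out;
    put_int_be(out, kLmkTarget);
    for (std::size_t i = 0; i < adj.size(); ++i) {
        // Convert KB to pages, using a 64-bit intermediate to avoid overflow.
        const int64_t pages = static_cast<int64_t>(minfree_kb[i]) * 1024 / kPageSizeBytes;
        put_int_be(out, static_cast<int32_t>(pages));
        put_int_be(out, adj[i]);
    }
    return out;
}
```

For n levels the message is 4 * (2 * n + 1) bytes long, which matches the size passed to ByteBuffer.allocate in the Java code.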

// release/system/memory/lmkd/lmkd.cpp

static void cmd_target(int ntargets, LMKD_CTRL_PACKET packet) {
    ......

    for (i = 0; i < ntargets; i++) {
        lmkd_pack_get_target(packet, i, &target);
        lowmem_minfree[i] = target.minfree;
        lowmem_adj[i] = target.oom_adj_score;
        ......
    }

    lowmem_targets_size = ntargets;

    ......

    // Publish the levels as a system property
    property_set("sys.lmk.minfree_levels", minfree_str);

    if (has_inkernel_module) {
        char minfreestr[128];
        char killpriostr[128];

        minfreestr[0] = '\0';
        killpriostr[0] = '\0';

        for (i = 0; i < lowmem_targets_size; i++) {
            char val[40];

            if (i) {
                strlcat(minfreestr, ",", sizeof(minfreestr));
                strlcat(killpriostr, ",", sizeof(killpriostr));
            }

            snprintf(val, sizeof(val), "%d", use_inkernel_interface ? lowmem_minfree[i] : 0);
            strlcat(minfreestr, val, sizeof(minfreestr));
            snprintf(val, sizeof(val), "%d", use_inkernel_interface ? lowmem_adj[i] : 0);
            strlcat(killpriostr, val, sizeof(killpriostr));
        }

        // Write to /sys/module/lowmemorykiller/parameters/minfree
        writefilestring(INKERNEL_MINFREE_PATH, minfreestr, true);
        // Write to /sys/module/lowmemorykiller/parameters/adj
        writefilestring(INKERNEL_ADJ_PATH, killpriostr, true);
    }
}

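As the function above shows, the contents of mOomMinFree and mOomAdj in ProcessList are ultimately meant to be written to the nodes /sys/module/lowmemorykiller/parameters/minfree and /sys/module/lowmemorykiller/parameters/adj. On Android 14, however, has_inkernel_module is false, so nothing is written to those two nodes. In addition, use_minfree_levels defaults to false in lmkd, so by default mOomMinFree and mOomAdj have no effect.

3.2 ProcessList connects to lmkd over a socket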

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    void init(ActivityManagerService service, ActiveUids activeUids,
            PlatformCompat platformCompat) {
        ......

        if (sKillHandler == null) {
            sKillThread = new ServiceThread(TAG + ":kill",
                    THREAD_PRIORITY_BACKGROUND, true /* allowIo */);
            sKillThread.start();
            sKillHandler = new KillHandler(sKillThread.getLooper());
            sLmkdConnection = new LmkdConnection(sKillThread.getLooper().getQueue(),
                    new LmkdConnection.LmkdConnectionListener() {......}
            );
            ......
        }
    }

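ProcessList's init method mainly initializes KillHandler and LmkdConnection. The first time writeLmkd is called, it sends an LMKD_RECONNECT_MSG message to KillHandler, which then calls LmkdConnection's connect method to establish the socket connection to lmkd.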

// release/frameworks/base/services/core/java/com/android/server/am/LmkdConnection.java

    ......
    public boolean connect() {
        synchronized (mLmkdSocketLock) {
            ......
            // temporary sockets and I/O streams
            final LocalSocket socket = openSocket();

            ......
        }
        return true;
    }
    ......

    private LocalSocket openSocket() {
        final LocalSocket socket;

        try {
            socket = new LocalSocket(LocalSocket.SOCKET_SEQPACKET);
            socket.connect(
                new LocalSocketAddress("lmkd",
                        LocalSocketAddress.Namespace.RESERVED));
        } catch (IOException ex) {
            Slog.e(TAG, "Connection failed: " + ex.toString());
            return null;
        }
        return socket;
    }

3.3 Communication between ProcessList and lmkd

ProcessList writes commands and their arguments to the socket through writeLmkd.

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    private static boolean writeLmkd(ByteBuffer buf, ByteBuffer repl) {
        if (!sLmkdConnection.isConnected()) {
            // try to connect immediately and then keep retrying
            sKillHandler.sendMessage(
                    sKillHandler.obtainMessage(KillHandler.LMKD_RECONNECT_MSG));

            // wait for connection retrying 3 times (up to 3 seconds)
            if (!sLmkdConnection.waitForConnection(3 * LMKD_RECONNECT_DELAY_MS)) {
                return false;
            }
        }

        return sLmkdConnection.exchange(buf, repl);
    }

// release/frameworks/base/services/core/java/com/android/server/am/LmkdConnection.java

    public boolean exchange(ByteBuffer req, ByteBuffer repl) {
        if (repl == null) {
            return write(req);
        }

        boolean result = false;
        // set reply buffer to user-defined one to fill it
        synchronized (mReplyBufLock) {
            mReplyBuf = repl;

            if (write(req)) {
                try {
                    // wait for the reply
                    mReplyBufLock.wait();
                    result = (mReplyBuf != null);
                } catch (InterruptedException ie) {
                    result = false;
                }
            }

            // reset reply buffer
            mReplyBuf = null;
        }
        return result;
    }
    ......

    private boolean write(ByteBuffer buf) {
        synchronized (mLmkdSocketLock) {
            ......
            mLmkdOutputStream.write(buf.array(), 0, buf.position());
            ......
        }
    }

The commands supported between ProcessList and lmkd are defined in ProcessList, and their values match the command values defined in lmkd one to one:

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    // Low Memory Killer Daemon command codes.
    // These must be kept in sync with lmk_cmd definitions in lmkd.h
    //
    // LMK_TARGET <minfree> <minkillprio> ... (up to 6 pairs)
    // LMK_PROCPRIO <pid> <uid> <prio>
    // LMK_PROCREMOVE <pid>
    // LMK_PROCPURGE
    // LMK_GETKILLCNT
    // LMK_SUBSCRIBE
    // LMK_PROCKILL
    // LMK_UPDATE_PROPS
    // LMK_KILL_OCCURRED
    // LMK_STATE_CHANGED
    static final byte LMK_TARGET = 0;
    static final byte LMK_PROCPRIO = 1;
    static final byte LMK_PROCREMOVE = 2;
    static final byte LMK_PROCPURGE = 3;
    static final byte LMK_GETKILLCNT = 4;
    static final byte LMK_SUBSCRIBE = 5;
    static final byte LMK_PROCKILL = 6; // Note: this is an unsolicited command
    static final byte LMK_UPDATE_PROPS = 7;
    static final byte LMK_KILL_OCCURRED = 8; // Msg to subscribed clients on kill occurred event
    static final byte LMK_STATE_CHANGED = 9; // Msg to subscribed clients on state changed

// release/system/memory/lmkd/include/lmkd.h

/*
 * Supported LMKD commands
 */
enum lmk_cmd {
    LMK_TARGET = 0,         /* Associate minfree with oom_adj_score */
    LMK_PROCPRIO,           /* Register a process and set its oom_adj_score */
    LMK_PROCREMOVE,         /* Unregister a process */
    LMK_PROCPURGE,          /* Purge all registered processes */
    LMK_GETKILLCNT,         /* Get number of kills */
    LMK_SUBSCRIBE,          /* Subscribe for asynchronous events */
    LMK_PROCKILL,           /* Unsolicited msg to subscribed clients on proc kills */
    LMK_UPDATE_PROPS,       /* Reinit properties */
    LMK_STAT_KILL_OCCURRED, /* Unsolicited msg to subscribed clients on proc kills for statsd log */
    LMK_STAT_STATE_CHANGED, /* Unsolicited msg to subscribed clients on state changed */
};

3.4 Updating a process's adj value

ActivityManagerService computes each process's adj value from the state the process is in, mainly through two methods in AMS:

// release/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

    @GuardedBy("this")
    final void updateOomAdjLocked(@OomAdjReason int oomAdjReason) {
        mOomAdjuster.updateOomAdjLocked(oomAdjReason);
    }

    /**
     * Update OomAdj for a specific process and its reachable processes.
     *
     * @param app The process to update
     * @param oomAdjReason
     * @return whether updateOomAdjLocked(app) was successful.
     */
    @GuardedBy("this")
    final boolean updateOomAdjLocked(ProcessRecord app, @OomAdjReason int oomAdjReason) {
        return mOomAdjuster.updateOomAdjLocked(app, oomAdjReason);
    }

The first updates the adj of all processes; the second updates the adj of a single process.

What triggers an adj update? ActivityManagerInternal lists the reasons:

// release/frameworks/base/core/java/android/app/ActivityManagerInternal.java

    @IntDef(prefix = {"OOM_ADJ_REASON_"}, value = {
        OOM_ADJ_REASON_NONE,
        OOM_ADJ_REASON_ACTIVITY,
        OOM_ADJ_REASON_FINISH_RECEIVER,
        OOM_ADJ_REASON_START_RECEIVER,
        OOM_ADJ_REASON_BIND_SERVICE,
        OOM_ADJ_REASON_UNBIND_SERVICE,
        OOM_ADJ_REASON_START_SERVICE,
        OOM_ADJ_REASON_GET_PROVIDER,
        OOM_ADJ_REASON_REMOVE_PROVIDER,
        OOM_ADJ_REASON_UI_VISIBILITY,
        OOM_ADJ_REASON_ALLOWLIST,
        OOM_ADJ_REASON_PROCESS_BEGIN,
        OOM_ADJ_REASON_PROCESS_END,
        OOM_ADJ_REASON_SHORT_FGS_TIMEOUT,
        OOM_ADJ_REASON_SYSTEM_INIT,
        OOM_ADJ_REASON_BACKUP,
        OOM_ADJ_REASON_SHELL,
        OOM_ADJ_REASON_REMOVE_TASK,
        OOM_ADJ_REASON_UID_IDLE,
        OOM_ADJ_REASON_STOP_SERVICE,
        OOM_ADJ_REASON_EXECUTING_SERVICE,
        OOM_ADJ_REASON_RESTRICTION_CHANGE,
        OOM_ADJ_REASON_COMPONENT_DISABLED,
    })
    @Retention(RetentionPolicy.SOURCE)
    public @interface OomAdjReason {}

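As the source above shows, the updateOomAdjLocked methods in AMS simply delegate to OomAdjuster's updateOomAdjLocked, so OomAdjuster is the next place to look.

First, the method that updates the adj of all processes: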

// release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java

    /**
     * Update OomAdj for all processes in LRU list
     */
    @GuardedBy("mService")
    void updateOomAdjLocked(@OomAdjReason int oomAdjReason) {
        synchronized (mProcLock) {
            updateOomAdjLSP(oomAdjReason);
        }
    }

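updateOomAdjLocked calls updateOomAdjLSP directly.

Note: the Locked suffix in updateOomAdjLocked means the caller holds the lock on the ActivityManagerService object. The LSP suffix in updateOomAdjLSP means the caller holds two locks at once: the ActivityManagerService lock and the mProcLock lock.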

// release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java

    @GuardedBy({"mService", "mProcLock"})
    private void updateOomAdjLSP(@OomAdjReason int oomAdjReason) {
        // Check whether an adj update is already in progress
        if (checkAndEnqueueOomAdjTargetLocked(null)) {
            // Simply return as there is an oomAdjUpdate ongoing
            return;
        }
        try {
            // Mark that an adj update is in progress
            mOomAdjUpdateOngoing = true;
            // Carry on with the remaining logic
            performUpdateOomAdjLSP(oomAdjReason);
        } finally {
            ......
        }
    }

// release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java

    @GuardedBy({"mService", "mProcLock"})
    private void performUpdateOomAdjLSP(@OomAdjReason int oomAdjReason) {
        final ProcessRecord topApp = mService.getTopApp();
        ......
        updateOomAdjInnerLSP(oomAdjReason, topApp , null, null, true, true);
    }

// release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java

    /**
     * Update OomAdj for all processes within the given list (could be partial), or the whole LRU
     * list if the given list is null; when it's partial update, each process's client proc won't
     * get evaluated recursively here.
     */
    @GuardedBy({"mService", "mProcLock"})
    private void updateOomAdjInnerLSP(@OomAdjReason int oomAdjReason, final ProcessRecord topApp,
            ArrayList<ProcessRecord> processes, ActiveUids uids, boolean potentialCycles,
            boolean startProfiling) {
        ......

        for (int i = numProc - 1; i >= 0; i--) {
            ......
            computeOomAdjLSP(app, UNKNOWN_ADJ, topApp, fullUpdate, now, false,
                    computeClients);
            ......
        }

        if (computeClients) {
                ......
                if (computeOomAdjLSP(app, UNKNOWN_ADJ, topApp, true, now,
                        true, true)) {
                    retryCycles = true;
                }
                ......
        }

        ......
        boolean allChanged = updateAndTrimProcessLSP(now, nowElapsed, oldTime, activeUids,
                        oomAdjReason);
        ......
    }


    @GuardedBy({"mService", "mProcLock"})
    private boolean updateAndTrimProcessLSP(final long now, final long nowElapsed,
            final long oldTime, final ActiveUids activeUids, @OomAdjReason int oomAdjReason) {
                ......
                applyOomAdjLSP(app, true, now, nowElapsed, oomAdjReason);
                ......
    }

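Inside updateOomAdjInnerLSP:

First, computeOomAdjLSP is called to compute each process's adj value. computeOomAdjLSP is not covered here: it is the heart of the adj computation, but it is far too long to walk through and remember in one sitting, so a detailed analysis is left for when it is actually needed.

AOSP also ships a document that describes the logic implemented by computeOomAdjLSP, at: release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.md

Then updateAndTrimProcessLSP iterates over the processes and calls applyOomAdjLSP to apply each process's adj, process state, scheduling group and frozen state.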

// release/frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java

    /** Applies the computed oomadj, procstate and sched group values and freezes them in set* */
    @GuardedBy({"mService", "mProcLock"})
    private boolean applyOomAdjLSP(ProcessRecord app, boolean doingAll, long now,
            long nowElapsed, @OomAdjReason int oomAdjReson) {

        ......
        if (state.getCurAdj() != state.getSetAdj()) {
            ProcessList.setOomAdj(app.getPid(), app.uid, state.getCurAdj());
            ......
        }
        ......
    }

// release/frameworks/base/services/core/java/com/android/server/am/ProcessList.java

    /**
     * Set the out-of-memory badness adjustment for a process.
     * If {@code pid <= 0}, this method will be a no-op.
     *
     * @param pid The process identifier to set.
     * @param uid The uid of the app
     * @param amt Adjustment value -- lmkd allows -1000 to +1000
     *
     * {@hide}
     */
    public static void setOomAdj(int pid, int uid, int amt) {
        ......
        ByteBuffer buf = ByteBuffer.allocate(4 * 4);
        buf.putInt(LMK_PROCPRIO);
        buf.putInt(pid);
        buf.putInt(uid);
        buf.putInt(amt);
        writeLmkd(buf, null);
        ......
    }

applyOomAdjLSP ends up calling ProcessList.setOomAdj, which writes the LMK_PROCPRIO command together with the process's pid, uid and adj value into a buffer and calls writeLmkd to send the data to lmkd over the socket. Next, look at the lmkd side.

// release/system/memory/lmkd/lmkd.cpp

static void ctrl_command_handler(int dsock_idx) {
    ......
    case LMK_PROCPRIO:
        ......
        cmd_procprio(packet, nargs, &cred);
        break;
    ......
}


static void cmd_procprio(LMKD_CTRL_PACKET packet, int field_count, struct ucred *cred) {
    ......

    // Parse the socket data into params
    lmkd_pack_get_procprio(packet, field_count, &params);

    // Check that oomadj is within the valid range
    if (params.oomadj < OOM_SCORE_ADJ_MIN ||
        params.oomadj > OOM_SCORE_ADJ_MAX) {
        ALOGE("Invalid PROCPRIO oomadj argument %d", params.oomadj);
        return;
    }

    ......

    // Write the adj value to /proc/<pid>/oom_score_adj
    snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", params.pid);
    snprintf(val, sizeof(val), "%d", params.oomadj);
    if (!writefilestring(path, val, false)) {
        ......
    }

    ......

    // Update the oom_score_adj array of linked lists; this data structure is analyzed later in the article
    procp = pid_lookup(params.pid);
    if (!procp) {
        ......
        proc_insert(procp);
    } else {
        ......
        proc_unslot(procp);
        procp->oomadj = params.oomadj;
        proc_slot(procp);
    }
}
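The receiving side of LMK_PROCPRIO can be sketched as a small decoder. It assumes the four-integer, big-endian layout produced by the setOomAdj buffer shown earlier (command, pid, uid, adj) and applies the same range check as cmd_procprio. The struct and function names are hypothetical, and any additional fields lmkd may accept are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kLmkProcPrio = 1;  // LMK_PROCPRIO command code
constexpr int32_t kOomScoreAdjMin = -1000;
constexpr int32_t kOomScoreAdjMax = 1000;

struct ProcPrio {
    int32_t pid;
    int32_t uid;
    int32_t oomadj;
};

// Read a big-endian 32-bit integer starting at offset.
int32_t get_int_be(const std::vector<uint8_t>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());
    const uint32_t v = (static_cast<uint32_t>(buf[offset]) << 24) |
                       (static_cast<uint32_t>(buf[offset + 1]) << 16) |
                       (static_cast<uint32_t>(buf[offset + 2]) << 8) |
                       static_cast<uint32_t>(buf[offset + 3]);
    return static_cast<int32_t>(v);
}

// Hypothetical decoder: returns false for a malformed packet or an out-of-range adj.
bool parse_procprio(const std::vector<uint8_t>& buf, ProcPrio* out) {
    if (buf.size() != 16 || get_int_be(buf, 0) != kLmkProcPrio) {
        return false;
    }
    const ProcPrio p{get_int_be(buf, 4), get_int_be(buf, 8), get_int_be(buf, 12)};
    if (p.oomadj < kOomScoreAdjMin || p.oomadj > kOomScoreAdjMax) {
        return false;  // same bounds as the OOM_SCORE_ADJ_MIN/MAX check above
    }
    *out = p;
    return true;
}
```

Rejecting the packet before touching /proc keeps an invalid value from ever being written to oom_score_adj, which is the order cmd_procprio follows as well.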

Updating a single process works similarly. The call chain is:
ActivityManagerService#updateOomAdjLocked(ProcessRecord app, @OomAdjReason int oomAdjReason)
--->OomAdjuster#updateOomAdjLocked(ProcessRecord app, @OomAdjReason int oomAdjReason)
-------->OomAdjuster#updateOomAdjLSP(ProcessRecord app, @OomAdjReason int oomAdjReason)
------------>OomAdjuster#performUpdateOomAdjLSP(ProcessRecord app, @OomAdjReason int oomAdjReason)
---------------->OomAdjuster#performUpdateOomAdjLSP(ProcessRecord app, int cachedAdj,ProcessRecord topApp, long now, @OomAdjReason int oomAdjReason)
-------------------->OomAdjuster#computeOomAdjLSP(ProcessRecord app, int cachedAdj, ProcessRecord topApp, boolean doingAll, long now, boolean cycleReEval, boolean computeClients)
-------------------->OomAdjuster#applyOomAdjLSP(ProcessRecord app, boolean doingAll, long now, long nowElapsed, @OomAdjReason int oomAdjReson)

4. The lmkd process

4.1 lmkd startup

lmkd is started by the init process while it handles init.rc:

// release/system/core/rootdir/init.rc

on init
    ......
    start lmkd
    ......

The lmkd service is defined in lmkd.rc:

// release/system/memory/lmkd/lmkd.rc

service lmkd /system/bin/lmkd
    class core
    user lmkd
    group lmkd system readproc
    capabilities DAC_OVERRIDE KILL IPC_LOCK SYS_NICE SYS_RESOURCE
    critical
    socket lmkd seqpacket+passcred 0660 system system
    task_profiles ServiceCapacityLow

Once the lmkd process is created, it executes the /system/bin/lmkd binary. Android.bp shows that the entry point of /system/bin/lmkd is the main function in lmkd.cpp.

// release/system/memory/lmkd/Android.bp

cc_binary {
    name: "lmkd",

    srcs: [
        "lmkd.cpp",
        "reaper.cpp",
        "watchdog.cpp",
    ],
    ......
}

4.2 lmkd initialization

4.2.1 The main function

Next, start from the main function in lmkd.cpp:

// release/system/memory/lmkd/lmkd.cpp

int main(int argc, char **argv) {
    ......

    // Load the properties
    if (!update_props()) {
        ALOGE("Failed to initialize props, exiting.");
        return -1;
    }

    ctx = create_android_logger(KILLINFO_LOG_TAG);

    // Run init
    if (!init()) {
        if (!use_inkernel_interface) {
            ...
        }

        if (init_reaper()) {
            ALOGI("Process reaper initialized with %d threads in the pool",
                reaper.thread_cnt());
        }

        // Enter the loop that waits for all events registered with epoll
        mainloop();
    }

    android_log_destroy(&ctx);

    ALOGI("exiting");
    return 0;
}

4.2.2 init

// release/system/memory/lmkd/lmkd.cpp

static int init(void) {
    ......

    // Create an epoll instance and get its file descriptor
    epollfd = epoll_create(MAX_EPOLL_EVENTS);
    ......

    // Mark every data socket slot as not connected
    for (int i = 0; i < MAX_DATA_CONN; i++) {
        data_sock[i].sock = -1;
    }

    // Get the control socket descriptor for /dev/socket/lmkd
    ctrl_sock.sock = android_get_control_socket("lmkd");
    ......

    // Put the socket into the listening state
    ret = listen(ctrl_sock.sock, MAX_DATA_CONN);
    ......

    // Watch for readable events on ctrl_sock.sock
    epev.events = EPOLLIN;
    // Attach the handler: when epoll reports EPOLLIN on /dev/socket/lmkd, ctrl_connect_handler is called
    ctrl_sock.handler_info.handler = ctrl_connect_handler;
    // Set the data pointer associated with the event
    epev.data.ptr = (void *)&(ctrl_sock.handler_info);
    // Add /dev/socket/lmkd to the epoll instance
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, ctrl_sock.sock, &epev) == -1) {
        ALOGE("epoll_ctl for lmkd control socket failed (errno=%d)", errno);
        return -1;
    }
    maxevents++;

    // Set use_inkernel_interface according to whether /sys/module/lowmemorykiller/parameters/minfree is writable
    // lmk has a kernel implementation and a userspace implementation: true selects the kernel one, false the userspace one
    // On Android 14 use_inkernel_interface is false
    has_inkernel_module = !access(INKERNEL_MINFREE_PATH, W_OK);
    use_inkernel_interface = has_inkernel_module;

    if (use_inkernel_interface) {
        ......
    } else {
        // Initialize the memory pressure monitors
        if (!init_monitors()) {
            return -1;
        }
        /* let the others know it does support reporting kills */
        property_set("sys.lmk.reportkills", "1");
    }

    ......
}

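init does two main things: it sets up and listens on the /dev/socket/lmkd node, waiting for socket clients to connect, and it calls init_monitors to initialize memory pressure monitoring.

4.2.3 Setting the connection handler and the message handler for the lmkd socket

When epoll detects that a new client is asking to connect to the lmkd socket, ctrl_connect_handler is called back: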

// release/system/memory/lmkd/lmkd.cpp

static void ctrl_connect_handler(int data __unused, uint32_t events __unused,
                                 struct polling_params *poll_params __unused) {
    struct epoll_event epev;
    // Get a free connection slot
    int free_dscock_idx = get_free_dsock();
    ......

    // Accept a socket connection
    data_sock[free_dscock_idx].sock = accept(ctrl_sock.sock, NULL, NULL);
    ......

    ALOGI("lmkd data connection established");
    /* use data to store data connection idx */
    data_sock[free_dscock_idx].handler_info.data = free_dscock_idx;
    // Set the message handler for this socket
    data_sock[free_dscock_idx].handler_info.handler = ctrl_data_handler;
    data_sock[free_dscock_idx].async_event_mask = 0;

    // Have epoll watch for messages from the client
    epev.events = EPOLLIN;
    epev.data.ptr = (void *)&(data_sock[free_dscock_idx].handler_info);
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, data_sock[free_dscock_idx].sock, &epev) == -1) {
        ALOGE("epoll_ctl for data connection socket failed; errno=%d", errno);
        ctrl_data_close(free_dscock_idx);
        return;
    }
    maxevents++;
}

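When a connection request arrives, ctrl_connect_handler checks whether the maximum number of connections has been reached. If not, it accepts the connection, assigns it a message handler and registers it with epoll. When a message from the client arrives, ctrl_data_handler is called back to process it.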
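The slot bookkeeping behind this can be sketched as below: a fixed-size table in which a sock value of -1 marks a free entry, as set up in init. The value 3 for the table size is an assumption standing in for MAX_DATA_CONN, and the helper names are hypothetical stand-ins for get_free_dsock and the accept path. What lmkd does when the table is full is elided in the excerpt above and is not modeled here.

```cpp
#include <array>
#include <cassert>

constexpr int kMaxDataConn = 3;  // assumed value for MAX_DATA_CONN

struct DataSock {
    int sock = -1;  // -1 marks the slot as not connected
};

// Index of the first unused slot, or -1 if every slot is taken.
int find_free_slot(const std::array<DataSock, kMaxDataConn>& socks) {
    for (int i = 0; i < kMaxDataConn; ++i) {
        if (socks[i].sock < 0) {
            return i;
        }
    }
    return -1;
}

// Store an accepted descriptor in the first free slot and return its index.
int claim_slot(std::array<DataSock, kMaxDataConn>& socks, int fd) {
    assert(fd >= 0);
    const int idx = find_free_slot(socks);
    if (idx >= 0) {
        socks[idx].sock = fd;
    }
    return idx;
}
```

The returned index plays the same role as the data field stored in handler_info: it lets the message handler find the right connection again when epoll reports an event.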

// release/system/memory/lmkd/lmkd.cpp

static void ctrl_data_handler(int data, uint32_t events,
                              struct polling_params *poll_params __unused) {
    if (events & EPOLLIN) {
        ctrl_command_handler(data);
    }
}

static void ctrl_command_handler(int dsock_idx) {
    ......

    len = ctrl_data_read(dsock_idx, (char *)packet, CTRL_PACKET_MAX_SIZE, &cred);
    ......

    cmd = lmkd_pack_get_cmd(packet);
    ......

    switch(cmd) {
    case LMK_TARGET:
        ......
        cmd_target(targets, packet);
        break;
    case LMK_PROCPRIO:
        ......
        cmd_procprio(packet, nargs, &cred);
        break;
    ......
}

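Messages sent by a client (for example ProcessList in AMS) end up in ctrl_command_handler, which runs a different handler function depending on cmd.

4.2.4 Process kill strategy

To kill processes, lmk needs to know whether system resources such as memory (as well as io and cpu) are under pressure. Early Android used the in-kernel LMK. After the userspace lmkd was introduced, it initially relied on kernel vmpressure signals to assess memory pressure. Starting with Android 10, it uses kernel pressure stall information (PSI) monitors to detect memory pressure instead.

The userspace lmkd also supports a legacy mode in which it makes kill decisions with the same strategy as the in-kernel LMK driver, that is, free memory and file cache thresholds. To enable the legacy mode, set the ro.lmk.use_minfree_levels property to true.

The kill strategy is initialized in init_monitors: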

// release/system/memory/lmkd/lmkd.cpp

static bool init_monitors() {
    // This article is based on Android 14, where PSI is used by default
    use_psi_monitors = GET_LMK_PROPERTY(bool, "use_psi", true) &&
        init_psi_monitors();
    // Otherwise fall back to vmpressure events
    if (!use_psi_monitors &&
        (!init_mp_common(VMPRESS_LEVEL_LOW) ||
        !init_mp_common(VMPRESS_LEVEL_MEDIUM) ||
        !init_mp_common(VMPRESS_LEVEL_CRITICAL))) {
        ALOGE("Kernel does not support memory pressure events or in-kernel low memory killer");
        return false;
    }
    if (use_psi_monitors) {
        ALOGI("Using psi monitors for memory pressure detection");
    } else {
        ALOGI("Using vmpressure for memory pressure detection");
    }
    return true;
}

// release/system/memory/lmkd/lmkd.cpp

static bool init_psi_monitors() {
    // use_minfree_levels defaults to false, so use_new_strategy defaults to true
    bool use_new_strategy =
        GET_LMK_PROPERTY(bool, "use_new_strategy", low_ram_device || !use_minfree_levels);
    ......
    // Set the thresholds for the 3 pressure levels; later, in init_mp_psi, only VMPRESS_LEVEL_MEDIUM and VMPRESS_LEVEL_CRITICAL are actually written to PSI
    if (use_new_strategy) {
        // threshold_ms is 0 for VMPRESS_LEVEL_LOW, so init_mp_psi returns true right away for it
        psi_thresholds[VMPRESS_LEVEL_LOW].threshold_ms = 0;
        // psi_partial_stall_ms is 70
        psi_thresholds[VMPRESS_LEVEL_MEDIUM].threshold_ms = psi_partial_stall_ms;
        // psi_complete_stall_ms is 700
        psi_thresholds[VMPRESS_LEVEL_CRITICAL].threshold_ms = psi_complete_stall_ms;
    }

    if (!init_mp_psi(VMPRESS_LEVEL_LOW, use_new_strategy)) {
        return false;
    }
    if (!init_mp_psi(VMPRESS_LEVEL_MEDIUM, use_new_strategy)) {
        destroy_mp_psi(VMPRESS_LEVEL_LOW);
        return false;
    }
    if (!init_mp_psi(VMPRESS_LEVEL_CRITICAL, use_new_strategy)) {
        destroy_mp_psi(VMPRESS_LEVEL_MEDIUM);
        destroy_mp_psi(VMPRESS_LEVEL_LOW);
        return false;
    }
    return true;
}

After the setup above, psi_thresholds contains:

static struct psi_threshold psi_thresholds[VMPRESS_LEVEL_COUNT] = {
    { PSI_SOME, 0 },
    { PSI_SOME, 70 },
    { PSI_FULL, 700 },
};

At this point it is worth reading up on PSI first:
https://facebookmicrosites.github.io/psi/docs/overview
https://www.cnblogs.com/Linux-tech/p/12961296.html
https://zhuanlan.zhihu.com/p/656580184

// release/system/memory/lmkd/lmkd.cpp

static bool init_mp_psi(enum vmpressure_level level, bool use_new_strategy) {
    int fd;

    // For VMPRESS_LEVEL_LOW the threshold is 0, so return true right away
    if (!psi_thresholds[level].threshold_ms) {
        return true;
    }

    fd = init_psi_monitor(psi_thresholds[level].stall_type,
        psi_thresholds[level].threshold_ms * US_PER_MS,
        PSI_WINDOW_SIZE_MS * US_PER_MS);

    ......

    vmpressure_hinfo[level].handler = use_new_strategy ? mp_event_psi : mp_event_common;
    vmpressure_hinfo[level].data = level;
    if (register_psi_monitor(epollfd, fd, &vmpressure_hinfo[level]) < 0) {
        destroy_psi_monitor(fd);
        return false;
    }
    ......
}

// release/system/memory/lmkd/libpsi/psi.cpp

int init_psi_monitor(enum psi_stall_type stall_type,
             int threshold_us, int window_us) {
    ......
    fd = TEMP_FAILURE_RETRY(open(PSI_PATH_MEMORY, O_WRONLY | O_CLOEXEC));
    ......
    switch (stall_type) {
    case (PSI_SOME):
    case (PSI_FULL):
        // "some 70000 1000000"
        // "full 700000 1000000"
        res = snprintf(buf, sizeof(buf), "%s %d %d",
            stall_type_name[stall_type], threshold_us, window_us);
        break;
    ......
    }
    res = TEMP_FAILURE_RETRY(write(fd, buf, strlen(buf) + 1));
    ......
}

// release/system/memory/lmkd/libpsi/psi.cpp

int register_psi_monitor(int epollfd, int fd, void* data) {
    ......
    epev.events = EPOLLPRI;
    epev.data.ptr = data;
    res = epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &epev);
    ......
}

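Inside init_mp_psi:

1. It calls init_psi_monitor to write the thresholds to monitor into /proc/pressure/memory. Two triggers are written in total: "some(stall_type) 70000(threshold_us) 1000000(window_us)" and "full 700000 1000000". Taking the first one as an example: if, within a 1-second window (window_us), the "some" stall time exceeds 70 ms (threshold_us), PSI reports the event to lmkd.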
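
The trigger strings above can be reproduced with a small helper. This is only a sketch: make_psi_trigger is a hypothetical name that does not exist in lmkd, and it simply mirrors the millisecond-to-microsecond conversion and the "%s %d %d" format seen in init_mp_psi and init_psi_monitor.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of lmkd): builds the text that would be
// written to /proc/pressure/memory, converting milliseconds to microseconds.
std::string make_psi_trigger(const char* stall_type, int threshold_ms, int window_ms) {
    const int kUsPerMs = 1000;
    // The stall threshold cannot be larger than the window it is measured in.
    assert(threshold_ms > 0 && threshold_ms <= window_ms);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s %d %d", stall_type,
                  threshold_ms * kUsPerMs, window_ms * kUsPerMs);
    return std::string(buf);
}
```

Calling make_psi_trigger("some", 70, 1000) yields the first trigger string and make_psi_trigger("full", 700, 1000) the second.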

How does the kernel's PSI code find out about data written to /proc/pressure/memory? Here is a short excerpt from the kernel PSI source:

// linux(v6.12)/kernel/sched/psi.c

static const struct proc_ops psi_memory_proc_ops = {
	.proc_open	= psi_memory_open,
	.proc_read	= seq_read,
	.proc_lseek	= seq_lseek,
	.proc_write	= psi_memory_write,
	.proc_poll	= psi_fop_poll,
	.proc_release	= psi_fop_release,
};

static int __init psi_proc_init(void)
{
	if (psi_enable) {
		proc_mkdir("pressure", NULL);
		proc_create("pressure/io", 0666, NULL, &psi_io_proc_ops);
		proc_create("pressure/memory", 0666, NULL, &psi_memory_proc_ops);
		proc_create("pressure/cpu", 0666, NULL, &psi_cpu_proc_ops);
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
		proc_create("pressure/irq", 0666, NULL, &psi_irq_proc_ops);
#endif
	}
	return 0;
}

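In other words, the kernel registers the operations table psi_memory_proc_ops for the /proc/pressure/memory node. Opening the node runs psi_memory_open, writing to it runs psi_memory_write, and so on.

The /proc file system in Linux is a virtual file system that exposes detailed information about the current state of the running kernel, processes, hardware and other system resources. It acts as an interface between the kernel and user space, letting users and programs read and modify kernel parameters and runtime information.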
https://zhuanlan.zhihu.com/p/694564574

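2. It calls register_psi_monitor to add the fd opened on /proc/pressure/memory to the epoll set. When the kernel's PSI code detects that a threshold has been crossed, lmkd is woken up and the registered handler runs (mp_event_psi when the new strategy is in use).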
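
The register-then-dispatch pattern can be tried outside lmkd with a short sketch. Everything named here (handler_info, register_monitor, dispatch_once, demo_dispatch) is hypothetical and much simpler than lmkd's real main loop. lmkd registers PSI fds with EPOLLPRI; the event mask is a parameter in this sketch so that an ordinary pipe with EPOLLIN can stand in for the PSI fd.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical, simplified counterpart of lmkd's per-level handler record.
struct handler_info {
    int data;                                    // e.g. the pressure level
    void (*handler)(int data, uint32_t events);  // callback to run
};

// Register fd with epoll and attach the handler record to it.
int register_monitor(int epollfd, int fd, uint32_t event_mask, handler_info* info) {
    assert(info != nullptr && info->handler != nullptr);
    struct epoll_event epev = {};
    epev.events = event_mask;
    epev.data.ptr = info;
    return epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &epev);
}

// Wait once and dispatch every ready event to its handler.
// Returns the number of events dispatched, or -1 on error.
int dispatch_once(int epollfd, int timeout_ms) {
    struct epoll_event events[8];
    int n = epoll_wait(epollfd, events, 8, timeout_ms);
    for (int i = 0; i < n; i++) {
        handler_info* info = static_cast<handler_info*>(events[i].data.ptr);
        info->handler(info->data, events[i].events);
    }
    return n;
}

static int g_last_level = -1;
static void record_level(int data, uint32_t /*events*/) { g_last_level = data; }

// Usage: a pipe stands in for the PSI fd. Returns the level seen by the handler.
int demo_dispatch() {
    int pipefd[2];
    if (pipe(pipefd) < 0) return -1;
    int epollfd = epoll_create1(EPOLL_CLOEXEC);
    handler_info info = {1 /* level */, record_level};
    char byte = 'x';
    if (epollfd >= 0 &&
        register_monitor(epollfd, pipefd[0], EPOLLIN, &info) == 0 &&
        write(pipefd[1], &byte, 1) == 1) {  // make the read end ready
        dispatch_once(epollfd, 1000);
    }
    if (epollfd >= 0) close(epollfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return g_last_level;
}
```

Storing a pointer to the handler record in epev.data.ptr is what lets one epoll loop serve several pressure levels: the loop does not need to know which fd fired, only which record came back.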

// release/system/memory/lmkd/lmkd.cpp

static void mp_event_psi(int data, uint32_t events, struct polling_params *poll_params) {
    ......

    // Data parsed from /proc/meminfo
    union meminfo mi;
    // Data parsed from /proc/vmstat
    union vmstat vs;
    ......
    // Pressure level of the event that triggered this call
    enum vmpressure_level level = (enum vmpressure_level)data;
    // Reason for the kill; a process is killed only when kill_reason != NONE
    enum kill_reasons kill_reason = NONE;

    ......
    // Whether a previous kill is still in progress
    bool kill_pending = is_kill_pending();
    ......
    // Stop waiting for the previous kill
    stop_wait_for_proc_kill(!kill_pending);

    // Read /proc/vmstat
    if (vmstat_parse(&vs) < 0) {
        ALOGE("Failed to parse vmstat!");
        return;
    }
    ......

    // Read /proc/meminfo
    if (meminfo_parse(&mi) < 0) {
        ALOGE("Failed to parse meminfo!");
        return;
    }

    ......

    // Check whether free swap is running low
    if (swap_free_low_percentage) {
        swap_low_threshold = mi.field.total_swap * swap_free_low_percentage / 100;
        swap_is_low = get_free_swap(&mi) < swap_low_threshold;
    } else {
        swap_low_threshold = 0;
    }

    // Determine the reclaim state; this requires knowing what the fields in /proc/vmstat mean
    // https://www.cnblogs.com/pengdonglin137/p/17877411.html
    // https://wenku.baidu.com/view/a40d9e16bad528ea81c758f5f61fb7360b4c2b89.html?_wkts_=1733388128044&bdQuery=workingset_refault_file
    if (vs.field.pgscan_direct != init_pgscan_direct) {
        init_pgscan_direct = vs.field.pgscan_direct;
        init_pgscan_kswapd = vs.field.pgscan_kswapd;
        reclaim = DIRECT_RECLAIM;
    } else if (vs.field.pgscan_kswapd != init_pgscan_kswapd) {
        init_pgscan_kswapd = vs.field.pgscan_kswapd;
        reclaim = KSWAPD_RECLAIM;
    } else if (workingset_refault_file == prev_workingset_refault) {
        /*
         * Device is not thrashing and not reclaiming, bail out early until we see these stats
         * changing
         */
        goto no_kill;
    }

    prev_workingset_refault = workingset_refault_file;

    ......

    ......
    // A large block is omitted here: I am not yet clear on what the node data it uses means, and may study it later
    ......

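    // Read /proc/pressure/memory into psi_data
    if (!psi_parse_mem(&psi_data)) {
        // Check whether the PSI memory "full" avg10 (percentage of time stalled over the last 10 s) exceeds stall_limit_critical (100, i.e. 100%); if so, pressure is already at the critical level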
        critical_stall = psi_data.mem_stats[PSI_FULL].avg10 > (float)stall_limit_critical;
    }
    
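    // The next large block uses the information gathered above to decide kill_reason and min_score_adj; it is omitted here
    // What matters is that within this logic min_score_adj can only end up as 201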
    ......

    // A process needs to be killed when kill_reason is not NONE
    if (kill_reason != NONE) {
        struct kill_info ki = {
            .kill_reason = kill_reason,
            .kill_desc = kill_desc,
            .thrashing = (int)thrashing,
            .max_thrashing = max_thrashing,
        };

        // When memory pressure is already critical, lower min_score_adj to 0 so that user-perceptible processes with adj 0 and above can be killed
        if (critical_stall) {
            min_score_adj = 0;
        }
        // Read /proc/pressure/io and /proc/pressure/cpu into psi_data
        psi_parse_io(&psi_data);
        psi_parse_cpu(&psi_data);
        // Find and kill one process
        int pages_freed = find_and_kill_process(min_score_adj, &ki, &mi, &wi, &curr_tm, &psi_data);
        if (pages_freed > 0) {
            killing = true;
            max_thrashing = 0;
            if (cut_thrashing_limit) {
                // If the thrashing limit needs to be lowered, recompute it
                thrashing_limit = (thrashing_limit * (100 - thrashing_limit_decay_pct)) / 100;
            }
        }
    }

no_kill:
    ......
}
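
Three small calculations inside mp_event_psi are easier to follow in isolation: the low-swap check, the reclaim-state classification and the thrashing-limit decay. The sketch below restates them as standalone functions. The function names are made up for illustration, and the inputs are plain integers rather than the meminfo and vmstat unions used by lmkd.

```cpp
#include <cassert>
#include <cstdint>

enum reclaim_state { NO_RECLAIM, KSWAPD_RECLAIM, DIRECT_RECLAIM };

// Free swap is "low" when it drops below low_pct percent of total swap.
// A low_pct of 0 disables the check, as in the listing above.
bool swap_is_low(int64_t total_swap, int64_t free_swap, int low_pct) {
    assert(low_pct >= 0 && low_pct <= 100);
    if (low_pct == 0) return false;
    return free_swap < total_swap * low_pct / 100;
}

// Classify reclaim activity by comparing the current pgscan counters with the
// values remembered from the previous event. Direct reclaim takes precedence.
reclaim_state classify_reclaim(int64_t pgscan_direct, int64_t pgscan_kswapd,
                               int64_t& prev_direct, int64_t& prev_kswapd) {
    if (pgscan_direct != prev_direct) {
        prev_direct = pgscan_direct;
        prev_kswapd = pgscan_kswapd;
        return DIRECT_RECLAIM;
    }
    if (pgscan_kswapd != prev_kswapd) {
        prev_kswapd = pgscan_kswapd;
        return KSWAPD_RECLAIM;
    }
    return NO_RECLAIM;
}

// After a successful kill the thrashing limit shrinks by decay_pct percent.
int decay_thrashing_limit(int limit, int decay_pct) {
    assert(decay_pct >= 0 && decay_pct <= 100);
    return limit * (100 - decay_pct) / 100;
}
```

With a decay of 10 percent, for instance, a limit of 100 becomes 90 after one kill and 81 after the next, so repeated kills make lmkd progressively more sensitive to thrashing.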

Before looking at find_and_kill_process, let's see how lmkd tracks the relationship between oom_score_adj values and processes.

// release/system/memory/lmkd/lmkd.cpp

#define ADJTOSLOT(adj) ((adj) + -OOM_SCORE_ADJ_MIN)
#define ADJTOSLOT_COUNT (ADJTOSLOT(OOM_SCORE_ADJ_MAX) + 1)

static struct adjslot_list procadjslot_list[ADJTOSLOT_COUNT];

struct adjslot_list {
    struct adjslot_list *next;
    struct adjslot_list *prev;
};

struct proc {
    struct adjslot_list asl;
    int pid;
    int pidfd;
    uid_t uid;
    int oomadj;
    pid_t reg_pid; /* PID of the process that registered this record */
    bool valid;
    struct proc *pidhash_next;
};

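lmkd stores process records in an array named procadjslot_list. Its length, ADJTOSLOT_COUNT, is 2001, which is exactly the number of integers in the oom_score_adj range [-1000, 1000], so the valid indexes are [0, 2000]. For example, an oom_score_adj of 1000 maps to index ADJTOSLOT(1000) = 2000, and an oom_score_adj of -1000 maps to index ADJTOSLOT(-1000) = 0.

Each array element is the head node of a circular doubly linked list that holds the processes currently at that oom_score_adj. The head node is a bare adjslot_list struct, and the nodes after it are proc structs, whose first member is an adjslot_list. This layout can be inferred from the functions that operate on it, such as proc_insert, proc_slot and adjslot_insert.

[Figure: procadjslot_list as an array of circular doubly linked lists, one per oom_score_adj value]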
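
A stripped-down version of this structure fits in a few dozen lines. The sketch below is not lmkd code: the names (list_node, proc_record, slot_insert, slot_tail) are hypothetical, and it assumes new records are linked in right after the head node, which is how I read adjslot_insert.

```cpp
#include <cassert>

constexpr int kOomScoreAdjMin = -1000;
constexpr int kOomScoreAdjMax = 1000;
constexpr int adj_to_slot(int adj) { return adj - kOomScoreAdjMin; }
constexpr int kSlotCount = adj_to_slot(kOomScoreAdjMax) + 1;  // 2001 slots

struct list_node {
    list_node* next;
    list_node* prev;
};

// The list node is the first member, so a list_node* that points at a process
// record can be cast back to proc_record*, the same trick lmkd uses with proc.
struct proc_record {
    list_node asl;
    int pid;
    int oomadj;
};

static list_node slots[kSlotCount];

// An empty circular list is a head node that points at itself.
void slots_init() {
    for (int i = 0; i < kSlotCount; i++) {
        slots[i].next = slots[i].prev = &slots[i];
    }
}

// Link the record in right after the head, so the oldest record is the tail.
void slot_insert(proc_record* p) {
    assert(p->oomadj >= kOomScoreAdjMin && p->oomadj <= kOomScoreAdjMax);
    list_node* head = &slots[adj_to_slot(p->oomadj)];
    list_node* next = head->next;
    p->asl.prev = head;
    p->asl.next = next;
    next->prev = &p->asl;
    head->next = &p->asl;
}

// Tail of the list for a given adj, or nullptr when the list is empty.
proc_record* slot_tail(int adj) {
    list_node* head = &slots[adj_to_slot(adj)];
    return head->prev == head ? nullptr : reinterpret_cast<proc_record*>(head->prev);
}
```

Under this assumption, inserting pid 101 and then pid 102 at adj 900 leaves pid 101 at the tail, so the tail is the record that has been at that adj the longest.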

// release/system/memory/lmkd/lmkd.cpp

static int find_and_kill_process(int min_score_adj, struct kill_info *ki, union meminfo *mi,
                                 struct wakeup_info *wi, struct timespec *tm,
                                 struct psi_data *pd) {
    int i;
    int killed_size = 0;
    bool lmk_state_change_start = false;
    bool choose_heaviest_task = kill_heaviest_task;

    // The scan starts at oom_score_adj 1000 and goes down to min_score_adj at the lowest
    // From mp_event_psi we know that min_score_adj is either 0 or 201
    for (i = OOM_SCORE_ADJ_MAX; i >= min_score_adj; i--) {
        struct proc *procp;

        // Even when kill_heaviest_task is disabled, the heaviest process is preferred once oom_score_adj is 200 (PERCEPTIBLE_APP_ADJ) or lower
        if (!choose_heaviest_task && i <= PERCEPTIBLE_APP_ADJ) {
            /*
             * If we have to choose a perceptible process, choose the heaviest one to
             * hopefully minimize the number of victims.
             */
            choose_heaviest_task = true;
        }

        // Search the list for oom_score_adj i
        while (true) {
            // When the heaviest process is preferred, call proc_get_heaviest; otherwise call proc_adj_tail to take the last process in the list for oom_score_adj i
            procp = choose_heaviest_task ?
                proc_get_heaviest(i) : proc_adj_tail(i);

            if (!procp)
                break;

            // Kill one process
            killed_size = kill_one_process(procp, min_score_adj, ki, mi, wi, tm, pd);
            if (killed_size >= 0) {
                if (!lmk_state_change_start) {
                    lmk_state_change_start = true;
                    // Send a message to the socket client
                    stats_write_lmk_state_changed(STATE_START);
                }
                break;
            }
        }
        if (killed_size) {
            break;
        }
    }

    if (lmk_state_change_start) {
        // Send a message to the socket client
        stats_write_lmk_state_changed(STATE_STOP);
    }

    return killed_size;
}
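
The selection policy in find_and_kill_process can be condensed into a pure function that only picks a victim. This is a simplified sketch under stated assumptions: pick_victim and candidate are hypothetical names, each list is modelled as a vector ordered from head to tail, and the retry that lmkd performs when a kill fails is left out.

```cpp
#include <cassert>
#include <map>
#include <vector>

constexpr int kOomScoreAdjMax = 1000;
constexpr int kPerceptibleAppAdj = 200;

struct candidate {
    int pid;
    int rss_pages;
};

// procs maps oom_score_adj -> candidates ordered from list head to list tail.
// Returns the pid of the chosen victim, or -1 if nothing is eligible.
int pick_victim(const std::map<int, std::vector<candidate>>& procs,
                int min_score_adj, bool kill_heaviest_task) {
    assert(min_score_adj <= kOomScoreAdjMax);
    bool choose_heaviest = kill_heaviest_task;
    for (int adj = kOomScoreAdjMax; adj >= min_score_adj; adj--) {
        // At or below the perceptible level, always prefer the heaviest process.
        if (!choose_heaviest && adj <= kPerceptibleAppAdj) {
            choose_heaviest = true;
        }
        auto it = procs.find(adj);
        if (it == procs.end() || it->second.empty()) continue;
        const std::vector<candidate>& list = it->second;
        if (!choose_heaviest) return list.back().pid;  // tail of the list
        const candidate* best = &list.front();
        for (const candidate& c : list) {
            if (c.rss_pages > best->rss_pages) best = &c;
        }
        return best->pid;
    }
    return -1;
}
```

With min_score_adj at 201, processes at adj 200 and below are never considered; lowering it to 0 makes them eligible, and at that point the heaviest one is picked regardless of the kill_heaviest_task setting.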

Here is how the heaviest process is found:

// release/system/memory/lmkd/lmkd.cpp

static struct proc *proc_get_heaviest(int oomadj) {
    // Head node of the list for this oom_score_adj
    struct adjslot_list *head = &procadjslot_list[ADJTOSLOT(oomadj)];
    // First process record after the head node
    struct adjslot_list *curr = head->next;
    struct proc *maxprocp = NULL;
    int maxsize = 0;
    // Walk the list from the first node and use /proc/<pid>/statm to find the process using the most physical memory
    while (curr != head) {
        int pid = ((struct proc *)curr)->pid;
        int tasksize = proc_get_size(pid);
        if (tasksize < 0) {
            struct adjslot_list *next = curr->next;
            pid_remove(pid);
            curr = next;
        } else {
            if (tasksize > maxsize) {
                maxsize = tasksize;
                maxprocp = (struct proc *)curr;
            }
            curr = curr->next;
        }
    }
    return maxprocp;
}

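In other words, the heaviest process is the one with the largest resident set size (RSS), that is, the most physical memory actually in use.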
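
To make the RSS lookup concrete, the sketch below reads the second field of /proc/<pid>/statm, which is the resident set size in pages. The function names are hypothetical and the code is a simplified stand-in for what proc_get_size does, not a copy of it.

```cpp
#include <cassert>
#include <cstdio>

// Parse the resident set size (second field, in pages) from the contents of
// /proc/<pid>/statm. Returns -1 if the text is malformed.
long parse_statm_rss(const char* statm_text) {
    assert(statm_text != nullptr);
    long total = 0;
    long rss = 0;
    if (std::sscanf(statm_text, "%ld %ld", &total, &rss) != 2) return -1;
    return rss;
}

// Read /proc/<pid>/statm for a live process; returns -1 if it cannot be read,
// for example because the process has already exited.
long read_rss_pages(int pid) {
    char path[64];
    std::snprintf(path, sizeof(path), "/proc/%d/statm", pid);
    FILE* f = std::fopen(path, "r");
    if (!f) return -1;
    char line[256];
    long rss = std::fgets(line, sizeof(line), f) ? parse_statm_rss(line) : -1;
    std::fclose(f);
    return rss;
}
```

A negative result plays the same role as tasksize < 0 in proc_get_heaviest: the record is stale and can be dropped from the list.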

Next, let's look at how a process is killed:

// release/system/memory/lmkd/lmkd.cpp

/* Kill one process specified by procp.  Returns the size (in pages) of the process killed */
static int kill_one_process(struct proc* procp, int min_oom_score, struct kill_info *ki,
                            union meminfo *mi, struct wakeup_info *wi, struct timespec *tm,
                            struct psi_data *pd) {
    ......

    // Kill the process via reaper.kill
    kill_result = reaper.kill({ pidfd, pid, uid }, false);

    ......
    // Send an LMK_STAT_KILL_OCCURRED message to the socket client
    stats_write_lmk_kill_occurred(&kill_st, mem_st);

    // Send an LMK_PROCKILL message to the socket client
    ctrl_data_write_lmk_kill_occurred((pid_t)pid, uid);

    ......
}

// release/system/memory/lmkd/reaper.cpp

int Reaper::kill(const struct target_proc& target, bool synchronous) {
    // Without a pidfd, fall back to the traditional kill() call to send SIGKILL to the target process
    if (target.pidfd < 0) {
        return ::kill(target.pid, SIGKILL);
    }

    // Asynchronous kill
    if (!synchronous && async_kill(target)) {
        // we assume the kill will be successful and if it fails we will be notified
        return 0;
    }

    // For a synchronous kill, or if the async kill above did not succeed, send SIGKILL directly with pidfd_send_signal
    int result = pidfd_send_signal(target.pidfd, SIGKILL, NULL, 0);
    if (result) {
        return result;
    }

    return 0;
}

// release/system/memory/lmkd/reaper.cpp

bool Reaper::async_kill(const struct target_proc& target) {
    ......

    // Enqueue the request
    queue_.push_back({ dup(target.pidfd), target.pid, target.uid });
    // Wake up a worker thread
    cond_.notify_one();
    ......
}

async_kill adds the process to be killed to a queue and wakes up a worker thread to handle it.

// release/system/memory/lmkd/reaper.cpp

bool Reaper::init(int comm_fd) {
    ......

    thread_pool_ = new pthread_t[THREAD_POOL_SIZE];
    for (int i = 0; i < THREAD_POOL_SIZE; i++) {
        if (pthread_create(&thread_pool_[thread_cnt_], NULL, reaper_main, this)) {
            ALOGE("pthread_create failed: %s", strerror(errno));
            continue;
        }
        ......
    }

    ......
}

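When the Reaper is initialized (init_reaper is called from the main function in lmkd.cpp), it calls pthread_create to create the threads, stores them in the thread pool thread_pool_, and has each thread run reaper_main once it starts.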

// release/system/memory/lmkd/reaper.cpp

static void* reaper_main(void* param) {
    ......

    for (;;) {
        // Take one target_proc from the queue
        target = reaper->dequeue_request();

        ......

        // Send SIGKILL to the process with pidfd_send_signal
        if (pidfd_send_signal(target.pidfd, SIGKILL, NULL, 0)) {
            // Inform the main thread about failure to kill
            reaper->notify_kill_failure(target.pid);
            goto done;
        }

        ......
}
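
The hand-off between async_kill and reaper_main is a classic producer/consumer queue guarded by a mutex and a condition variable. The sketch below shows that pattern with standard C++ threads instead of pthreads. KillQueue and demo_reaper are hypothetical names, and the worker only records the pids it receives where the real reaper would send SIGKILL.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class KillQueue {
  public:
    // Producer side: enqueue a pid and wake one waiting worker.
    void push(int pid) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(pid);
        }
        cond_.notify_one();
    }

    // Ask workers to exit once the queue has been drained.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cond_.notify_all();
    }

    // Consumer side: block until a pid is available.
    // Returns false when the queue is empty and shutdown was requested.
    bool pop(int* pid) {
        assert(pid != nullptr);
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty() || stopping_; });
        if (queue_.empty()) return false;
        *pid = queue_.front();
        queue_.pop_front();
        return true;
    }

  private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<int> queue_;
    bool stopping_ = false;
};

// Demo: one worker thread handles every queued pid in order.
std::vector<int> demo_reaper(const std::vector<int>& pids) {
    KillQueue queue;
    std::vector<int> handled;
    std::thread worker([&] {
        int pid;
        while (queue.pop(&pid)) {
            handled.push_back(pid);  // the real reaper would send SIGKILL here
        }
    });
    for (int pid : pids) queue.push(pid);
    queue.shutdown();
    worker.join();
    return handled;
}
```

The point of the queue is that the thread that detects memory pressure returns immediately after enqueueing, while the potentially slow work of killing runs on a worker.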

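pidfd_send_signal compared with the traditional way of sending signals (the kill function)
Accuracy and safety
The traditional kill function addresses a process by its process ID (pid). Because pids can be reused (a new process may be given the ID of an old process that has already exited), the signal may end up at the wrong process. pidfd_send_signal takes a pidfd, a file descriptor that is bound to one specific process for that process's lifetime, which reduces the risk of signalling the wrong target.
Resource management and flexibility
A pidfd integrates well with system calls and mechanisms built around file descriptors. For example, it can be added to an I/O multiplexing mechanism such as epoll, so that the caller is notified promptly when something happens to the target process (such as termination) and can then decide whether to send a signal or do something else. The kill function offers no such integration, so pidfd provides a more flexible way to manage process signals.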
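
The difference can be tried on a Linux system with a short sketch. It invokes pidfd_open and pidfd_send_signal through syscall(), since libc wrappers are not available everywhere, and falls back to kill() when no pidfd can be obtained, loosely mirroring the first branch of Reaper::kill. kill_process and demo_kill_child are hypothetical names, and the fallback syscall numbers are an assumption that holds for x86_64 and arm64.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fallback numbers for older headers (assumption: values used on x86_64 and arm64).
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

// Send SIGKILL to pid, preferring a pidfd. Returns 0 on success, -1 on failure.
int kill_process(pid_t pid) {
    assert(pid > 0);
    int pidfd = static_cast<int>(syscall(SYS_pidfd_open, pid, 0));
    if (pidfd < 0) {
        // No pidfd available: fall back to the pid-based call.
        return ::kill(pid, SIGKILL);
    }
    int res = static_cast<int>(syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, nullptr, 0));
    close(pidfd);
    return res;
}

// Demo: fork a child that sleeps, kill it, and return the terminating signal.
int demo_kill_child() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        pause();   // child: wait until a signal arrives
        _exit(0);
    }
    int rc = kill_process(child);
    if (rc != 0) ::kill(child, SIGKILL);  // make sure the child does not linger
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    if (rc != 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Note that this sketch obtains the pidfd immediately before signalling, so it does not by itself remove the pid-reuse window. The benefit described above comes from obtaining the pidfd early, while the pid is known to refer to the intended process, and keeping it until the kill.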

Some recommended articles on lmk:
https://source.android.google.cn/docs/core/perf/lmkd?hl=zh-cn
https://blog.csdn.net/youthcowboy/article/details/140665606
https://blog.csdn.net/omnispace/article/details/73320950
https://blog.csdn.net/weixin_40214774/article/details/141687953
https://blog.csdn.net/weixin_40214774/article/details/141230790
https://gityuan.com/2016/09/17/android-lowmemorykiller/
https://gityuan.com/2018/05/19/android-process-adj/
https://blog.csdn.net/buhui912/article/details/107153804/
