腾讯性能监控框架Matrix源码分析之第一篇

腾讯性能监控框架Matrix源码分析之第一篇

概述

前几天腾讯将一款Android应用性能监控的框架matrix开源了,源码地址在https://github.com/Tencent/matrix,作者是微信终端团队。matrix到底是什么?据官方说法如下:
Matrix 是一款微信研发并日常使用的 APM(Application Performance Manage),当前主要运行在 Android 平台上。 Matrix 的目标是建立统一的应用性能接入框架,通过各种性能监控方案,对性能监控项的异常数据进行采集和分析,输出相应的问题分析、定位与优化建议,从而帮助开发者开发出更高质量的应用。
Matrix 当前监控范围包括:应用安装包大小,帧率变化,启动耗时,卡顿,慢方法,SQLite 操作优化,文件读写,内存泄漏等等(此段截取自matrix的GitHub介绍)
下面直接看源码:

代码的入口在application的onCreate()里进行初始化的,

Matrix.Builder builder = new Matrix.Builder(this);
1
//省略了一部分构造器创建对象的一段代码,这里仅说明是入口
Matrix.init(builder.build());

和leakcanary等库一样在application初始化,Matrix的创建采用了常用的构造器模式,现在进入Matrix内部看看

private static volatile Matrix sInstance;

private final HashSet<Plugin> plugins;//插件集合
private final Application     application;
private final PluginListener  pluginListener; 

private Matrix(Application app, PluginListener listener, HashSet<Plugin> plugins) {
    this.application = app;
    this.pluginListener = listener;
    this.plugins = plugins;
    for (Plugin plugin : plugins) {
        plugin.init(application, pluginListener);
        pluginListener.onInit(plugin);
    }

}

public static void setLogIml(MatrixLog.MatrixLogImp imp) {
    MatrixLog.setMatrixLogImp(imp);
}

public static boolean isInstalled() {
    return sInstance != null;
}

public static Matrix init(Matrix matrix) {
    if (matrix == null) {
        throw new RuntimeException("Matrix init, Matrix should not be null.");
    }
    synchronized (Matrix.class) {
        if (sInstance == null) {
            sInstance = matrix;
        } else {
            MatrixLog.e(TAG, "Matrix instance is already set. this invoking will be ignored");
        }
    }
    return sInstance;
}

matrix里持有一个插件的集合plugins,使用的是hashSet来保证不出现重复,还有一个plugin状态的监听pluginListener。Matrix采用了单例模式,volatile sInstance保证线程可见行,初始化的时候采用了双重检查,在构造函数里给变量赋值并遍历plugins集合,并逐个调用插件的初始化方法plugin.init()
那插件plugin是什么呢?下面是plugin的代码:

public abstract class Plugin implements IPlugin, IssuePublisher.OnIssueDetectListener {
private static final String TAG = "Matrix.Plugin";

public static final int PLUGIN_CREATE    = 0x00;
public static final int PLUGIN_INITED    = 0x01;
public static final int PLUGIN_STARTED   = 0x02;
public static final int PLUGIN_STOPPED   = 0x04;
public static final int PLUGIN_DESTROYED = 0x08;
private PluginListener pluginListener;
private Application    application;
private boolean isSupported = true;
private int status = PLUGIN_CREATE;
@Override
public void init(Application app, PluginListener listener) {
    if (application != null || pluginListener != null) {
        throw new RuntimeException("plugin duplicate init, application or plugin listener is not null");
    }
    status = PLUGIN_INITED;
    this.application = app;
    this.pluginListener = listener;
}
@Override
public void onDetectIssue(Issue issue) {
    pluginListener.onReportIssue(issue);
}
public Application getApplication() {
    return application;
}
@Override
public void start() {
    //省略部分代码
    pluginListener.onStart(this);
}
@Override
public void stop() {
    //省略部分代码
    pluginListener.onStop(this);
}

@Override
public void destroy() {
    //省略部分代码
    pluginListener.onDestroy(this);
}
  }

plugin它是个抽象类,继承了IPlugin和 IssuePublisher.OnIssueDetectListener,IPlugin包括了五种插件的状态分别是CREATEINITEDSTARTEDSTOPPEDDESTROYED,当plugin状态发生变化时将回调交给pluginListener来处理。OnIssueDetectListener接口是IssuePublisher类里的内部接口,IssuePublisher具体做了两件事,记录问题和暴露问题,其暴露问题的方法就是空实现然后暴露接口,交给实现OnIssueDetectListener接口的具体类来处理,Plugin继承了这个OnIssueDetectListener接口,但它也没自己处理,也是同样交留pluginListener来处理。

第一段小结

  1. Matrix是个单例,它维护着插件的集合plugins和插件不同状态及报错的处理接口
  2. pluginListener,这个pluginListener是plugins集合共有的,
  3. matrix初始化的时候会逐个调用plugin 的init方法。
  4. 插件Plugin是个抽象类,具体的插件需要实现的,matrix框架里自带的插件有TracePluginIOCanaryPluginSQLiteLintPluginResourcePlugin
    下面会一一查看它们的作用和具体实现

TracePlugin

首先来看TracePlugin,它继承自plugin,里面包括四个维度FrameTracerFPSTracerEvilMethodTracerStartUpTracer来分析app的,初始化的方法如下:

 @Override
public void init(Application app, PluginListener listener) {
    super.init(app, listener);
    MatrixLog.i(TAG, "trace plugin init, trace config: %s", mTraceConfig.toString());
    //低版本不支持
    if (Build.VERSION.SDK_INT < Build.VERSION_CODES.JELLY_BEAN) {
        MatrixLog.e(TAG, "[FrameBeat] API is low Build.VERSION_CODES.JELLY_BEAN(16), TracePlugin is not supported");
        unSupportPlugin();
        return;
    }
    ApplicationLifeObserver.init(app);
    mFrameTracer = new FrameTracer(this);
    //开关,可以选择不开
    if (mTraceConfig.isMethodTraceEnable()) {
        mStartUpTracer = new StartUpTracer(this, mTraceConfig);
    }
    if (mTraceConfig.isFPSEnable()) {
        mFPSTracer = new FPSTracer(this, mTraceConfig);
    }
    if (mTraceConfig.isMethodTraceEnable()) {
        mEvilMethodTracer = new EvilMethodTracer(this, mTraceConfig);
    }
}

ApplicationLifeObserver.init(app)是利用了application的ActivityLifecycleCallbacks可以对每个activity的生命周期进行监控做了个观察者模式,另外加了判断分析当前app是在前台还是后台,具体实现方式是记录onActivityResumed和onActivityPaused的生命周期,由于新起的activity的onResume会在底层activity的onPause之后,如果onActivityPaused之后600ms没有执行到onActivityResumed则认为当前处于后台。仔细想想这么做会有误伤,如果有个activity启动特别慢,此时超过600ms则判定已经处于后台了,不过这个影响比较小,因为activity启动之后到resume时就又恢复成正常的前台,即使误判也不影响检测,具体实现可以看源码。

在TracePlugin初始化的时候,分别新建了mStartUpTracer、mFPSTracer、mFrameTracer和mEvilMethodTracer,其中的参数mTraceConfig是个简单的配置类,只是记录了开关就不在这展开了。查看matrix的demo开始检测的入口是tracePlugin.start()里,代码如下:

@Override
public void start() {
    super.start();
    if (!isSupported()) {
        return;
    }
    new Handler(Looper.getMainLooper()).post(new Runnable() {
        @Override
        public void run() {
            //保证在主线程调用
            FrameBeat.getInstance().onCreate();
        }
    });
    if (null != mFPSTracer) {
        mFPSTracer.onCreate();
    }
    if (null != mEvilMethodTracer) {
        mEvilMethodTracer.onCreate();
    }
    if (null != mFrameTracer) {
        mFrameTracer.onCreate();
    }
    if (null != mStartUpTracer) {
        mStartUpTracer.onCreate();
    }
}

在onstart时在主线程中调用了FrameBeat.getInstance().onCreate(),这里是做UI分析用的。
目前做UI性能卡顿分析一般有两种方式:

一是利用主线程looper的loop方法在寻找msg.target.dispatchMessage(msg)时的前后会分别打印一段log,可以利用log的内容不同或者log的前后次数记录两次log的时间差,这样就可以大致认为是主线程处理msg的时间,如果时间过长则认为卡顿;

二是利用Choreographer,Choreographer就是一个消息处理器,根据vsync 信号 来计算frame,在doFrame方法里可以收到回调的当前时间,正常绘制两次doFrame的时间差应该是1000/60=16.6666毫秒(每秒60帧),但是遇到卡顿或过度重绘等会导致时间拉长。

这里采用的是第二种方式,其doFrame的实现如下:

 @Override
public void doFrame(long frameTimeNanos) {
    if (isPause) {
        return;
    }
    if (frameTimeNanos < mLastFrameNanos || mLastFrameNanos <= 0) {
        mLastFrameNanos = frameTimeNanos;
        if (null != mChoreographer) {
            mChoreographer.postFrameCallback(this);
        }
        return;
    }
    if (null != mFrameListeners) {
        for (IFrameBeatListener listener : mFrameListeners) {
            listener.doFrame(mLastFrameNanos, frameTimeNanos);
        }
        if (null != mChoreographer) {
            mChoreographer.postFrameCallback(this);
        }
        mLastFrameNanos = frameTimeNanos;
    }
}

记录两次doFrame的时间,交给mFrameListeners执行回调。
下面会分析mFPSTracer 、mFrameTracer、mFrameTracer和mStartUpTracer 的onCreate方法的具体实现,这四个类都继承了BaseTracer类,因此在分析得前先看下BaseTracer,

public abstract class BaseTracer extends IssuePublisher implements ApplicationLifeObserver.IObserver, IFrameBeatListener, IMethodBeatListener {

private final TracePlugin mPlugin;
private static final MethodBeat sMethodBeat = new MethodBeat();
private static final HashMap<Class<BaseTracer>, BaseTracer> sTracerMap = new HashMap<>();
BaseTracer(TracePlugin plugin) {
    super(plugin);
    this.mPlugin = plugin;
    sTracerMap.put((Class<BaseTracer>) this.getClass(), this);
}
@Override
public void doFrame(long lastFrameNanos, long frameNanos) {
}

public void onCreate() {
    MatrixLog.i(TAG, "[onCreate] name:%s process:%s", this.getClass().getCanonicalName(), Process.myPid());
    if (isEnableMethodBeat()) {
        if (!getMethodBeat().isHasListeners()) {
            getMethodBeat().onCreate();
        }
        getMethodBeat().registerListener(this);
    }
    ApplicationLifeObserver.getInstance().register(this);
    FrameBeat.getInstance().addListener(this);
    isCreated = true;
}

public void onDestroy() {
    MatrixLog.i(TAG, "[onDestroy] name:%s  process:%s", this.getClass().getCanonicalName(), Process.myPid());
    if (isEnableMethodBeat()) {
        getMethodBeat().unregisterListener(this);
        if (!getMethodBeat().isHasListeners()) {
            getMethodBeat().onDestroy();
        }
    }
    ApplicationLifeObserver.getInstance().unregister(this);
    FrameBeat.getInstance().removeListener(this);
    isCreated = false;
}
protected void sendReport(JSONObject jsonObject, String tag) {
    Issue issue = new Issue();
    issue.setTag(tag);
    issue.setContent(jsonObject);
    mPlugin.onDetectIssue(issue);
}
}

这里截取了一部分核心的代码,BaseTracer里有个静态的hashMap,类名作为key,value是具体的BaseTracer对象,它是静态的所以只会有一份,在onCreate里注册了前面说到的ApplicationLifeObserver和FrameBeat的监听,监听activity的生命周期的回调和Choreographer两次绘制的时间的接口回调,回调是交给自己来处理的。另外暴露了sendReport方法,方法里调用的是本地持有的TracePlugin对象的onDetectIssue来处理,这里和前文Matrix里plugins与pluginListener相对应上了,plugins里的onDetectIssue最终都是由pluginListener来处理的。

TracePlugin初步小结

TracePlugin分成四个部分mStartUpTracer、mFPSTracer、mFrameTracer和mEvilMethodTracer,它们都继承了BaseTracer;
BaseTracer里监听了ApplicationLifeObserver,即每个activity的生命周期和前后台状态的监听;
BaseTracer监听着FrameBeat的每一帧刷新前后的时间即doFrame(long lastFrameNanos, long frameNanos);

第一个纬度:mFrameTracer

下面我们来看看mFrameTracer
mFrameTracer的具体实现的关键方法doFrame

@Override
public void doFrame(final long lastFrameNanos, final long frameNanos) {
    if (!isDrawing) {
        return;
    }
    isDrawing = false;
    final int droppedCount = (int) ((frameNanos - lastFrameNanos) / REFRESH_RATE_MS);
    if (droppedCount > 1) {
        for (final IDoFrameListener listener : mDoFrameListenerList) {
            listener.doFrameSync(lastFrameNanos, frameNanos, getScene(), droppedCount);
            if (null != listener.getHandler()) {
                listener.getHandler().post(new Runnable() {
                    @Override
                    public void run() {
                        listener.getHandler().post(new AsyncDoFrameTask(listener,
                                lastFrameNanos, frameNanos, getScene(), droppedCount));
                    }
                });
            }
        }
    }
}

在doFrame里根据界面绘制的时间差计算,如果超过了正常绘制16.67秒就会在监听里把数据回调出去,这个有两个回调方法doFrameSync和doFrameAsync,对应的是同步调用和异步调用,异步的实现方式利用handler机制,其中getScene是当前的activity或fragment的类名。

第二个纬度:EvilMethodTracer

EvilMethodTracer顾名思义就是找到那些邪恶的方法,也就是耗时多的方法,具体实现来看代码,
首先EvilMethodTracer继承了LazyScheduler接口,LazyScheduler接口是个利用handler实现的定时器,代码如下:

public class LazyScheduler {
//延迟即间隔时间
private final long delay;
private final Handler mHandler;
private volatile boolean isSetUp = false;
public LazyScheduler(HandlerThread handlerThread, long delay) {
    this.delay = delay;
    mHandler = new Handler(handlerThread.getLooper());
}
public boolean isSetUp() {
    return isSetUp;
}
//开始
public void setUp(final ILazyTask task, boolean cycle) {
    if (null != mHandler) {
        this.isSetUp = true;
        mHandler.removeCallbacksAndMessages(null);
        RetryRunnable retryRunnable = new RetryRunnable(mHandler, delay, task, cycle);
        mHandler.postDelayed(retryRunnable, delay);
    }
}
public void cancel() {
    if (null != mHandler) {
        this.isSetUp = false;
        mHandler.removeCallbacksAndMessages(null);
    }
}
public void setOff() {
    cancel();
}
public interface ILazyTask {
    void onTimeExpire();
}

static class RetryRunnable implements Runnable {
    private final Handler handler;
    private final long delay;
    private final ILazyTask lazyTask;
    private final boolean cycle;
    RetryRunnable(Handler handler, long delay, ILazyTask lazyTask, boolean cycle) {
        this.handler = handler;
        this.delay = delay;
        this.lazyTask = lazyTask;
        this.cycle = cycle;
    }
    @Override
    public void run() {
        lazyTask.onTimeExpire();
        if (cycle) {
            handler.postDelayed(this, delay);
        }
    }
}

这个定时器的实现利用了handler机制,handler的looper直接从参数handlerThread的线程里获得,这里的代码并不复杂,只要记住一条就可以,一是定时器支持是否循环,执行是会调用ILazyTask的onTimeExpire方法。
EvilMethodTracer也是重写了定时器的onTimeExpire方法,下面来看EvilMethodTracer的具体代码。

public EvilMethodTracer(TracePlugin plugin, TraceConfig config) {
    super(plugin);
    this.mTraceConfig = config;
    mLazyScheduler = new LazyScheduler(MatrixHandlerThread.getDefaultHandlerThread(), Constants.DEFAULT_ANR);
    mActivityCreatedInfoMap = new HashMap<>();
}

创建时初始化了mLazyScheduler和mActivityCreatedInfoMap,mLazyScheduler,这里的mLazyScheduler是个定时器Constants.DEFAULT_ANR默认值5秒,用于记录界面5秒没响应及ANR,中间有句MatrixHandlerThread.getDefaultHandlerThread(),MatrixHandlerThread是个hadlerThread线程管理类,它里面包含了Matrix默认工作线程、主线程和一个动态创建新线程的容器,getDefaultHandlerThread()方法获取的是默认工作线程的handlerThread。mActivityCreatedInfoMap是用于记录activity的启动耗时信息。具体是怎么做到的呢?让我们一步步来看,首先启动EvilMethodTracer的代码如下:

@Override
public void onCreate() {
    super.onCreate();
    if (!getMethodBeat().isRealTrace()) {
        MatrixLog.e(TAG, "MethodBeat don't work, maybe it's wrong in trace Build!");
        onDestroy();
        return;
    }
    if (this.mAnalyseThread == null) {
        this.mAnalyseThread = MatrixHandlerThread.getNewHandlerThread("matrix_trace_analyse_thread");
        mHandler = new Handler(mAnalyseThread.getLooper());
    }
    // set up when onCreate
    mLazyScheduler.cancel();
    if (ApplicationLifeObserver.getInstance().isForeground()) {
        onFront(null);
    }
}

这里先判断了getMethodBeat().isRealTrace(),MethodBeat是统计ANR和超时Method的重要类,可以说是核心类,一会在展开。
mAnalyseThread是利用MatrixHandlerThread新起了一个线程,将并它的looer交给mHandler。EvilMethodTracer重新了doFrame方法:

@Override
public void doFrame(long lastFrameNanos, long frameNanos) {
    //isIgnoreFrame为true的时候,一是出现了ANR,二是正在记录方法的缓存满了
    if (isIgnoreFrame) {
        //清理缓存
        mActivityCreatedInfoMap.clear();
        //改变标识
        setIgnoreFrame(false);
        //重置编号
        getMethodBeat().resetIndex();
        return;
    }
    //index指的是当前执行的方法对应缓存里的编号
    int index = getMethodBeat().getCurIndex();
    //hasEntered说明activity已经存在;
    if (hasEntered && frameNanos - lastFrameNanos > mTraceConfig.getEvilThresholdNano()) {
        MatrixLog.e(TAG, "[doFrame] dropped frame too much! lastIndex:%s index:%s", 0, index);
        //两帧之间相差超过一秒,将缓存里的数据进行分析
        handleBuffer(Type.NORMAL, 0, index - 1, getMethodBeat().getBuffer(), (frameNanos - lastFrameNanos) / Constants.TIME_MILLIS_TO_NANO);
    }
    //重置编号
    getMethodBeat().resetIndex();
    //取消mLazyScheduler,即清空对应的messageQueue
    mLazyScheduler.cancel();
    //发送延迟消息,默认5秒,意味着五秒中之内没有回到doframe会执行mLazyScheduler里的runnable
    mLazyScheduler.setUp(this, false);

}

doFrame时间差超过1秒会执行handleBuffer,时间差超过5秒会执行mLazyScheduler里的onTimeExpire。先来看onTimeExpire方法:

public void onTimeExpire() {
    // 进到这里说明doframe延迟了5秒,视为界面的ANR
    //在后台就忽略
    if (isBackground()) {
        MatrixLog.w(TAG, "[onTimeExpire] pass this time, on Background!");
        return;
    }
    //起始时间
    long happenedAnrTime = getMethodBeat().getCurrentDiffTime();
    MatrixLog.w(TAG, "[onTimeExpire] maybe ANR!");
    //暂停一次Frame捕捉
    setIgnoreFrame(true);
    //缓存里的数据可以清空
    getMethodBeat().lockBuffer(false);
    //分析缓存,Type为ANR
    handleBuffer(Type.ANR, 0, getMethodBeat().getCurIndex() - 1, getMethodBeat().getBuffer(), null, Constants.DEFAULT_ANR, happenedAnrTime, -1);
}

当执行到onTimeExpire这里如果是前台就视为ANR,到目前ANR的捕捉大致知道了,分析数据和doframe一样也是执行handleBuffer,这里也再次提到了getMethodBeat(),getMethodBeat返回的是本地持有的对象sMethodBeat,分析数据的方法handleBuffer也会用到sMethodBeat里的东西,所以分析之前我们先查看下MethodBeat这个类究竟是做什么的,分析MethodBeat之后再来看看handleBuffer。

插曲—MethodBeat

MethodBeat类主要做的事是用一个数组记录应用执行的每个方法和方法耗时,用的是long[],使用方法id和耗时进行按位换算得要一个long值作为元素,缓存默认长度是一百万。在MethodBeat开始有这段代码:

static {
    Hacker.hackSysHandlerCallback();
    sCurrentDiffTime = sLastDiffTime = System.nanoTime() / Constants.TIME_MILLIS_TO_NANO;
    sReleaseBufferHandler.sendEmptyMessageDelayed(RELEASE_BUFFER_MSG_ID, Constants.DEFAULT_RELEASE_BUFFER_DELAY);
}

进入Hacker.hackSysHandlerCallback

public class Hacker {
private static final String TAG = "Matrix.Hacker";
public static boolean isEnterAnimationComplete = false;
public static long sApplicationCreateBeginTime = 0L;
public static int sApplicationCreateBeginMethodIndex = 0;
public static long sApplicationCreateEndTime = 0L;
public static int sApplicationCreateEndMethodIndex = 0;
public static int sApplicationCreateScene = -100;

public static void hackSysHandlerCallback() {
    try {
        sApplicationCreateBeginTime = System.currentTimeMillis();
        sApplicationCreateBeginMethodIndex = MethodBeat.getCurIndex();
        Class<?> forName = Class.forName("android.app.ActivityThread");
        Field field = forName.getDeclaredField("sCurrentActivityThread");
        field.setAccessible(true);
        Object activityThreadValue = field.get(forName);
        Field mH = forName.getDeclaredField("mH");
        mH.setAccessible(true);
        Object handler = mH.get(activityThreadValue);
        Class<?> handlerClass = handler.getClass().getSuperclass();
        Field callbackField = handlerClass.getDeclaredField("mCallback");
        callbackField.setAccessible(true);
        Handler.Callback originalCallback = (Handler.Callback) callbackField.get(handler);
        HackCallback callback = new HackCallback(originalCallback);
        callbackField.set(handler, callback);
        MatrixLog.i(TAG, "hook system handler completed. start:%s", sApplicationCreateBeginTime);
    } catch (Exception e) {
        MatrixLog.e(TAG, "hook system handler err! %s", e.getCause().toString());
    }
}

}
这里利用了反射机制进行了hook,代码比较清晰,目的很明确就是利用自己写的HackCallback来替换ActivityThread类里的mCallback,达到偷梁换柱的效果,这样做的意义就是可以拦截mCallback的原有的消息,然后选择自己要处理的消息,HackCallback里的handleMessage实现如下:

public boolean handleMessage(Message msg) {
    if (msg.what == LAUNCH_ACTIVITY) {
        Hacker.isEnterAnimationComplete = false;
    } else if (msg.what == ENTER_ANIMATION_COMPLETE) {
        Hacker.isEnterAnimationComplete = true;
    }
    if (!isCreated) {
        if (msg.what == LAUNCH_ACTIVITY || msg.what == CREATE_SERVICE || msg.what == RECEIVER) {
            Hacker.sApplicationCreateEndTime = System.currentTimeMillis();
            Hacker.sApplicationCreateEndMethodIndex = MethodBeat.getCurIndex();
            Hacker.sApplicationCreateScene = msg.what;
            isCreated = true;
        }
    }
    if (null == mOriginalCallback) {
        return false;
    }
    return mOriginalCallback.handleMessage(msg);
}

拦截到了LAUNCH_ACTIVITY和ENTER_ANIMATION_COMPLETE消息,这样就知道当前的activity创建到完成的时机。

现在再回到MethodBeat,它继承了IMethodBeat和ApplicationLifeObserver.IObserver接口,IObserver就是之前用于activity生命周期的监听,IMethodBeat是用于方法缓存数据相关的。在看到这里源码时,我就有个疑惑,相信大多数人和我一样会有这样的疑惑:它是如何记录每个方法的,这些方法的执行时间是怎么样计算的?粗略看了一下MethodBeat并没有找到数据的来源,但是我找到了下面这两个个方法:

 //hook method when it's called in
public static void i(int methodId) {
    if (isBackground) {
        return;
    }

    if (!isRealTrace) {
        updateDiffTime();
        sTimeUpdateHandler.sendEmptyMessage(UPDATE_TIME_MSG_ID);
        sBuffer = new long[Constants.BUFFER_TMP_SIZE];
    }
    isRealTrace = true;
    if (isCreated && Thread.currentThread() == sMainThread) {
        if (sIsIn) {
            android.util.Log.e(TAG, "ERROR!!! MethodBeat.i Recursive calls!!!");
            return;
        }

        sIsIn = true;

        if (sIndex >= Constants.BUFFER_SIZE) {
            for (IMethodBeatListener listener : sListeners) {
                listener.pushFullBuffer(0, Constants.BUFFER_SIZE - 1, sBuffer);
            }
            sIndex = 0;
        } else {
            mergeData(methodId, sIndex, true);
        }
        ++sIndex;
        sIsIn = false;
    } else if (!isCreated && Thread.currentThread() == sMainThread && sBuffer != null) {
        if (sIsIn) {
            android.util.Log.e(TAG, "ERROR!!! MethodBeat.i Recursive calls!!!");
            return;
        }

        sIsIn = true;

        if (sIndex < Constants.BUFFER_TMP_SIZE) {
            mergeData(methodId, sIndex, true);
            ++sIndex;
        }

        sIsIn = false;
    }


}

//hook method when it's called out.
public static void o(int methodId) {
    if (isBackground || null == sBuffer) {
        return;
    }
    if (isCreated && Thread.currentThread() == sMainThread) {
        if (sIndex < Constants.BUFFER_SIZE) {
            mergeData(methodId, sIndex, false);
        } else {
            for (IMethodBeatListener listener : sListeners) {
                listener.pushFullBuffer(0, Constants.BUFFER_SIZE - 1, sBuffer);
            }
            sIndex = 0;
        }
        ++sIndex;
    } else if (!isCreated && Thread.currentThread() == sMainThread) {
        if (sIndex < Constants.BUFFER_TMP_SIZE) {
            mergeData(methodId, sIndex, false);
            ++sIndex;
        }
    }
}

这两个方法的注释很值得注意:hook method when it’s called in 和hook method when it’s called out,意思是通过hook来执行的,于是大致猜想它实现记录方法的思路就是在应用运行的每个方法之前调用i(),在每个方法结尾调用方法o(),记录方法名称和o-i的时间差。这两个方法的最后会执行mergeData方法:

 private static void mergeData(int methodId, int index, boolean isIn) {
    long trueId = 0L;
    if (isIn) {
        trueId |= 1L << 63;
    }
    trueId |= (long) methodId << 43;
    trueId |= sCurrentDiffTime & 0x7FFFFFFFFFFL;
    sBuffer[index] = trueId;
}

这里的sCurrentDiffTime就是o()、i()两个方法直接的时间差,和methodId一起保存在long里。
思路有了,它的实现方式呢?既然是用的hook,于是全局搜索了这个class的名称,在matrix-gradle-plugin库里找到了MethodTracer这个类,这个类里找到了如下这段代码:

protected TraceMethodAdapter(int api, MethodVisitor mv, int access, String name, String desc, String className,
                                 boolean hasWindowFocusMethod, boolean isMethodBeatClass) {
        super(api, mv, access, name, desc);
        TraceMethod traceMethod = TraceMethod.create(0, access, className, name, desc);
        this.methodName = traceMethod.getMethodName();
        this.isMethodBeatClass = isMethodBeatClass;
        this.hasWindowFocusMethod = hasWindowFocusMethod;
        this.className = className;
        this.name = name;
    }

    @Override
    protected void onMethodEnter() {
        TraceMethod traceMethod = mCollectedMethodMap.get(methodName);
        if (traceMethod != null) {
            traceMethodCount.incrementAndGet();
            mv.visitLdcInsn(traceMethod.id);
            mv.visitMethodInsn(INVOKESTATIC, TraceBuildConstants.MATRIX_TRACE_CLASS, "i", "(I)V", false);
        }
    }

    @Override
    protected void onMethodExit(int opcode) {
        TraceMethod traceMethod = mCollectedMethodMap.get(methodName);
        if (traceMethod != null) {
            if (hasWindowFocusMethod && mTraceConfig.isActivityOrSubClass(className, mCollectedClassExtendMap)
                    && mCollectedMethodMap.containsKey(traceMethod.getMethodName())) {
                TraceMethod windowFocusChangeMethod = TraceMethod.create(-1, Opcodes.ACC_PUBLIC, className,
                        TraceBuildConstants.MATRIX_TRACE_ON_WINDOW_FOCUS_METHOD, TraceBuildConstants.MATRIX_TRACE_ON_WINDOW_FOCUS_METHOD_ARGS);
                if (windowFocusChangeMethod.equals(traceMethod)) {
                    traceWindowFocusChangeMethod(mv);
                }
            }

            traceMethodCount.incrementAndGet();
            mv.visitLdcInsn(traceMethod.id);
            mv.visitMethodInsn(INVOKESTATIC, TraceBuildConstants.MATRIX_TRACE_CLASS, "o", "(I)V", false);
        }
    }
}

private void traceApplicationContext(MethodVisitor mv, TraceMethod traceMethod) {
    mv.visitVarInsn(Opcodes.ALOAD, 0);
    mv.visitLdcInsn(traceMethod.methodName);
    mv.visitLdcInsn(traceMethod.desc);
    mv.visitMethodInsn(Opcodes.INVOKESTATIC, TraceBuildConstants.MATRIX_TRACE_CLASS, "trace", "(Ljava/lang/Object;Ljava/lang/String;Ljava/lang/String;)V", false);
}

private void traceWindowFocusChangeMethod(MethodVisitor mv) {
    mv.visitVarInsn(Opcodes.ALOAD, 0);
    mv.visitVarInsn(Opcodes.ILOAD, 1);
    mv.visitMethodInsn(Opcodes.INVOKESTATIC, TraceBuildConstants.MATRIX_TRACE_CLASS, "at", "(Landroid/app/Activity;Z)V", false);

}

这段代码是在内部类TraceMethodAdapter里的,这个内部类又继承了AdviceAdapter,AdviceAdapter的实现来自引用的org.wo2.asm下的库,从网上搜了一下,asm是个java字节码操纵框架,它可以直接以二进制形式动态地生成 stub 类或其他代理类,或者在装载时动态地修改类。ASM 提供类似于 BCEL 和 SERP 之类的工具包的功能,但是被设计得更小巧、更快速,这使它适用于实时代码插装。对于matrix来说,意思就是可以利用asm这个框架进行方法的装载,在方法前执行“com/tencent/matrix/trace/core/MethodBeat”这个class里的i()方法,在每个方法最后执行o()方法。顿时觉得涨姿势了,欲知详情可以自己这块的源码。

回到EvilMethodTracer

methodBeat里的方法缓存的来源终于有了,现在回到EvilMethodTracer的handleBuffer分析来,先看方法源码:

private void handleBuffer(Type type, int start, int end, long[] buffer, ViewUtil.ViewInfo viewInfo, long cost, long happenTime, int subType) {
    if (null == buffer) {
        MatrixLog.e(TAG, "null == buffer");
        return;
    }
    if (cost < 0 || cost >= Constants.MAX_EVIL_METHOD_COST) {
        MatrixLog.e(TAG, "[analyse] trace cost invalid:%d", cost);
        return;
    }
    start = Math.max(0, start);
    end = Math.min(buffer.length - 1, end);
    if (start <= end) {
        long[] tmp = new long[end - start + 1];
        System.arraycopy(buffer, start, tmp, 0, end - start + 1);

        if (null != mHandler) {
            AnalyseExtraInfo info = new AnalyseExtraInfo(type, viewInfo, cost, happenTime);
            info.setSubType(subType);
            mHandler.post(new AnalyseTask(tmp, info));
        }
    }
}

这段代码比较简单,意思就是截图有效的那段数据交给mHandler所在的线程来执行,有效的数据即tmp,真正执行的是AnalyseTask的run方法。AnalyseTask的分析方法比较长,大致的思路就是,根据存储里的数据即每个方法id和执行时间,找到时间异常的方法,将异常的方法信息和一些基本信息保存在一个JSONObject里,然后调用plugin的sendReport方法,这里直接截取分析后发布的方法:

try {
            JSONObject jsonObject = new JSONObject();
            jsonObject = DeviceUtil.getDeviceInfo(jsonObject, getPlugin().getApplication());

            jsonObject.put(SharePluginInfo.ISSUE_STACK_TYPE, analyseExtraInfo.type.name());
            jsonObject.put(SharePluginInfo.ISSUE_SUB_TYPE, analyseExtraInfo.subType);
            jsonObject.put(SharePluginInfo.ISSUE_COST, analyseExtraInfo.cost);

            if (analyseExtraInfo.type == Type.ENTER) {
                JSONObject viewInfoJson = new JSONObject();
                ViewUtil.ViewInfo viewInfo = analyseExtraInfo.viewInfo;
                viewInfoJson.put(SharePluginInfo.ISSUE_VIEW_DEEP, null == viewInfo ? 0 : viewInfo.mViewDeep);
                viewInfoJson.put(SharePluginInfo.ISSUE_VIEW_COUNT, null == viewInfo ? 0 : viewInfo.mViewCount);
                viewInfoJson.put(SharePluginInfo.ISSUE_VIEW_ACTIVITY, null == viewInfo ? 0 : viewInfo.mActivityName);
                jsonObject.put(SharePluginInfo.ISSUE_VIEW_INFO, viewInfoJson);
            }
            jsonObject.put(SharePluginInfo.ISSUE_STACK, reportBuilder.toString());
            jsonObject.put(SharePluginInfo.ISSUE_STACK_KEY, key);
            sendReport(jsonObject);

        } catch (JSONException e) {
            MatrixLog.e(TAG, "[JSONException for stack %s, error: %s", reportBuilder.toString(), e);
        }

最终的格式如下:

{
"machine": 2015,
"detail": "ENTER",
"cost": 3205,
"viewInfo": {
    "viewDeep": 10,
    "viewCount": 6,
    "activity": "TestFpsActivity"
},
"stack": "3,195,1,10\n1,33,1,58\n2,206,1,21\n3,161,1,16\n4,180,1,16\n5,169,1,16\n6,96,1,10\n7,98,1,10\n4,183,2,5\n5,211,6,0\n0,30,1,56\n",
"stackKey": "0,30,1,56\n",
"tag": "Trace_EvilMethod",
"process": "sample.tencent.matrix"
}

EvilMethodTracer就先分析到这里,下面来看看StartUpTracer

StartUpTracer

StartUpTracer是用于分析activity的启动时间的,之前在分析MethodBeat时提到了通过hook记录activity在LAUNCH_ACTIVITY和ENTER_ANIMATION_COMPLETE两个时间点,两个时间点在这里便是利用了起来。实现的核心方法在onActivityEntered()里:

String activityName = activity.getComponentName().getClassName();
    if (!mActivityEnteredMap.containsKey(activityName) || isFocus) {
        mActivityEnteredMap.put(activityName, System.currentTimeMillis());
    }
    if (!isFocus) {
        MatrixLog.i(TAG, "[onActivityEntered] isFocus false,activityName:%s", activityName);
        return;
    }

    if (mTraceConfig.isHasSplashActivityName() && activityName.equals(mTraceConfig.getSplashActivityName())) {
        MatrixLog.i(TAG, "[onActivityEntered] has splash activity! %s", mTraceConfig.getSplashActivityName());
        return;
    }

    getMethodBeat().lockBuffer(false);

    long activityEndTime = getValueFromMap(mActivityEnteredMap, activityName);
    long firstActivityStart = getValueFromMap(mFirstActivityMap, mFirstActivityName);
    if (activityEndTime <= 0 || firstActivityStart <= 0) {
        MatrixLog.w(TAG, "[onActivityEntered] error activityCost! [%s:%s]", activityEndTime, firstActivityStart);
        mFirstActivityMap.clear();
        mActivityEnteredMap.clear();
        return;
    }

这是方法的前一段,这里有两个hashmap,mFirstActivityMap记录activity在onCreate时的时间,mActivityEnteredMap记录activity在onActivityEntered时的时间,onActivityEntered这个方法的调用是通过hook实现的在WindowFocusChange执行的,
OnActivityEnter()方法后面的内容是:

long activityCost = activityEndTime - firstActivityStart;
    //sApplicationCreateEndTime是methodBeat通过hook得到的
    long appCreateTime = Hacker.sApplicationCreateEndTime - Hacker.sApplicationCreateBeginTime;
   long betweenCost = firstActivityStart - Hacker.sApplicationCreateEndTime;
    long allCost = activityEndTime - Hacker.sApplicationCreateBeginTime;

    if (isWarnStartUp) {
        betweenCost = 0;
        allCost = activityCost;
    }
    long splashCost = 0;
    if (mTraceConfig.isHasSplashActivityName()) {
        long tmp = getValueFromMap(mActivityEnteredMap, mTraceConfig.getSplashActivityName());

        splashCost = tmp == 0 ? 0 : getValueFromMap(mActivityEnteredMap, activityName) - tmp;
    }
    if (appCreateTime <= 0) {
        MatrixLog.e(TAG, "[onActivityEntered] appCreateTime is wrong! appCreateTime:%s", appCreateTime);
        mFirstActivityMap.clear();
        mActivityEnteredMap.clear();
        return;
    }
    if (mTraceConfig.isHasSplashActivityName() && splashCost < 0) {
        MatrixLog.e(TAG, "splashCost < 0! splashCost:%s", splashCost);
        return;
    }
    EvilMethodTracer tracer = getTracer(EvilMethodTracer.class);
    if (null != tracer) {
        long thresholdMs = isWarnStartUp ? mTraceConfig.getWarmStartUpThresholdMs() : mTraceConfig.getStartUpThresholdMs();
        int startIndex = isWarnStartUp ? mFirstActivityIndex : Hacker.sApplicationCreateBeginMethodIndex;
        int curIndex = getMethodBeat().getCurIndex();
        if (allCost > thresholdMs) {
            MatrixLog.i(TAG, "appCreateTime[%s] is over threshold![%s], dump stack! index[%s:%s]", appCreateTime, thresholdMs, startIndex, curIndex);
            EvilMethodTracer evilMethodTracer = getTracer(EvilMethodTracer.class);
            if (null != evilMethodTracer) {
                evilMethodTracer.handleBuffer(EvilMethodTracer.Type.STARTUP, startIndex, curIndex, MethodBeat.getBuffer(), appCreateTime, Constants.SUBTYPE_STARTUP_APPLICATION);
            }
        }
    }
     MatrixLog.i(TAG, "[onActivityEntered] firstActivity:%s appCreateTime:%dms betweenCost:%dms activityCreate:%dms splashCost:%dms allCost:%sms isWarnStartUp:%b ApplicationCreateScene:%s",
            mFirstActivityName, appCreateTime, betweenCost, activityCost, splashCost, allCost, isWarnStartUp, Hacker.sApplicationCreateScene);

    mHandler.post(new StartUpReportTask(activityName, appCreateTime, activityCost, betweenCost, splashCost, allCost, isWarnStartUp, Hacker.sApplicationCreateScene));

    mFirstActivityMap.clear();
    mActivityEnteredMap.clear();
    isFirstActivityCreate = false;
    mFirstActivityName = null;
    onDestroy();

这段代码统计了application启动耗时、SplashActivity(欢迎页)耗时、应用和activity之间的耗时,统计好的格式如下:

{
"machine": 4,
"application_create": 415,
"first_activity_create": 240,
"stage_between_app_and_activity": 0,
"scene": "com.tencent.mm.app.WeChatSplashActivity",
"is_warm_start_up": false,
"tag": "Trace_StartUp",
"process": "com.tencent.mm",
"time": 1528278018147
}

StartUpTracer先分析到这,还剩下最后一个FPSTracer

FPSTracer

FPSTracer统计的是帧率,统计对应的activity、fragment的掉帧水平,利用的也是Choreographer的doFrame(),我们直接来看它的实现:

@Override
public void doFrame(long lastFrameNanos, long frameNanos) {
    //isInvalid是值是否在前台,isDrawing是否开始绘制
    if (!isInvalid && isDrawing && isEnterAnimationComplete() && mTraceConfig.isTargetScene(getScene())) {
         //分析
        handleDoFrame(lastFrameNanos, frameNanos, getScene());
    }
    isDrawing = false;
}

在开始onDraw开始后,调用了handleDoFrame

private void handleDoFrame(long lastFrameNanos, long frameNanos, String scene) {

    int sceneId;
    if (mSceneToSceneIdMap.containsKey(scene)) {
        sceneId = mSceneToSceneIdMap.get(scene);
    } else {
        //记录界面的名称和界面的编号
        sceneId = mSceneToSceneIdMap.size() + 1;
        mSceneToSceneIdMap.put(scene, sceneId);
        mSceneIdToSceneMap.put(sceneId, scene);
    }
    int trueId = 0x0;
    trueId |= sceneId;
    trueId = trueId << 22;
    //计算此帧的耗时
    long offset = frameNanos - lastFrameNanos;
    trueId |= ((offset / FACTOR) & 0x3FFFFF);
    //超过5秒
    if (offset >= 5 * 1000000000L) {
        MatrixLog.w(TAG, "[handleDoFrame] WARNING drop frame! offset:%s scene%s", offset, scene);
    }
    synchronized (this.getClass()) {
        //记录
        mFrameDataList.add(trueId);
    }
}

在这里只是做了记录,内部的定时器mLazyScheduler在onCreate是启动,时间间隔默认是120秒,

 public void onTimeExpire() {
    doReport();
}

调用doReport()

private void doReport() {
    LinkedList<Integer> reportList;
    synchronized (this.getClass()) {
        if (mFrameDataList.isEmpty()) {
            return;
        }
        reportList = mFrameDataList;
        mFrameDataList = new LinkedList<>();
    }
    //reportList里的元素包含了sceneId和帧耗时
    for (int trueId : reportList) {
        int scene = trueId >> 22;
        int durTime = trueId & 0x3FFFFF;
        LinkedList<Integer> list = mPendingReportSet.get(scene);
        if (null == list) {
            list = new LinkedList<>();
            mPendingReportSet.put(scene, list);
        }
        list.add(durTime);
    }
    reportList.clear();
    //mPendingReportSet里取到了scene和其对应的帧内容
    for (int i = 0; i < mPendingReportSet.size(); i++) {
        int key = mPendingReportSet.keyAt(i);
        LinkedList<Integer> list = mPendingReportSet.get(key);
        if (null == list) {
            continue;
        }
        int sumTime = 0;
        int markIndex = 0;
        int count = 0;

        int[] dropLevel = new int[DropStatus.values().length]; // record the level of frames dropped each time
        int[] dropSum = new int[DropStatus.values().length]; // record the sum of frames dropped each time
        int refreshRate = (int) Constants.DEFAULT_DEVICE_REFRESH_RATE * OFFSET_TO_MS;
        for (Integer period : list) {
            sumTime += period;
            count++;
            int tmp = period / refreshRate - 1;
            if (tmp >= Constants.DEFAULT_DROPPED_FROZEN) {
                dropLevel[DropStatus.DROPPED_FROZEN.index]++;
                dropSum[DropStatus.DROPPED_FROZEN.index] += tmp;
            } else if (tmp >= Constants.DEFAULT_DROPPED_HIGH) {
                dropLevel[DropStatus.DROPPED_HIGH.index]++;
                dropSum[DropStatus.DROPPED_HIGH.index] += tmp;
            } else if (tmp >= Constants.DEFAULT_DROPPED_MIDDLE) {
                dropLevel[DropStatus.DROPPED_MIDDLE.index]++;
                dropSum[DropStatus.DROPPED_MIDDLE.index] += tmp;
            } else if (tmp >= Constants.DEFAULT_DROPPED_NORMAL) {
                dropLevel[DropStatus.DROPPED_NORMAL.index]++;
                dropSum[DropStatus.DROPPED_NORMAL.index] += tmp;
            } else {
                dropLevel[DropStatus.DROPPED_BEST.index]++;
                dropSum[DropStatus.DROPPED_BEST.index] += (tmp < 0 ? 0 : tmp);
            }

            if (sumTime >= mTraceConfig.getTimeSliceMs() * OFFSET_TO_MS) { // if it reaches report time
                float fps = Math.min(60.f, 1000.f * OFFSET_TO_MS * (count - markIndex) / sumTime);
                MatrixLog.i(TAG, "scene:%s fps:%s sumTime:%s [%s:%s]", mSceneIdToSceneMap.get(key), fps, sumTime, count, markIndex);
                try {
                    JSONObject dropLevelObject = new JSONObject();
                    dropLevelObject.put(DropStatus.DROPPED_FROZEN.name(), dropLevel[DropStatus.DROPPED_FROZEN.index]);
                    dropLevelObject.put(DropStatus.DROPPED_HIGH.name(), dropLevel[DropStatus.DROPPED_HIGH.index]);
                    dropLevelObject.put(DropStatus.DROPPED_MIDDLE.name(), dropLevel[DropStatus.DROPPED_MIDDLE.index]);
                    dropLevelObject.put(DropStatus.DROPPED_NORMAL.name(), dropLevel[DropStatus.DROPPED_NORMAL.index]);
                    dropLevelObject.put(DropStatus.DROPPED_BEST.name(), dropLevel[DropStatus.DROPPED_BEST.index]);

                    JSONObject dropSumObject = new JSONObject();
                    dropSumObject.put(DropStatus.DROPPED_FROZEN.name(), dropSum[DropStatus.DROPPED_FROZEN.index]);
                    dropSumObject.put(DropStatus.DROPPED_HIGH.name(), dropSum[DropStatus.DROPPED_HIGH.index]);
                    dropSumObject.put(DropStatus.DROPPED_MIDDLE.name(), dropSum[DropStatus.DROPPED_MIDDLE.index]);
                    dropSumObject.put(DropStatus.DROPPED_NORMAL.name(), dropSum[DropStatus.DROPPED_NORMAL.index]);
                    dropSumObject.put(DropStatus.DROPPED_BEST.name(), dropSum[DropStatus.DROPPED_BEST.index]);

                    JSONObject resultObject = new JSONObject();
                    resultObject = DeviceUtil.getDeviceInfo(resultObject, getPlugin().getApplication());

                    resultObject.put(SharePluginInfo.ISSUE_SCENE, mSceneIdToSceneMap.get(key));
                    resultObject.put(SharePluginInfo.ISSUE_DROP_LEVEL, dropLevelObject);
                    resultObject.put(SharePluginInfo.ISSUE_DROP_SUM, dropSumObject);
                    resultObject.put(SharePluginInfo.ISSUE_FPS, fps);
                    sendReport(resultObject);
                } catch (JSONException e) {
                    MatrixLog.e(TAG, "json error", e);
                }


                dropLevel = new int[DropStatus.values().length];
                dropSum = new int[DropStatus.values().length];
                markIndex = count;
                sumTime = 0;
            }
        }

        // delete has reported data
        if (markIndex > 0) {
            for (int index = 0; index < markIndex; index++) {
                list.removeFirst();
            }
        }

        if (!list.isEmpty()) {
            MatrixLog.d(TAG, "[doReport] sumTime:[%sms < %sms], scene:[%s]", sumTime / OFFSET_TO_MS, mTraceConfig.getTimeSliceMs(), mSceneIdToSceneMap.get(key));
        }
    }
}

最终解析出来的就是activity对应的帧耗时数据,数据格式如下:

{
"machine": 2015,
"scene": "sample.tencent.matrix.trace.TestFpsActivity",
"dropLevel": {
    "DROPPED_HIGH": 4,
    "DROPPED_MIDDLE": 12,
    "DROPPED_NORMAL": 18,
    "DROPPED_BEST": 113
},
"dropSum": {
    "DROPPED_HIGH": 60,
    "DROPPED_MIDDLE": 96,
    "DROPPED_NORMAL": 51,
    "DROPPED_BEST": 6
},
"fps": 24.476625442504883,
"tag": "Trace_FPS",
"process": "sample.tencent.matrix"
}

目前tracePlugin的内容分析完了,其中有些数据的计算没有去展开,大家可以自己查看。

tracePlugin总结

这里主要分析得失TracePlugin的实现,其中包括了ANR记录、超时函数记录、帧数统计和启动记录,这还只是Plugins中的一个,内容已经显得有点长了,所以我决定后面的IOCanaryPluginSQLiteLintPluginResourcePlugin都分成不同文章来分析。

自己才疏学浅,肯定有很多不足的地方,有遗漏或错误的地方欢迎指正。

欢迎转载,注明出处即可,后面的文章还在书写中。

坚持原创技术分享,您的支持将鼓励我继续创作!