Colin Hemmings, Product Manager at Bitrise, introduces Trace, the new application performance monitoring (APM) solution we are building here at Bitrise. Trace is monitoring made specifically for mobile, helping you catch bugs before they reach your users. With Trace, you’ll have a complete view of these issues with full context, so that you can assess, reproduce, and fix them as quickly as possible.
What is monitoring?
What is monitoring? This is one of the questions we posed ourselves when we first looked at building a monitoring solution here at Bitrise. The term is very broad and means different things to different people. There's Real User Monitoring (RUM), crash reporting, analytics, tracing, log monitoring, the list goes on. There are a lot of great solutions across all these different areas of monitoring, so it can be a minefield trying to work out which one is best for your team.
Why build monitoring?
Why would we build our own monitoring solution when there are already so many great solutions out there? There's Crashlytics, New Relic, Sentry, Bugsnag, Instabug, Datadog, the list goes on. It's a crowded market, but also an extremely fragmented one. Most of these solutions are strong in one specific area, be it basic crash reporting, server performance monitoring or server crash reporting, while others are giant monoliths trying to be all things to all people.
As an organisation, we have always been focused on mobile-first, on improving the lives of mobile developers, so they can build better quality apps. To do that we are constantly looking at the outcomes mobile development teams are looking to achieve.
What do users want from monitoring?
So what do our users want from monitoring? Well, nothing! What they want is to build great products, to make their users happy, to deliver value and to iterate quickly. Monitoring should be there to help facilitate these goals, but it is never the objective.
With all these outcomes in mind, we decided to tackle the critical areas of mobile monitoring:
- Detect the problem: Know about issues before your users report them.
- Assess the impact: Focus on resolving the issues which are most impactful to your users.
- Trace the cause: Spend less time trying to reproduce issues.
Detection: in the simple case, this means showing users when there is a crash or when a new app version is consuming too much memory on the device. But problems can be a lot more nuanced than that. For example, a bug might only affect one specific version of the Android API on a single type of Android device (the image below is a visualisation of the roughly 24k different Android devices available in 2015). With thousands of different Android devices across many versions of the API, it could take days of continuous work to find that outlier.
Impact assessment: once you have confirmed that a problem only affects a small number of devices, do you prioritise that issue lower than others? We aim to show how many users an issue affects, and across which geographies, devices and app versions it is occurring. That way you can swiftly prioritise the most critical problems for your domain.
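To make the idea of impact assessment concrete, here is a minimal sketch of counting distinct affected users per dimension. The event records and field names (`user`, `country`, `device`, `app_version`) are hypothetical and purely illustrative; they are not Trace's data model or API.

```python
from collections import defaultdict

# Hypothetical issue events; field names are illustrative, not Trace's schema.
events = [
    {"user": "u1", "country": "DE", "device": "iPhone X", "app_version": "2.1.0"},
    {"user": "u2", "country": "DE", "device": "iPhone 8", "app_version": "2.1.0"},
    {"user": "u1", "country": "DE", "device": "iPhone X", "app_version": "2.1.0"},
    {"user": "u3", "country": "HU", "device": "iPhone X", "app_version": "2.0.3"},
]


def affected_users_by(events, dimension):
    """Count distinct users affected for each value of one dimension."""
    users = defaultdict(set)
    for event in events:
        # A set per value means a user hitting the issue twice counts once.
        users[event[dimension]].add(event["user"])
    return {value: len(ids) for value, ids in users.items()}


# One breakdown per dimension mentioned above.
impact = {
    dim: affected_users_by(events, dim)
    for dim in ("country", "device", "app_version")
}
```

Counting distinct users rather than raw events keeps one user who repeatedly hits the same issue from inflating its apparent impact.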
Tracing the cause: the root cause can often be inferred simply by looking at the stack trace of a crash report. But on many occasions, knowing the line of code isn't enough. You need to reproduce the problem. But on which device? Which OS version? Which journey did the user take? Did they have a network connection? Due to the sheer volume of device permutations, this can take days or weeks.
What is Trace?
We are about to release the beta of Trace on iOS (with Android soon to follow), which is our monitoring solution for mobile DevOps teams. The initial release will include support for crash reporting, performance monitoring and user sessions, but there is a whole lot more to come.
We are releasing the basic components first so that we can quickly iterate based on user feedback and enhance the app to build a product focused on your objectives. That being said, I still think you will love the initial features we are going to release.
With our performance monitoring, you can see how your app is performing in the areas of:
- Stability: How many errors and crashes is your app generating?
- Responsiveness: How long does your app take to start or transition between views?
- Consumption: How much CPU and memory is your app consuming?
We show you all this data in one dashboard and allow you to filter ("What if I only want to see data for iPhone X devices?") and compare ("How does it compare to the previous versions?") across different session types.
Notice an issue with startup time? Explore further and drill down into which types of sessions are having the problem and when it started. This will drastically reduce the problem space and make locating the root cause a lot easier.
A vital part of monitoring any mobile app is crash reporting, so we've not neglected that. We automatically capture crash reports generated by your app for you to see in Trace. We will group them by type so that you see all the different issues and which versions they are affecting.
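As a rough illustration of what grouping crashes by type can look like, here is a minimal sketch. The crash records, the `type` and `app_version` field names, and the `group_crashes` helper are assumptions made for this example; they do not describe how Trace groups crashes internally.

```python
from collections import defaultdict

# Hypothetical crash reports; field names are illustrative only.
crashes = [
    {"type": "EXC_BAD_ACCESS", "app_version": "2.1.0"},
    {"type": "NSInvalidArgumentException", "app_version": "2.1.0"},
    {"type": "EXC_BAD_ACCESS", "app_version": "2.0.3"},
    {"type": "EXC_BAD_ACCESS", "app_version": "2.1.0"},
]


def group_crashes(crashes):
    """Group crashes by type, tracking occurrence count and affected versions."""
    groups = defaultdict(lambda: {"count": 0, "versions": set()})
    for crash in crashes:
        group = groups[crash["type"]]
        group["count"] += 1
        group["versions"].add(crash["app_version"])
    return dict(groups)


groups = group_crashes(crashes)
```

Each group then answers the two questions from the paragraph above: how often a given type of crash occurs, and which app versions it affects.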
Inspect the details of a crash to see a full "human-readable" stack trace, running thread state, plus all information about the device on which it occurred.
Now that you can see how your app is performing and any issues we have detected, you can take it one step further and see exactly what happens during the user's session as a timeline of events. Video is expensive, limiting and invasive, so we display a graphical timeline of session events instead, which makes it much easier to understand exactly how a user produces a crash, without intruding on their privacy.
Just add the Trace step to your Bitrise workflow and off you go! We will take care of integrating the Trace SDK into your app and collecting the important metrics required to understand the performance of your app.