We’ve talked to Russell Stephens from Compass about how his team cut down the time spent on testing from 80 hours to 4, and shortened their release cycles to one week. Read on to find out what steps they took to achieve this!
Compass is an American real estate technology company providing software to real estate agents. Through their industry-leading platform, they are changing how agents and clients navigate the process of finding or selling a home. Compass has created the first modern real estate application, pairing the industry’s top talent with technology to make the entire experience of finding a home intelligent and seamless.
If you want to listen to how their mobile team accelerated their CI/CD processes, what the milestones were during their journey, and the exciting projects they’re working on, click here for the full episode:
The following is an excerpt from our podcast episode with Russell Stephens, Mobile Infrastructure Lead at Compass.
Bitrise: You've gone through quite an evolution when it comes to your CI/CD, from an on-premise Jenkins then transitioning to the cloud, and ultimately, to Bitrise. What kind of impact has this had on your processes and performance?
Russell: When I started at the company, there was a stack of Mac minis on someone else's desk. When I came into work one day, someone had spilled either a cup of coffee or a glass of water on the whole stack. Our CI was running on it, and when I went to see why none of the computers would turn on anymore, we entered this journey that ultimately landed us here talking today. Early on, we looked at cloud-based options and ultimately landed with CircleCI, and we ran with them for a while. We were able to run tests on PRs and run the actual upload flow to the App Store through there, but it was very limited — we were still figuring out what mobile infrastructure is and what Mobile DevOps is, all of that. Around 2018, when we stumbled upon Bitrise and saw the ease of use of the Workflow Editor, the different projects, and the different tabs, we wanted to see what we could do with it.
We've been impressed that when you go there to drop in new functionality, there's always a little bit more available, so over the years, we've been leveraging CI. If you're doing something manually, you should ultimately be able to automate it. Things we've automated over the past year are: the entire release pipeline, including code signing, uploading to the App Store.
We started with around 5% unit test coverage and eventually raised that to 30–50%. Our team has since reached around 86% code coverage, and we're looking to raise the bar there. We also took a deep dive into UI automation and how we can leverage it to exercise what we call customer workflows, meaning how the feature vertical teams integrate with each other. We're adding automation on top of that to verify how the feature teams come together and how the ultimate customer deliverable works.
We had about 80 hours of manual regression cycles that we used to run. We initially had four to five manual QA people, as we were working on getting the app out the door. At first, this wasn't a big deal, because we were building a release maybe every month, every two months, and then we moved to two weeks. But when we tried to go down to one week, it was something that was just not possible to do manually.
We introduced automation and slowly raised our automation percentage to around 80% over the past year. How did we really do that? I would say a lot of effort went into the testing infrastructure side, and that's something we've moved over to the mobile infra team. We’ve established patterns — we're using something called the ‘robot pattern’, which is a scalable way to say ‘this is how you can interface with my screen’ — then the different feature teams use those patterns to build automation tests.
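The robot pattern can be sketched roughly as follows. This is an illustrative TypeScript sketch, not Compass's code: a real mobile suite would drive XCUITest, Espresso, or Appium, and names like `Screen`, `LoginRobot`, and `HomeRobot` are hypothetical. An in-memory `Screen` stands in for the app so the example is self-contained.

```typescript
// Stand-in for the UI under test; a real robot would wrap framework
// queries (XCUITest/Espresso/Appium) instead of this fake.
class Screen {
  fields = new Map<string, string>();
  visibleLabels = new Set<string>();

  type(text: string, field: string): void {
    this.fields.set(field, text);
  }

  tap(button: string): void {
    // Pretend tapping "Log in" with an email filled in navigates home.
    if (button === "Log in" && this.fields.get("email")) {
      this.visibleLabels.add("Welcome home");
    }
  }
}

// The robot is the one place that knows HOW to interface with a screen.
// Feature teams compose robots instead of poking at raw UI elements.
class LoginRobot {
  constructor(private screen: Screen) {}

  logIn(email: string, password: string): HomeRobot {
    this.screen.type(email, "email");
    this.screen.type(password, "password");
    this.screen.tap("Log in");
    return new HomeRobot(this.screen);
  }
}

class HomeRobot {
  constructor(private screen: Screen) {}

  welcomeShown(): boolean {
    return this.screen.visibleLabels.has("Welcome home");
  }
}

// A test then reads as a customer workflow, not as element lookups:
const home = new LoginRobot(new Screen()).logIn("agent@example.com", "secret");
console.log(home.welcomeShown()); // true
```

The payoff is exactly the scalability Russell mentions: when a screen changes, only its robot needs updating, and every feature team's tests built on that robot keep working.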
As we’ve automated the backlog of all the test cases, the amount of manual testing we've had to do and the time it takes to get the app out the door have steadily decreased. Over the past year, we were able to decrease it from 80 hours to just under four, by leveraging all these automation flows.
Obviously, there are challenges. Developers might think “UI testing is not stable, it's flaky, it fails all the time. You can't do it.” It's just one of those things where you always hear the reasons why you can't. But once you pop open the hood and use automation, you take an assumption and turn it into a concrete use case — something that either works or it doesn't. If it's not working, well, we're all engineers: we roll up our sleeves, and we've gotten it to the point where it is today.
Without leveraging automation here, I don't think it would have been possible to maintain a two-week release cycle, let alone get it down to a consistent one-week cycle where we fully regress everything in the app. And even as we add new feature teams, expand our feature sets, and grow and scale the team, we're still keeping our regression time on a downward trend.
Bitrise: Prior to automating everything, what was your release cycle like?
Russell: It was very ad hoc, very much like “Who wants to build the release? Who has the keys on their laptop? Who can sign it?” That graduated more towards this bi-weekly cadence with QA engineers working with us to verify for regression.
The advantage wasn’t just the continued ability to ship new features — we also gained something we call agility. When things go wrong, you want to be able to get the app out the door without making things worse. Let’s say you have a production incident: you want to fix it and get it out the door as fast as possible. Getting down to the four-hour regression cycle we have now means that if an issue happens overnight, we can triage and fix it in the morning, and most likely ship it to Apple for review the same day. So it's partly about managing the occasional time when you find a flaw and want to override it with an update. That's part of the reason for the one-week cycle.
Compass is growing and scaling; there's always more to do, and there are more teams popping up left and right. Let’s say your features are ready mid-sprint, or something you've been working on for three months is now ready to launch — there's always a release going out the door. It's just become very regimented. We took a process that was very painful and made it, I don't want to say pleasurable, but at least not painful.
Automation is the key here. Without automation, your hands are tied: you have to go in and manually rescue things, mapping them as they change. If it's all automated, it's just a simple refactor, a simple button, or a bug fix. And that frees up our developers to focus on the next big thing they can chase confidently, knowing that the things they've already worked on aren't breaking.
Bitrise: How did the transition from Jenkins to Bitrise affect the day-to-day work of your mobile team?
Russell: From that perspective, having run a homegrown Jenkins CI/CD pipeline in the past, once that magical moment of ‘our boxes no longer work’ happened, I was very eager not to have that be part of my day-to-day, just because of the known instabilities.
As a mobile developer, I don't want to be maintaining the CI/CD pipeline, hardware, and software updates. I've seen the situation where you're building something on Jenkins, and maybe there's the famous pop-up that says it's time to update Java; you have no idea why Jenkins isn't finishing your build, so you turn on the TV, find the keyboard for this box you keep in the corner, and go, “Java, let's update this right now.”
With Bitrise, we know that it's going to be stable, we know that what you see is what you get, which is great because it frees up our team. I would say it’s turbocharged how we leverage the CI/CD in our day-to-day.