Guest blog by Rémy Chantenay, lead of engineering at Travelex. The original post appeared on Travelex Tech Blog.
From a software engineering (mobile oriented) background, Rémy Chantenay recently transitioned to lead of engineering at Travelex. He is passionate about everything related to mobile, cloud infrastructure and disruptive technologies. With products like Wire or Travelex Money and other B2B projects, Travelex digital is striving to innovate in the international money transfer ecosystem.
You might know Travelex as a currency exchange company and will probably have seen one of our bureau de change outlets; we are present in the vast majority of airports around the world.
Our digital team strives to tackle technological challenges across a variety of products. As with any business, the technology choices we make eventually become outdated and need re-evaluation.
Trying to fill the features gap to keep up with the competition leads to engineering challenges and technical debt that accumulates over months as you stretch an app to fit new features.
With that in mind, it was time for our team to take a step back, pinpoint what was slowing us down and spend some time investigating the options to improve our workflow. A retrospective needs to be seen as an investment, just like upgrading an engine before the next race.
Having experienced frequent build failures, we acknowledged some issues with our integration and delivery process and decided to challenge the tools used by our teams.
Firstly, we had to ask ourselves what was wrong with the way we were currently doing integration and delivery and secondly, how we could improve it.
Like lots of companies, we were still using the somewhat outdated Jenkins (raise your hand if you think that Jenkins is a CI from a past era).
Yes, it’s highly configurable and powerful, but definitely over-complicated for what we want to achieve with the Travelex apps. Not to mention the horrible UI (we’ve come a long way since 2011), it’s far from straightforward.
On top of that, we were hosting our CI server internally for enhanced security reasons (OK, remotely accessible, but still…).
The flip side is that we were losing a lot of time maintaining or fixing it (and I mean, easily several hours per week) due to our relatively complex process involving Nexus and the use of a VPN.
To put this in context, we have 5 different flavors for the Supercard app and 4 for Travelex Money. What does that mean?
In the case of Supercard, for every build triggered by Jenkins, 5 different APKs were created.
We were using Nexus Repository to store these 5 artifacts, so that any QA or developer could pick them up later on. Once Jenkins finished building the different variants, a Gradle publish task was executed to deploy these APKs to Nexus.
Deployment Through VPN
To make things more complicated, the only way to deploy to our Nexus repository was through a VPN (again, security reasons).
To summarise, imagine deploying roughly 5 x 20 MB from a locally hosted Jenkins server to a dicey Nexus repository, through a VPN, for every build. It seemed pretty clear at this point that the process could be simplified.
Add to that the fact that we occasionally had to update the dependencies and build tools, and the process was leading to a lot of failures.
The complexity of the overall process definitely contributed to the failures (as you can see below):
- A developer creates a pull request on GitHub
- Another developer merges the pull request — a build is triggered on Jenkins
- The different artifacts are deployed on Nexus
- The QAs access the artifacts from Nexus
- Remote QAs have to access Nexus through a VPN
Once we knew what the problem was, we needed to figure out how to simplify everything and define the requirements, without compromising the security aspect.
Sine Qua Non
There are obviously mandatory specifications, regardless of the final decision:
- Very little to no maintenance
- Straightforward and easy to set up
- Pattern matching triggers (push, PR, …)
- Slack integration
- Scheduled builds for nightlies
- GitHub pull request status check support to run the tests
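To make the "pattern matching triggers" requirement concrete, here is a minimal sketch of the idea in Python. It is not how Bitrise or Jenkins implement triggers; the trigger table, the workflow names and the `select_workflow` helper are all hypothetical and only illustrate mapping an event type and a branch pattern to a workflow.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger map: (event type, branch pattern, workflow name).
# The first matching entry wins, so order the most specific rules first.
TRIGGERS = [
    ("push", "master", "release"),
    ("push", "feature/*", "primary"),
    ("pull_request", "*", "run-tests"),
]


def select_workflow(event, branch):
    """Return the workflow to run for this event, or None if nothing matches."""
    for trigger_event, pattern, workflow in TRIGGERS:
        # fnmatchcase does shell-style, case-sensitive matching ("*" matches any text).
        if event == trigger_event and fnmatchcase(branch, pattern):
            return workflow
    return None
```

With this table, a push to `feature/login` would select the `primary` workflow, any pull request would select `run-tests`, and a push to an unlisted branch would trigger nothing.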
Local vs Cloud
Nowadays, looking for a new CI solution without considering cloud-based offers is impossible. Therefore, the subscription fee and the level of control on the build environment are new concerns added to the list above.
While the cost of the subscription fee isn’t that much of a problem here, we wanted our CI to be customisable and flexible. I won’t tease you any longer: we decided to go ahead with Bitrise.
When comparing different alternatives, we were impressed by the multiple features Bitrise offers, in addition to its user-friendly interface.
The highly customisable build workflows and the freedom of running limitless custom scripts (bash, ruby, golang, you name it) made this a no-brainer.
For support, Bitrise uses the now well-known Intercom live chat solution. It’s smooth, frictionless and feels very personal (at least for the few times we needed it).
Any question? Just tap the floating action button in the bottom right corner of your screen from anywhere in your dashboard.
The quality of support is easy to appreciate and feature requests are another pleasant surprise you will discover further down the road. The Bitrise team lets you create suggestions, or comment and vote on existing ones.
Now, will your ideas be added to their roadmap? That’s a different story, but you can speak your mind, and that’s important.
Their documentation is literally off the charts — clear, complete and easy to navigate. They recently introduced a brand new DevCenter and guess what? It’s open source. The least we can say is that they have a strong sense of community.
Owners, Admins, Developers, QA/Testers — the different roles provided by Bitrise are a big plus, allowing you to limit rights for users who don’t need full access. Unfortunately, there is no way to create custom roles yet, which means that if you don’t find what you need in one of the roles above, you will most likely end up giving Admin rights to everyone, which isn’t ideal.
The project workflow configuration is user-friendly, graphically represented as a timeline. You can add, delete or even move steps in a simple drag and drop interface.
Bear in mind that once you save your workflow, a YAML file is generated. Of course, if you feel like it, you can directly edit your workflow from this file.
This also means that if you have project A and you want to start project B with a very similar workflow, you just have to copy/paste the content of the *.yml file.
As you can see on the diagram above, the process is much simpler now — the magic now happens on GitHub and Bitrise, period.
That’s what we call trimming the fat, isn’t it?
While Bitrise provides a specific workflow step to sign and release the app on different stores, we chose not to change this step of our process yet. We have some concerns about giving third parties access to the certificates/keystores; therefore, we are still signing and delivering our apps manually (for now).
We are currently doing due diligence and evaluating the security risks and consequences of doing this final step through Bitrise as well.
Room For Improvement
For now, Bitrise is not perfect — even though they provide all the key features and everything runs smoothly, some nice-to-haves are definitely missing.
(Editor's note: this feature has since been implemented.)
Now that we have ditched some security layers (locally hosted Jenkins, mandatory VPN for Nexus), Bitrise has clearly become the Achilles’ heel. This may or may not be an issue, depending on how your team is using it.
The way Bitrise works is the same as GitHub — you create a personal account and the owner of the company account will attach the organization to your personal account, giving you some rights.
While this is very convenient, the flip side is that the vulnerability of your projects depends on how conscientious your team members are. As we all know, there are 1001 ways for a password to be compromised. Two-step authentication à la GitHub would significantly improve the security of the entire organization’s projects.
At Travelex, every developer has to enable two-factor authentication on their GitHub account. It’s optional on GitHub, but mandatory within the company. For consistency, we would like to apply the same rule for Bitrise.
Hopefully, this option will land sooner or later on Bitrise. Meanwhile, the best we can do is be very rigorous with the rights that we grant to different users: a simple developer or QA shouldn’t need admin rights.
Lack of Mobile App
It’s 2017, mobiles are running the world. Their web version is fairly responsive, but the experience isn’t great. It’s probably not a priority but I personally would like to be able to start a build or read the logs on the go.
Anyhow, it’s still possible to trigger builds directly from Slack using the Outgoing Webhooks API.
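As a rough illustration of what sits between Slack and the CI, the sketch below parses the form-encoded body that a Slack Outgoing Webhook posts (fields such as `token`, `user_name`, `text` and `trigger_word`) and turns a message like `build develop nightly` into a build request. The token values, the default workflow name and the JSON shape of the request are assumptions made for this example; check the Bitrise build trigger documentation for the exact payload before relying on it.

```python
import json
from urllib.parse import parse_qs

# Hypothetical secrets for illustration only: keep real tokens out of source control.
SLACK_TOKEN = "slack-outgoing-webhook-token"
BUILD_TRIGGER_TOKEN = "ci-build-trigger-token"


def parse_slack_command(body, trigger_word="build"):
    """Parse a Slack Outgoing Webhook body into (branch, workflow), or None."""
    # Slack sends application/x-www-form-urlencoded data; keep the first value per field.
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    if fields.get("token") != SLACK_TOKEN:
        return None  # the request did not come from our Slack integration
    words = fields.get("text", "").split()
    if len(words) < 2 or words[0] != trigger_word:
        return None  # expected "<trigger_word> <branch> [workflow]"
    branch = words[1]
    workflow = words[2] if len(words) > 2 else "primary"  # assumed default workflow
    return branch, workflow


def build_request_payload(branch, workflow):
    """Serialise a build request; this JSON shape is an assumption, not a confirmed API."""
    return json.dumps({
        "hook_info": {"type": "bitrise", "build_trigger_token": BUILD_TRIGGER_TOKEN},
        "build_params": {"branch": branch, "workflow_id": workflow},
    })
```

A small web handler would call `parse_slack_command` on the incoming request body and, when it returns a branch and workflow, POST the result of `build_request_payload` to the CI's build trigger endpoint.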
If you needed more evidence that idle time is crucial for a team, you got it.
Our engineers are happy with the new process. QAs and developers can easily get access to the artifacts from anywhere, without setting up a VPN on their mobile devices. In case of failure, you don’t need to be a Jenkins guru to read the log and figure out what the problem is.
The cherry on the cake: engineers can now focus on their work and not on fixing something that is supposed to work continuously.
From a security point of view we had to make concessions, but we definitely gained a lot in efficiency and understanding: a classic tradeoff.