How does a company go about keeping track of millions of customer devices?
To tackle this problem, Fitbit’s core back-end team – called Device Cornerstone – launched the Factory Upload project. Its goal? To create a continuous streaming pipeline for manufactured Fitbit data, all the way from the factory to our main servers.
At its conclusion, Factory Upload will give Fitbit a more authoritative record of manufactured Fitbits, making the customer experience more reliable and consistent by helping identify tracker properties, such as special edition trackers. It will also assist the company in regulatory compliance.
In this post, we will provide an overview of the project architecture, reflections on design decisions made, and thoughts about the project from a new grad perspective.
Architecting a Solution
When it comes to software development, one quintessential question is that of build versus buy. We looked into many off-the-shelf data streaming solutions, including Maxwell, Debezium, and Kafka Connect. All three of these libraries are impressive solutions in their own right, and using them would free up resources that we would otherwise have to dedicate to building and maintaining an in-house solution. Furthermore, we would have some assurance that these projects, having already been used by other major companies, were vetted for quality.
In the end, and after much discussion, we decided to reuse a prior in-house solution and build on top of it. External libraries absolutely have their benefits, but we realized we would be able to get significant time savings by reusing our code (as opposed to rewriting the whole thing from scratch), and we weighed that heavily in our decision making. Furthermore, because our in-house solution had been battle-tested in production before, we felt we had quality assurances anyway.
We ended up creating a Kafka producer that was essentially a recurring Aurora job written in Python. Every time the job ran, it would serialize and stream new Fitbit data over a Kafka topic, then store a record of where it finished (so it would know where to pick up on future runs). On the receiving end of the topic, we created a Kafka consumer that was connected to our main servers and would continuously ingest and persist that data.
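The checkpointing behavior described above can be sketched as follows. This is a minimal illustration, not Fitbit's actual code: the names `fetch_records_after`, `Checkpoint`, and `run_upload_job` are invented, and the real job would query a factory database and publish through a Kafka client rather than a plain callable.

```python
import json

class Checkpoint:
    """Remembers the last record id streamed, so future runs resume there."""
    def __init__(self):
        self.last_id = 0

def fetch_records_after(db, last_id):
    # Stand-in for a SQL query along the lines of:
    #   SELECT * FROM factory_records WHERE id > %s ORDER BY id
    return [r for r in db if r["id"] > last_id]

def run_upload_job(db, send, checkpoint):
    """One run of the job: serialize new rows, stream them, advance the checkpoint."""
    for record in fetch_records_after(db, checkpoint.last_id):
        send(json.dumps(record))           # producer.send(topic, payload) in practice
        checkpoint.last_id = record["id"]  # persisted after each successful send

# Two runs over a growing table; the second run only streams the new row.
db = [{"id": 1, "serial": "FB-0001"}, {"id": 2, "serial": "FB-0002"}]
sent = []
cp = Checkpoint()
run_upload_job(db, sent.append, cp)
db.append({"id": 3, "serial": "FB-0003"})
run_upload_job(db, sent.append, cp)
```

Because the checkpoint is stored outside the job, a crashed or rescheduled run picks up exactly where the previous one left off instead of re-streaming the whole table.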
I’ll take an aside here to mention that Apache Kafka is both an excellent data streaming platform and a key technology that makes all of this possible. By using Kafka as the backbone of our project, we were able to abstract away concerns like guaranteed delivery and capacity, saving us a substantial amount of time and effort. The design decision of choosing the right technology sometimes happens well before any code is ever written. By making the right choices, we were able to set ourselves up for success.
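One practical consequence of Kafka's delivery guarantees is worth noting: the common at-least-once mode means a consumer may occasionally see the same message twice. A standard way to handle this, sketched below with invented names, is to make the persist step idempotent, for example an upsert keyed on the device serial, so replays are harmless.

```python
import json

def persist(store, payload):
    """Idempotent ingest: deserializing and upserting keyed on serial number."""
    record = json.loads(payload)
    store[record["serial"]] = record  # upsert: a redelivered message is a no-op

store = {}
msg = json.dumps({"serial": "FB-0001", "edition": "special"})
persist(store, msg)
persist(store, msg)  # duplicate delivery; store is unchanged
```

With idempotent persistence, the consumer does not need to track which offsets it has already processed to stay correct.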
Keeping Things Modular
In the time that I’ve been here, I feel that my ability to write good code has improved substantially, but it’s clear that I still have a long way to go. One design decision that I wish I had been more attentive to from the beginning was making my code more modular. I think part of the reason this happens, especially to new grads, is that writing clean code is something you rarely get exposed to in college.
I’ve read books on coding style, and I think I have a decent understanding of the basic principles of clean code. But I only gave passing thought to designing my code to be modular from the ground up. I added a small function here, another there, and another, until they all piled up into one big ugly class. Even little things can snowball into big problems.
I realized when I came back to maintain my code weeks later that just trying to find what I was looking for was visually distracting and difficult. Another engineer on my team noticed this as well. When it came time to add more features, I was determined to split them off into their own Python modules. When we added monitoring, for example, I refactored the functionality into a dedicated Graphite module and simply plugged it into my main code. We are currently in the process of migrating our factory databases to a new host, which required us to rethink how we handled credentials. I used this as an opportunity to split off that code into a dedicated Puppet module as well. Even though many of these modules were small, with some being fewer than 100 lines of code, I found that there were immediate benefits in terms of maintainability.
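To illustrate the kind of split described above: Graphite accepts metrics in a plaintext protocol of the form `metric value timestamp\n` over TCP. A small module can own that detail, with the transport injected so the main job depends only on a tiny interface. The class and method names here are hypothetical, not Fitbit's actual module.

```python
import time

class GraphiteClient:
    """Minimal sketch of a dedicated metrics module (hypothetical names)."""
    def __init__(self, prefix, transport):
        self.prefix = prefix
        self.transport = transport  # in production: a TCP socket's sendall

    def gauge(self, name, value, ts=None):
        ts = int(ts if ts is not None else time.time())
        # Graphite plaintext protocol: "metric value timestamp\n"
        line = f"{self.prefix}.{name} {value} {ts}\n"
        self.transport(line.encode())

# The main job just calls metrics.gauge(...) and never touches the protocol.
lines = []
metrics = GraphiteClient("factory_upload", lines.append)
metrics.gauge("records_streamed", 42, ts=1500000000)
```

Injecting the transport also makes the module trivially testable, since a list can stand in for the socket.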
Refactoring in the Face of New Business Requirements
While Factory Upload is intended to benefit all Fitbit devices, one of the challenges that we ran into concerned Aria 2, which we discovered had a schema for its factory data that was different from what we originally designed for. These new requirements were not compatible with our Kafka consumer at the time (a form of tech debt, perhaps). We could have simply slapped a band-aid on our code and written some logic branches that special-cased our consumer (e.g. for Aria 2, do this instead of that…), but that would have been prone to side effects, and we would most certainly have paid for it down the line. Instead, we took this opportunity to refactor our code using the strategy design pattern.
We associated every Fitbit product with a series of strategies on how to process their factory records. So when we started adding code to make our consumer work for Aria 2’s special schemas, we simply created new strategies specific to Aria 2, and plugged them into our code. These strategies had the benefit of being clear and concise, and thus, easily testable. This, in turn, increased my confidence in pushing the code to production, and in the long run, improved my productivity. Furthermore, the strategies that we wrote were both modular and quite reusable, and we anticipate that by putting in a little bit of effort now, we will have saved ourselves a lot more time in the long run.
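In rough outline, the pattern looks like the sketch below. The parsing functions and the Aria 2 schema shown here are invented for illustration; the point is that each product maps to its own strategies, unknown products fall back to the defaults, and each strategy is small enough to test in isolation.

```python
def default_parse(record):
    # Baseline schema: serial number lives at the top level (illustrative)
    return {"serial": record["serial"]}

def aria2_parse(record):
    # Hypothetical Aria 2 schema: serial nested under a different key
    return {"serial": record["scale_info"]["sn"]}

# Each product is associated with its processing strategies; products
# without an entry use the defaults.
STRATEGIES = {
    "aria2": {"parse": aria2_parse},
}

def process_factory_record(product, record):
    strategy = STRATEGIES.get(product, {})
    parse = strategy.get("parse", default_parse)
    return parse(record)

# A tracker uses the default strategy; Aria 2 uses its own.
tracker = process_factory_record("charge2", {"serial": "FB-0001"})
scale = process_factory_record("aria2", {"scale_info": {"sn": "AR-0001"}})
```

Supporting the next device with an unusual schema then means adding one entry to the table rather than threading new branches through the consumer.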
A New Grad Perspective
As a new grad, it was very exciting to work with new technologies on this project. I was able to get the best of both worlds.
It was independent enough that I was able to build many parts of it from the ground up. Ultimately, I was allowed to make my own calls on design decisions, and I took responsibility for the upsides and the shortcomings. I was not only able to get a comprehensive overview of some really fascinating tech, including Kafka and Aurora, I was also lucky enough to be able to get broad exposure to tools across the tech stack, such as Puppet (credentials management), Docker (used to deploy to Aurora), and Graphite (metrics collection).
The project was also collaborative enough that I was able to leverage Fitbit’s substantial engineering resources for help. I worked with other engineers by going through a design review process at Fitbit, where I got helpful advice on how to best architect my project. I was also able to get help from our App Platform, Monitoring, and Operations teams. All in all, I am grateful for everything this collaboration made possible.
There is still some work to be done, but for the most part we are simply moving this project to maintenance mode. Establishing this pipeline will now allow us to improve customer security. It was both an educational and an impactful project, and it would not have been possible for me to do this without the support of my colleagues, especially those on my own team.
Device Cornerstone is a team at Fitbit that handles core back end engineering. In my time here, I have worked with incredibly talented engineers, created impactful projects that run at scale, and grown substantially as a programmer. If you are interested in working in back end engineering (or just at Fitbit in general), please consider applying to our software engineering positions on our jobs board.
About the Author
Michael Wang is a software engineer on the Device Cornerstone team. He is passionate about solving difficult problems, and constantly finding ways to improve the end user experience. He also enjoys creating a fun and collaborative work environment, and is always open to learning new things. He has been with the company since 2016.