Business Context:
Flo Health has always been a data-driven company. All of our significant decisions are backed by some form of analytics, such as experiments, surveys, or machine learning models. To acquire and hold a leadership position in the women’s health market, this decision-making process has to be quick and precise.
A few years ago, Flo Health was a relatively small company with 200 employees, 3.5 million daily active users, and around 18 million daily analytical events. The data platform at that time had a simple architecture and was completely self-service. Gathering insights took no more than running a couple of SQL queries and creating trivial visualizations, and maintenance efforts and costs were quite low.
Fortunately, our business ideas were successful, and we were able to become market leaders, which resulted in rapid growth. By the start of 2022, we had grown to 400 employees and 250% more daily active users, and demand for the data platform had also increased significantly (1,700% more analytical events). Looking into the future, the growth trend looks exponential, so we have to ensure that we’ll be able to serve it without increasing the time-to-market for decisions.
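To put those percentages in absolute terms, here is a rough back-of-the-envelope sketch. It assumes that "N% more" is measured against the earlier figures quoted above (3.5 million daily active users and 18 million daily events); the helper name `grow` is ours for illustration, and the results are implied estimates rather than reported numbers.

```python
def grow(baseline: int, percent_more: int) -> int:
    """Return the baseline increased by `percent_more` percent.

    Assumption: "250% more" means baseline * (1 + 2.5), not baseline * 2.5.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return baseline * (100 + percent_more) // 100


# Earlier figures quoted in the text above.
baseline_dau = 3_500_000       # daily active users
baseline_events = 18_000_000   # daily analytical events

dau_2022 = grow(baseline_dau, 250)
events_2022 = grow(baseline_events, 1_700)

print(f"Implied daily active users: {dau_2022:,}")
print(f"Implied daily analytical events: {events_2022:,}")
```

Under that reading, the event volume grew roughly five times faster than the user base, which is why the load on the data platform outpaced headcount.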
Rapid growth always complicates a data platform, and we found ourselves in a situation where employees were spending more and more of their time on data analysis.
In essence, the two most common ways to deal with this problem are either to hire specialized engineers at scale (which is quite hard due to labor market limitations) or to use an automated self-service data platform infrastructure that lowers cognitive barriers and hides complexity inside it (the preferable option, because it’s faster and scales better in the future).
So our team’s primary goal for 2022 is to hide this complexity from people and serve them the simplest toolset possible.
Methodological solution:
We wanted to answer a few questions that we suspected would help us choose the central organizing idea for a new, improved data platform:
- Should it be centralized or decentralized?
- Who will be responsible for what?
- What are the success criteria?
Because we had successfully resolved this type of problem before with application infrastructure by moving from a monolithic architecture to microservices, and since our organizational structure is already product-team-centric, we tried to figure out something similar for the data platform. Fortunately, this type of architecture already exists. It’s called data mesh, and here’s a wonderful article describing it. So the next step for us was to choose a suitable technical solution to implement this architecture.
Technical solution:
Before considering solutions, one important question has to be answered: Should we look for a vendor solution or build one in-house? Each of these options has known pros and cons:
Vendor pros:
- Typically significantly less time to adopt
- Lower maintenance efforts
- Often faster delivery of new features
- No product work needed
- Compliance with standards like ISO and HIPAA is supported out of the box
Vendor cons:
- Not really flexible in terms of fitting company-specific requirements
- High annual costs
- Sometimes hard to contribute to solutions from the company side
In-house pros are basically vendor cons, and vice versa.
Based on that knowledge, we decided to go with a vendor solution. To choose one, we prepared some acceptance criteria, the most important of which were:
- Migration period of less than 1 year
- Unified platform for all data-related operations, from data ingestion through BI and ML
- ISO and HIPAA compliance
- Good vendor reputation and solution improvement pace
- Dedicated on-demand infrastructure for teams and simplified approach for working with data
- Minimal maintenance effort
- Cloud native
Of the solutions we considered, Databricks was the only one that satisfied all of the above criteria while requiring only moderate migration effort and monetary investment. So we picked it and started adoption.