Modern web applications are built upon a stack of interconnected services. They allow us to build quickly and try ideas by using pre-built services for a fraction of the upfront cost of building them ourselves.
The price we pay for this fantastic reduction in time to market is that the health of every app or website is now dependent on the health of a dozen other services. Why, then, is there no easy way to see the health of all those services?
That was the realization I had one Wednesday afternoon a few months ago. I had spent two hours debugging a problem involving the Facebook Platform API on a customer's web app when I finally Googled the Facebook API status page. BAM! There it was: Facebook was reporting a known issue. There was nothing I could do.
I spent the next 48 hours building what I call a Minimum Triable Product: a single Sidekiq worker and a collection of Ruby scrapers using Mechanize, each scraping the published status page of a service that mattered to me, such as GitHub's. I would aggregate these statuses into a single dashboard and send myself notifications when they changed.
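At its core, the aggregate-and-notify loop compares the latest round of scraped statuses with the previous round and reports only the services whose status changed. Below is a minimal sketch in plain Ruby. It assumes statuses are kept as a hash of service name to status string. `StatusDiff` and `changes` are hypothetical names. The Sidekiq and Mechanize parts are left out so the example runs on its own.

```ruby
# Hypothetical sketch: find services whose status changed between two scrape rounds.
module StatusDiff
  # previous and current map service names (e.g. "github") to a status string.
  # Returns an array of [service, old_status, new_status] for every change.
  # A service missing from the previous round is reported with old_status nil.
  def self.changes(previous, current)
    current.each_with_object([]) do |(service, new_status), acc|
      old_status = previous[service]
      acc << [service, old_status, new_status] unless old_status == new_status
    end
  end
end

previous = { "github" => "up", "facebook" => "up" }
current  = { "github" => "up", "facebook" => "down" }

StatusDiff.changes(previous, current).each do |service, from, to|
  # In the real worker, this is where a notification would be sent.
  puts "#{service}: #{from} -> #{to}"
end
# prints "facebook: up -> down"
```

Reporting only transitions, rather than every status seen, is what keeps a five-minute polling loop from flooding the inbox.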
There was no signup, no name or logo, not even a web interface at all. My goal was simply to try it myself: prove that the concept could work and would save me time and money.
I chose Heroku to try out my idea because it was simple and free to start. I unleashed the workers, and within a month I was already saving time and avoiding headaches by knowing right away when services had downtime.
Next, I talked to every developer I knew. My "MTP" drew near-universal praise for the concept and a flurry of ideas. What I had built wasn't a viable product... yet. But it had already been tried and proven useful by a jury of my peers. The next step was to build something viable: the minimum product someone might actually pay for.
Over the next several weeks, I built most of what makes up StatusGator. I chose Rails not only because of my almost 10-year love affair with it, but because of the plethora of Ruby gems that would help me build fast. Devise let me get sign-up and authentication done in a matter of minutes. I prefer HAML and Sass syntax because, thanks to Tim Pope's plugins, it saves me precious keystrokes in Vim.
For a database, I chose Heroku's default Postgres. But not all data lends itself to a relational database, so Redis would form the heart of the StatusGator stack. Simple string values store the current status of every service, and workers update them every 5 minutes. Redis sorted sets store all the time series data: the complete downtime history of every service, along with each user's service subscriptions and history. To host Redis, I chose Redis Cloud because of its low entry price.
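The two Redis structures described above can be illustrated without a server. The sketch below is a hypothetical in-memory stand-in. Its method names mirror the Redis commands SET, GET, ZADD, and ZRANGEBYSCORE, but it is not a Redis client. The key names `status:<service>` and `history:<service>` are also assumptions, not StatusGator's actual schema. The design point is that a sorted set scored by Unix timestamp makes "history between time A and time B" a single range query.

```ruby
# Hypothetical in-memory stand-in for the two Redis structures described above.
# Not a Redis client: method names simply mirror SET/GET and ZADD/ZRANGEBYSCORE.
class FakeStatusStore
  def initialize
    @strings = {}                            # key => value (like SET/GET)
    @zsets = Hash.new { |h, k| h[k] = {} }   # key => { member => score }
  end

  def set(key, value)
    @strings[key] = value
  end

  def get(key)
    @strings[key]
  end

  # Like ZADD: add a member, or re-score it if it already exists.
  def zadd(key, score, member)
    @zsets[key][member] = score
  end

  # Like ZRANGEBYSCORE: members with min <= score <= max, ordered by score
  # (ties broken by member, as Redis does).
  def zrangebyscore(key, min, max)
    @zsets.fetch(key, {})
          .select { |_member, score| score >= min && score <= max }
          .sort_by { |member, score| [score, member] }
          .map(&:first)
  end
end

store = FakeStatusStore.new
now = 1_700_000_000 # example Unix timestamp

# Current status: one string key per service, overwritten on every check.
store.set("status:github", "down")

# History: sorted-set members are unique, so the timestamp is folded into each
# member; otherwise a second "up" event would just re-score the first one.
store.zadd("history:github", now - 600, "#{now - 600}:up")
store.zadd("history:github", now, "#{now}:down")

store.get("status:github")                             # => "down"
store.zrangebyscore("history:github", now - 3600, now) # => ["1699999400:up", "1700000000:down"]
```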
I knew historical data would be important, if for no other reason than to help improve the service. When a status page changes, StatusGator saves the page HTML and a screenshot to Amazon S3. The Fog gem makes this very easy, and even with thousands of events logged, the Amazon bill is still less than $5 per month.
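A page only needs to be archived when it has changed. One simple way to tell is to compare a digest of the HTML with the digest from the previous check. The post doesn't say how StatusGator detects changes, so treat this as an assumed approach. `PageArchiver` and the `upload` callable are hypothetical. The real upload would go to S3 through Fog, and screenshots are left out.

```ruby
require "digest"

# Hypothetical sketch: archive a status page only when its HTML has changed.
class PageArchiver
  def initialize(upload)
    @upload = upload   # callable(service, html); a real one would write to S3
    @last_digest = {}  # service => SHA-256 hex digest of the last archived HTML
  end

  # Returns true if the page changed and was handed to the uploader.
  def record(service, html)
    digest = Digest::SHA256.hexdigest(html)
    return false if @last_digest[service] == digest

    @last_digest[service] = digest
    @upload.call(service, html)
    true
  end
end

uploads = []
archiver = PageArchiver.new(->(service, html) { uploads << [service, html.bytesize] })

archiver.record("github", "<p>All systems operational</p>") # => true (first sighting)
archiver.record("github", "<p>All systems operational</p>") # => false (unchanged)
archiver.record("github", "<p>Degraded performance</p>")    # => true
uploads.size                                                 # => 2
```

In practice you would probably hash only the extracted status text. Pages that embed a "last updated" timestamp would otherwise look changed on every check.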
Some services provide APIs for fetching status information, but most do not. For the rest, scrapers immediately analyze the logged page and compare its current content to known historical data about that page. Some status pages need complicated, custom scrapers; others use easy-to-parse pages from services such as StatusPage.io or Stashboard. As status pages are scraped, each service is bucketed into one of four statuses: "up", "warn", "down", and "unknown". When a service posts a warning such as "degraded performance", the scraper detects this and records a "warn" status. Having two levels of trouble, "warn" and "down", lets users fine-tune their notification preferences.
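The bucketing and the two-level notification preference can be sketched together. As we understand Statuspage's public API, its `/api/v2/status.json` endpoint reports an overall `indicator` of `none`, `minor`, `major`, or `critical`. Mapping those values to the four buckets is our assumption. The phrase lists for custom scrapers are also hypothetical, apart from "degraded performance", which comes from the text above. `StatusBucket` and its methods are illustrative names, not StatusGator's code.

```ruby
require "json"

# Hypothetical sketch of four-bucket classification plus a notification filter.
# The indicator mapping and phrase lists are assumptions, not StatusGator's rules.
module StatusBucket
  # Statuspage's status.json "indicator" values mapped to our buckets (assumed mapping).
  STATUSPAGE_INDICATORS = {
    "none" => "up", "minor" => "warn", "major" => "down", "critical" => "down"
  }.freeze

  WARN_PHRASES = ["degraded performance", "partial outage", "elevated error"].freeze
  DOWN_PHRASES = ["major outage", "service disruption"].freeze

  SEVERITY = { "up" => 0, "warn" => 1, "down" => 2 }.freeze # "unknown" has no severity

  def self.from_statuspage_json(body)
    data = JSON.parse(body)
    indicator = data.is_a?(Hash) ? data.dig("status", "indicator") : nil
    STATUSPAGE_INDICATORS.fetch(indicator, "unknown")
  rescue JSON::ParserError
    "unknown"
  end

  # Fallback for custom scrapers: look for known phrases in the page text.
  def self.from_text(text)
    t = text.downcase
    return "down" if DOWN_PHRASES.any? { |p| t.include?(p) }
    return "warn" if WARN_PHRASES.any? { |p| t.include?(p) }
    return "up" if t.include?("operational")
    "unknown"
  end

  # threshold is "warn" (any trouble) or "down" (outages only). Notify on a
  # status change that reaches the threshold, including recovery from it.
  def self.notify?(threshold, old_status, new_status)
    return false if old_status == new_status
    worst = [SEVERITY[old_status], SEVERITY[new_status]].compact.max
    !worst.nil? && worst >= SEVERITY.fetch(threshold)
  end
end

body = '{"status":{"indicator":"minor","description":"Minor Service Outage"}}'
StatusBucket.from_statuspage_json(body)             # => "warn"
StatusBucket.from_text("API: Degraded Performance") # => "warn"
StatusBucket.notify?("down", "up", "warn")          # => false (only wants outages)
StatusBucket.notify?("warn", "up", "warn")          # => true
```

Ranking the buckets by severity lets a single threshold express both preferences. Treating "unknown" as having no severity keeps a flaky scraper from paging anyone.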
With Stripe and its Checkout product, payment processing was done in half a day. Integrations with Slack, Flowdock, and HipChat would come eventually, but notifications started simply: email through SendGrid, because its Heroku integration was simple and free, and Twilio, because its REST API is so easy to use.
The rest was icing: I built the front end with Bootstrap because it's responsive by default. Everything was wired up with Codeship, because I am obsessed with frequent deployment. I set up the domain name with DNSimple because its UI is brilliant and ordering SSL certificates there is so easy.
The resulting app allows anyone to build their own internal status dashboard. Users can share this dashboard with their team and set up notifications so they are alerted when services post downtime.
The stack that powers StatusGator has continued to evolve. Our blog is built with Ghost and hosted on DigitalOcean because SSL hosting there costs much less than on Heroku. Android and iOS apps are in the works using Ionic, a very fast way to build cross-platform mobile apps. The front end now uses Angular because it can consume the Rails JSON API very easily. Slack, Flowdock, HipChat, and even webhook integrations are now a centerpiece of the service, allowing users to keep their teams up to date on the status of their stack. And the mountains of feature requests, service suggestions, and integration ideas we receive from users are all logged to Trello.
But the core StatusGator stack remains the same: a collection of tools that allowed the idea to be tested very quickly, for very little money. The tools, services, and hosts will undoubtedly change as StatusGator evolves and scales, but that's the beauty of assembling and reassembling your own stack from today's cloud services.
At least now there is an app to tell you if all those services are working.