We decided to move the provisioning process to an API-driven process, and had to decide among a few implementation languages:
We built prototypes in both languages, and decided on NodeJS:
Getting into the headspace and internalizing the assumptions of a tool helps pick the right one. NodeJS assumes services will be non-blocking/event-driven and HTTP-accessible, which snapped into our scenario perfectly. The new NodeJS architecture resulted in a staggering 95% reduction in processing time: requests went from 7.5 seconds to under a second.
The original API performed a synchronous Nginx reload after provisioning a zone, which often took up to 30 seconds or longer. While important, this step shouldn’t block the response to the user (or API) that a new zone has been created, or block subsequent requests to adjust the zone. With the new API, an independent worker reloads Nginx configurations based on zone modifications.It’s like ordering a product online: don’t pause the purchase process until the product’s been shipped. Say the order has been created, and you can still cancel or modify shipping information. Meanwhile, the remaining steps are being handled behind the scenes. In our case, the zone provision happens instantly, and you can see the result in your control panel or API. Behind the scenes, the zone will be serving traffic within a minute.
How do you know what parts of the workflow need improvement? Measure it. With New Relic in place, we have graphs of our API performance and can directly see if a server or zone is causing trouble, and the impact of our changes. There’s no comparison between a real-time performance graph and “Strange, the site seems slow, I should tail the logs”.