What is AppDynamics?
What is AppSignal?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using AppSignal?
Sign up to add, upvote and see more consMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
We recently implemented Thanos alongside Prometheus into our Kubernetes clusters, we had previously used a variety of different metrics systems and we wanted to make life simpler for everyone by just picking one.
Prometheus seemed like an obvious choice due to its powerful querying language, native Kubernetes support and great community. However we found it somewhat lacking when it came to being highly available, something that would be very important if we wanted this to be the single source of all our metrics.
Thanos came along and solved a lot of these problems. It allowed us to run multiple Prometheis without duplicating metrics, query multiple Prometheus clusters at once, and easily back up data and then query it. Now we have a single place to go if you want to view metrics across all our clusters, with many layers of redundancy to make sure this monitoring solution is as reliable and resilient as we could reasonably make it.
If you're interested in a bit more detail feel free to check out the blog I wrote on the subject that's linked.
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub : https://github.com/m3db/m3)
Regarding Continuous Integration - we've started with something very easy to set up - CircleCI , but with time we're adding more & more complex pipelines - we use Jenkins to configure & run those. It's much more effort, but at some point we had to pay for the flexibility we expected. Our source code version control is Git (which probably doesn't require a rationale these days) and we keep repos in GitHub - since the very beginning & we never considered moving out. Our primary monitoring these days is in New Relic (Ruby & SPA apps) and AppSignal (Elixir apps) - we're considering unifying it in New Relic , but this will require some improvements in Elixir app observability. For error reporting we use Sentry (a very popular choice in this class) & we collect our distributed logs using Logentries (to avoid semi-manual handling here).
We have Prometheus as a monitoring engine as a part of our stack which contains Kubernetes cluster, container images and other open source tools. Also, I am aware that Sysdig can be integrated with Prometheus but I really wanted to know whether Sysdig or sysdig+prometheus will make better monitoring solution.
We primarily use Prometheus to gather metrics and statistics to display them in Grafana. Aside from that we poll Prometheus for our orchestration-solution "JCOverseer" to determine, which host is least occupied at the moment.
AppDynamics is used to monitor our full end to end user experience journey's including the infrastructure health.
Gather metrics from systems and applications. Evaluate alerting rules. Alerts are pushed to OpsGenie and Slack.
We primarily use Prometheus to gather metrics and statistics to display them in Grafana.
APM for our web apps, and rolling out of the End User Experience Monitoring