What is Grafana?
What is Kibana?
We recently implemented Thanos alongside Prometheus in our Kubernetes clusters. We had previously used a variety of different metrics systems, and we wanted to make life simpler for everyone by picking just one.
Prometheus seemed like an obvious choice due to its powerful query language, native Kubernetes support and great community. However, we found it somewhat lacking in high availability, which would be very important if we wanted it to be the single source of all our metrics.
Thanos came along and solved a lot of these problems. It allowed us to run multiple Prometheis without duplicating metrics, query multiple Prometheus clusters at once, and easily back up data and then query it. Now we have a single place to go to view metrics across all our clusters, with many layers of redundancy to make sure this monitoring solution is as reliable and resilient as we could reasonably make it.
If you're interested in a bit more detail, feel free to check out the linked blog post I wrote on the subject.
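The idea of running several Prometheus replicas "without duplicating metrics" can be illustrated with a small sketch. This is not Thanos's actual deduplication algorithm, only a simplified illustration: it assumes each replica tags its series with a hypothetical `replica` label, and that series which are identical apart from that label describe the same metric. The data layout (`labels`, `samples`) is also an assumption made for the example.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only in the (assumed) replica label.

    Each series is a dict of the form
    {"labels": {...}, "samples": [(timestamp, value), ...]}.
    """
    merged = defaultdict(dict)
    for series in series_list:
        # A series is identified by its labels with the replica label removed.
        key = tuple(sorted(
            (k, v) for k, v in series["labels"].items() if k != replica_label
        ))
        for timestamp, value in series["samples"]:
            # Keep the first value seen for a timestamp; later replicas
            # only fill in timestamps that are still missing.
            merged[key].setdefault(timestamp, value)
    return [
        {"labels": dict(key), "samples": sorted(samples.items())}
        for key, samples in merged.items()
    ]


replica_a = {
    "labels": {"__name__": "up", "job": "api", "replica": "a"},
    "samples": [(1, 1.0), (2, 1.0)],
}
replica_b = {
    "labels": {"__name__": "up", "job": "api", "replica": "b"},
    "samples": [(2, 1.0), (3, 0.0)],
}
result = deduplicate([replica_a, replica_b])
```

In this sketch the two replicas collapse into a single `up{job="api"}` series, and the gap in one replica is filled from the other. Real replicas rarely scrape at exactly the same timestamps, so a production system needs a more careful merge strategy than this exact-timestamp match.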
Why we spent several years building an open source, large-scale metrics alerting system, M3, built for Prometheus:
By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack that stored them using the Whisper file format in a sharded Carbon cluster. We used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts. While this worked for a while, expanding the Carbon cluster required a manual resharding process and, due to lack of replication, any single node’s disk failure caused permanent loss of its associated metrics. In short, this solution was not able to meet our needs as the company continued to grow.
To ensure the scalability of Uber’s metrics backend, we decided to build out a system that provided fault tolerant metrics ingestion, storage, and querying as a managed platform...
(GitHub: https://github.com/m3db/m3)
Analyzing heap dumps and large volumes of logs or traces.
We use Grafana to view live stats relating to our servers such as memory and CPU usage. We also use Grafana to monitor our gaming servers for data such as latency and player counts. This allows us to generate effective analytics and see when problems arise.
Everyone likes graphs, right?! This isn't a tool we actively use right now, but paired with Prometheus we want to use it to have visual monitors on things like API cluster health, status, queue stats, DB/Redis query and cache stats, etc.
We primarily use Prometheus to gather metrics and statistics to display them in Grafana. Aside from that, our orchestration solution "JCOverseer" polls Prometheus to determine which host is least occupied at the moment.
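As a rough illustration of the "least occupied host" lookup described above, here is a minimal sketch. How JCOverseer actually does this is not stated, so everything here is an assumption: the function name `least_occupied_host` is hypothetical, the input is assumed to be the already-decoded JSON body of a Prometheus instant query (the `/api/v1/query` HTTP endpoint), and the query is assumed to return one series per host, labeled with `instance`, whose value is a load figure where lower means less busy.

```python
def least_occupied_host(response, label="instance"):
    """Return the label value of the series with the lowest sample value.

    `response` is the decoded JSON of a Prometheus instant query whose
    result type is an instant vector.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    results = response["data"]["result"]
    if not results:
        raise ValueError("query returned no series")
    # Each entry looks like {"metric": {...}, "value": [timestamp, "0.42"]};
    # Prometheus encodes the sample value as a string.
    best = min(results, key=lambda entry: float(entry["value"][1]))
    return best["metric"][label]


# Hand-written sample response, shaped like a Prometheus instant-vector result.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "host-a:9100"}, "value": [1700000000, "0.81"]},
            {"metric": {"instance": "host-b:9100"}, "value": [1700000000, "0.12"]},
            {"metric": {"instance": "host-c:9100"}, "value": [1700000000, "0.47"]},
        ],
    },
}
chosen = least_occupied_host(sample_response)
```

Keeping the selection logic separate from the HTTP call makes it easy to test against canned responses like the one above; the host names and load values are invented for the example.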
Grafana is used in combination with Prometheus to display the gathered stats and to monitor our physical servers as well as their virtual applications. We also use Grafana to get notifications about irregularities.
Grafana takes the data from InfluxDB and presents it in a nice, flexible format. Bonus points for built-in alerts and playlists (which cycle through different dashboards automatically).
- Graph reports with many panels and dashboards.
- Easy to deploy and to view system performance.
- Integrates with many data sources: Prometheus, CloudWatch.
Used for graphing internal logging data, including metrics on how fast we serve pages and execute MySQL/Elasticsearch queries.
Our Kibana instances use our Elasticsearch search data to help answer any complicated questions we have about our data.
Gather metrics from systems and applications. Evaluate alerting rules. Alerts are pushed to OpsGenie and Slack.
Kibana is our tool for querying data in Elasticsearch clusters set up as a catalog search engine.