Grafana vs InfluxDB vs Prometheus: What are the differences?
Introduction
Grafana, InfluxDB, and Prometheus are popular tools used in the monitoring and observability of software systems. While they have similar functionalities, there are key differences that set them apart.
Data Storage and Retrieval: InfluxDB is a time-series database designed specifically for handling time-stamped data. It stores data efficiently in a compressed format and allows fast retrieval using time-based queries. Grafana, on the other hand, is a visualization tool and does not provide data storage capabilities. Prometheus is a specialized time-series database that collects data through a pull mechanism, making it suitable for monitoring dynamic environments.
Data Aggregation and Processing: InfluxDB supports advanced data processing capabilities such as downsampling, retention policies, and continuous queries, enabling efficient analysis and aggregation of time-series data. Grafana, being a visualization tool, does not have built-in data processing capabilities. Prometheus uses its own query language called PromQL, which allows flexible data aggregation and processing, including label-based selection and mathematical operations.
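To make the idea of downsampling concrete, here is a minimal Python sketch that groups raw points into fixed time windows and keeps one averaged point per window. It is not InfluxDB's implementation of continuous queries; the function name `downsample` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (unix_ts, value) points into fixed-size time windows.

    Illustrative only: a database would do this server-side on a schedule.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    # One (window_start, mean) pair per window, in time order.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw samples: (seconds since epoch, reading).
raw = [(0, 1.0), (10, 3.0), (60, 10.0), (70, 20.0), (125, 5.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 15.0), (120, 5.0)]
```

Storing only the downsampled series for older data is what lets a retention policy drop the raw points without losing the long-term trend.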
Alerting and Notification: Grafana provides comprehensive alerting capabilities, allowing users to define threshold-based rules and receive notifications through channels such as email, Slack, and PagerDuty. InfluxDB 1.x has no alerting in the database itself and relies on Kapacitor or Grafana for it, while InfluxDB 2.x adds built-in checks and notification rules. Prometheus evaluates alert rules in the server itself, so users can define complex conditions in PromQL, and it hands firing alerts to its companion Alertmanager, which routes notifications to the various channels.
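A threshold-based rule usually fires only after the condition has held for some time, so that a single spike does not page anyone. The Python sketch below illustrates that idea under stated assumptions: `alert_state` and its parameters are hypothetical names, and the state names mirror the inactive/pending/firing states Prometheus uses, but this is not how any of the three tools implements rule evaluation.

```python
def alert_state(samples, threshold, hold_seconds):
    """Classify time-ordered (unix_ts, value) samples against a threshold rule.

    Returns "inactive" if the latest sample is within the threshold,
    "pending" if it has been breached for less than hold_seconds,
    and "firing" once the breach has lasted at least hold_seconds.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            # Remember when the current uninterrupted breach began.
            if breach_start is None:
                breach_start = ts
        else:
            # Any sample back under the threshold resets the rule.
            breach_start = None
    if breach_start is None:
        return "inactive"
    last_ts = samples[-1][0]
    return "firing" if last_ts - breach_start >= hold_seconds else "pending"


# Hypothetical CPU usage samples, one per minute.
cpu = [(0, 50), (60, 95), (120, 97), (180, 99)]
print(alert_state(cpu, threshold=90, hold_seconds=120))
```

Only the "firing" state would be passed on to a notification channel such as email or Slack.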
Scalability and Clustering: InfluxDB provides clustering capabilities through its Enterprise Edition, allowing multiple InfluxDB nodes to work together in a high-availability setup. Grafana, as a visualization tool, does not require scaling or clustering as it primarily fetches data from other data sources. Prometheus supports horizontal scalability by using a federation mechanism where multiple Prometheus instances can be connected together to aggregate and query data.
Metrics Collection: Grafana focuses on visualization and relies on data sources like InfluxDB and Prometheus to collect metrics. InfluxDB provides a native Telegraf agent for collecting metrics from various sources and sending them to InfluxDB. Prometheus has its own Prometheus server that collects metrics from instrumented applications using client libraries and exporters.
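Agents such as Telegraf write to InfluxDB using its text-based line protocol, where each line carries a measurement, optional tags, fields, and a timestamp. As a rough sketch of what one such line looks like, the Python below formats a point by hand. The helper names and the sample measurement are assumptions for illustration, and real deployments would normally let Telegraf or an official client library do this.

```python
def _escape(text, chars):
    """Backslash-escape each of the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    """Render a field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _field_value(fields[key])
        for key in sorted(fields)
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical point: CPU usage for one host.
line = to_line_protocol(
    "cpu",
    {"host": "server 01", "region": "eu"},
    {"usage": 42.5, "cores": 8},
    1700000000000000000,
)
print(line)
```

Tags are indexed and meant for filtering, while fields hold the measured values, which is why the two are kept in separate sections of the line.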
Community and Ecosystem: Grafana has a large and active community of users, with a rich ecosystem of plugins and integrations supporting various data sources. InfluxDB has a smaller but growing community, with a focus on time-series use cases and a growing number of integrations. Prometheus has a vibrant community with a wide range of exporters and integrations, making it a popular choice for monitoring and alerting in Kubernetes environments.
In summary, Grafana is a powerful visualization tool, InfluxDB is a scalable time-series database, and Prometheus is a comprehensive monitoring and alerting system. Each tool has its own strengths and focuses on different aspects of the monitoring and observability stack.
Looking for a tool which can be used for mainly dashboard purposes, but here are the main requirements:
- Must be able to get custom data from AS400,
- Able to display automation test results,
- System monitoring / Nginx API,
- Able to get data from 3rd parties DB.
Grafana solves almost all of these problems, except for AS400 and the lack of a database to hold the automation test results.
You can look at Prometheus instrumentation (https://prometheus.io/docs/practices/instrumentation/) and the client libraries available in various languages (https://prometheus.io/docs/instrumenting/clientlibs/) to create the custom metric you need for AS400. Grafana can then query the newly instrumented metric and show it on the dashboard.
Hi, We have a situation where we are using Prometheus to get system metrics from the PCF (Pivotal Cloud Foundry) platform. We send that as time-series data to Cortex via a Prometheus server and built a dashboard using Grafana. There is another pipeline where we need to read metrics (CPU, memory, and disk) from a Linux server using Metricbeat. Those will be sent to Elasticsearch, and Grafana will pull and show the data in a dashboard.
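To give a feel for what a custom metric looks like once exposed, here is a minimal Python sketch that renders a gauge in the Prometheus text exposition format, which is what the server reads when it scrapes an endpoint. The metric name `as400_job_queue_depth`, the `queue` label, and the value are all hypothetical, and in practice an official client library would generate this output for you.

```python
def _escape_label(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_gauge(name, help_text, samples):
    """Render one gauge metric; samples is a list of (labels_dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{_escape_label(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric read from an AS400 system by your own collector.
print(
    render_gauge(
        "as400_job_queue_depth",
        "Jobs waiting in a queue.",
        [({"queue": "QBATCH"}, 12)],
    )
)
```

Serving this text over HTTP at a path such as /metrics is enough for Prometheus to scrape it, after which the metric can be queried from a Grafana panel.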
Is it OK to use Metricbeat for Linux server or can we use Prometheus?
What is the difference in system metrics sent by Metricbeat and Prometheus node exporters?
Regards, Sunil.
If you're already using Prometheus for your system metrics, then it seems like standing up Elasticsearch just for Linux host monitoring is excessive. The node_exporter is probably sufficient if you're looking for standard system metrics.
Another thing to consider is that Metricbeat / ELK use a push model for metrics delivery, whereas Prometheus pulls metrics from each node it is monitoring. Depending on how you manage your network security, opting for one solution over two may make things simpler.
Hi Sunil! Unfortunately, I don't have much experience with Metricbeat, so I can't advise on the differences from Prometheus. For the Linux server, I encourage you to use the Prometheus node exporter, and for PCF, I would recommend using the Instana tile (https://www.instana.com/supported-technologies/pivotal-cloud-foundry/). Let me know if you have further questions! Regards, Jose
We are building an IoT service with heavy write throughput and fewer reads (we need to downsample records). We want good reliability when it comes to data and would like data retention based on policies.
So, we are looking for the best underlying DB for ingesting a lot of data and querying it easily.
We had a similar challenge. We started with DynamoDB, Timescale, and even InfluxDB and Mongo, and eventually settled on PostgreSQL. Assuming the inbound data pipeline is queued (for example, Kinesis/Kafka -> S3 -> and some Lambda functions), PostgreSQL gave us better performance by far.
Druid is amazing for this use case and is a cloud-native solution that can be deployed on any cloud infrastructure or on Kubernetes.
- Easy to scale horizontally
- Column-oriented database
- SQL to query data
- Streaming and batch ingestion
- Native search indexes
It can work as a time-series database or a data warehouse, and it has time-optimized partitioning.
If you want a serverless solution with a lot of storage capacity and SQL-like capabilities, then Google BigQuery is the best solution for that.
So, I am working in a big company where they have multiple different microservices running that are written in Golang. I am currently searching for a technology that can give me all the metric data from the microservices. What time-series databases would you recommend? Or which databases would you recommend investigating further? I appreciate any input.
Each of these tools can help you with micro service workload and work well. I will try to go through some good, bad and ugly of each.
Datadog has an easy setup and a short time to get something tangible out of it. The cost model is per host, so take into consideration how that will affect your use case. Also, as a large organization, at some point you will probably want control over some or all of your telemetry data to run your own ML or AI processes. With Datadog this can be difficult, as you will need to create processes outside of its closed ecosystem to get raw metrics.
Prometheus is a great tool. It also has a fairly straightforward setup, especially with Kubernetes. If you are running your microservices in k8s then this is going to get used one way or another; it is a first-class citizen there, with heavy utilization of the K8s API. I also like the fact that Kubernetes architecture is easy to understand and that it utilizes Grafana for the visualization engine. Prometheus at scale can be done, but it is a pain, especially with a distributed infrastructure across multiple workloads.
InfluxDB (TICK stack in v1) is known for its scalability and flexibility as a time-series database. Telegraf is the main input/data forwarder of the architecture and is completely decoupled from the database, as are the other three components of the stack. Influx has made it very easy to use just one component on its own. I have worked on stacks that used only Telegraf for ingestion into Kinesis or another data stream. I have also worked on stacks that used the Influx database but used a different ETL process for analyzing the data in real time instead of the v1 architecture's Kapacitor engine. The Influx database is a great-performing time-series database that in version 2 runs within Kubernetes and uses Flux as the query language. Flux is a nice query language that is fairly easy to learn and has a lot of flexibility. As a last positive note, Telegraf is written in Go, so it would fit well with your current team.
The difficulty with Influx is that it is hard to get something really tangible out of it. The initial time to see something is short, but all the other work involved is a lot. You also have to understand the architecture well. The management of Influx can be cumbersome, but it can scale up better than the other two when Datadog's cost is taken into consideration. They have a lot of API hooks in their v1 Enterprise edition to wire and configure it. They do offer a managed service to offload this cost until later.
My overall choice here is probably to go with some of the Influx stack, as you can rip out and/or add components in the flow as needed. Eventually you will probably want to run an ML process within there (this can be done within Kapacitor, but you can of course also use your cloud provider here), and this gives you the flexibility to do it anywhere. I would still go through Prometheus because you will most likely use it as well, but it does have forwarders to InfluxDB, so it still fits.
We're running Prometheus/Alertmanager/Grafana across our whole company for any monitoring and metrics requirement, from the infrastructure layer all the way up to Spring Boot endpoint services. The Prometheus exporter / scraping approach works pretty well for us. It's really easy to set up and, more importantly, to maintain without much effort: all the Prometheus configs get automatically created through Terraform outputs and Ansible jobs. Combine it with Grafana and you're smiling.
We're moving towards Prometheus from Datadog at this moment. The main driving force is TCO at the moment.
Datadog is great until it becomes too expensive.
We're looking for a Monitoring and Logging tool. It has to support AWS (mostly 100% serverless, Lambdas, SNS, SQS, API GW, CloudFront, Aurora, etc.), as well as Azure and GCP (for now mostly used as pure IaaS, with a lot of cognitive services, and mostly managed DB). Hopefully, something not as expensive as Datadog or New Relic, as our SRE team could support the tool in-house. At the moment, we primarily use CloudWatch for AWS and Pandora for most on-prem.
I worked with Datadog for at least one year, and my position is that commercial tools like Datadog are the best option to consolidate and analyze your metrics. Obviously, if you can't pay for the tool, the best free option is the mix of Prometheus with its Alertmanager and Grafana to visualize (they are complementary, not substitutable). But I think that a mediocre implementation of free tools ends up more expensive than paying for a good tool, because you also pay to maintain it.
This is quite affordable and provides what you seem to be looking for. You can see a whole thing about the APM space here: https://www.apmexperts.com/observability/ranking-the-observability-offerings/
From a StackShare Community member: “We need better analytics & insights into our Elasticsearch cluster. Grafana, which ships with advanced support for Elasticsearch, looks great but isn’t officially supported/endorsed by Elastic. Kibana, on the other hand, is made and supported by Elastic. I’m wondering what people suggest in this situation.”
For our Predictive Analytics platform, we have used both Grafana and Kibana
- Grafana based demo video: https://www.youtube.com/watch?v=tdTB2AcU4Sg
- Kibana based reporting screenshot: https://imgur.com/vuVvZKN
Kibana has predictions and ML algorithms support, so if you need them, you may be better off with Kibana. The multivariate analysis features it provides are unique (not available in Grafana).
For everything else, definitely Grafana. The number of supported data sources and plugins clearly makes Grafana the winner (purely in the visualization and reporting sense). Creating your own plugin is also very easy. The top pros of Grafana (which it does better than Kibana) are:
- Creating and organizing visualization panels
- Templating the panels on dashboards for repetitive tasks
- Realtime monitoring, filtering of charts based on conditions and variables
- Export / Import in JSON format (that allows you to version and save your dashboard as part of git)
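Since dashboards can be exported as JSON and kept in git, it helps to write them in a stable form so that diffs show only real changes. The Python sketch below shows one way to do that under simple assumptions: `normalize_dashboard` is a hypothetical helper, the sample document is made up, and nothing here depends on Grafana's actual dashboard schema.

```python
import json


def normalize_dashboard(raw_json):
    """Re-serialize exported dashboard JSON with sorted keys and fixed indent.

    A stable key order and layout keep git diffs limited to real edits.
    """
    dashboard = json.loads(raw_json)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


# Hypothetical exported dashboard, reduced to two keys for the example.
exported = '{"title": "Demo", "panels": []}'
print(normalize_dashboard(exported), end="")
```

Running the export through a step like this before each commit keeps the history readable, and the normalized file can still be imported unchanged because only whitespace and key order differ.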
I use both Kibana and Grafana at my workplace: Kibana for logging and Grafana for monitoring. Since you already work with Elasticsearch, I think Kibana is the safest choice in terms of ease of use and the variety of messages it can manage, while Grafana still has (in my opinion) a strong link to metrics.
After looking for a way to monitor or at least get a better overview of our infrastructure, we found out that Grafana (which I previously only used in ELK stacks) has a plugin available to fully integrate with Amazon CloudWatch. That makes it way better for our use case than what the different competitors offer (most of which are even paid). There is also a CloudFlare plugin available, the platform we use to serve our DNS requests. Although we are big fans of https://smashing.github.io/ (previously dashing), for now we are starting with Grafana.
I use Kibana because it ships with the ELK stack. I don't find it as powerful as Splunk; however, it is light years above grepping through log files. We previously used Grafana but found it annoying to maintain a separate tool outside of the ELK stack. We were able to get everything we needed from Kibana.
Kibana should be sufficient in this architecture for decent analytics; if stronger metrics are needed, then combine it with Grafana. Datadog also offers a nice overview, but there's no need for it in this case unless you need more monitoring and alerting (and more technicalities).
@Kibana, of course, because @Grafana looks like an amateur sort of solution, crammed with a query builder for grouping aggregates, but in essence, as recommended by CERN, Kibana is the corporate (startup-oriented) decision.
Furthermore, @Kibana comes with the complexity of adhering to the ELK stack, whereas @InfluxDB + @Grafana & co. have recently become a sophisticated development conglomerate instead of advancing towards an understandable, step-by-step installation.
I chose TimescaleDB to be the backend system of our production monitoring system. We needed to be able to keep track of multiple high-cardinality dimensions.
The drawback of this decision is that our monitoring system is a bit more ad hoc than it used to be (with New Relic Insights).
We are combining this with Grafana for display and Telegraf for data collection.
Pros of Grafana
- Beautiful (89)
- Graphs are interactive (68)
- Free (57)
- Easy (56)
- Nicer than the Graphite web interface (34)
- Many integrations (26)
- Can build dashboards (18)
- Easy to specify time window (10)
- Can collaborate on dashboards (10)
- Dashboards contain number tiles (9)
- Open Source (5)
- Integration with InfluxDB (5)
- Click and drag to zoom in (5)
- Authentication and user management (4)
- Threshold limits in graphs (4)
- Alerts (3)
- It is open to CloudWatch and many databases (3)
- Simple and native support for Prometheus (3)
- Great community support (2)
- You can use this for development to check memcache (2)
- You can visualize real-time data to put alerts (2)
- Graphs as code (0)
- Plugin visualizations (0)
Pros of InfluxDB
- Time-series data analysis (59)
- Easy setup, no dependencies (30)
- Fast, scalable & open source (24)
- Open source (21)
- Real-time analytics (20)
- Continuous Query support (6)
- Easy Query Language (5)
- HTTP API (4)
- Out-of-the-box, automatic Retention Policy (4)
- Offers Enterprise version (1)
- Free Open Source version (1)
Pros of Prometheus
- Powerful easy to use monitoring (47)
- Flexible query language (38)
- Dimensional data model (32)
- Alerts (27)
- Active and responsive community (23)
- Extensive integrations (22)
- Easy to setup (19)
- Beautiful Model and Query language (12)
- Easy to extend (7)
- Nice (6)
- Written in Go (3)
- Good for experimentation (2)
- Easy for monitoring (1)
Cons of Grafana
- No interactive query builder (1)
Cons of InfluxDB
- Instability (4)
- Proprietary query language (1)
- HA or Clustering is only in paid version (1)
Cons of Prometheus
- Just for metrics (12)
- Bad UI (6)
- Needs monitoring to access metrics endpoints (6)
- Not easy to configure and use (4)
- Supports only active agents (3)
- Written in Go (2)
- TLS is quite difficult to understand (2)
- Requires multiple applications and tools (2)
- Single point of failure (1)