Compare Azure HDInsight to these popular alternatives based on real-world usage and developer feedback.

Dremio—the data lake engine, operationalizes your data lake storage and speeds your analytics processes with a high-performance and high-efficiency query engine while also democratizing data access for data scientists and analysts.

It is an analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. It brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP or Python.

It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.

It provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze and visualize machine data.

It is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

A fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

Distributed SQL Query Engine for Big Data

Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.

It is an open source software integration platform helps you in effortlessly turning data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms.

It is a service designed to allow developers to integrate disparate data sources. It is a platform somewhat like SSIS in the cloud to manage the data you have both on-prem and in the cloud.

Stitch is a simple, powerful ETL service built for software developers. Stitch evolved out of RJMetrics, a widely used business intelligence platform. When RJMetrics was acquired by Magento in 2016, Stitch was launched as its own company.

Impala is a modern, open source, MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.

Its mission is to connect the world’s applications, data and devices. It makes connecting anything easy with Anypoint Platform™, the only complete integration platform for SaaS, SOA and APIs. Thousands of organizations in 60 countries, from emerging brands to Global 500 enterprises, use it to innovate faster and gain competitive advantage.

Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management tools plus dedicated support and community advocacy from our world-class team of Hadoop developers and experts.

It helps you centralize data from disparate sources which you can manage directly from your browser. We extract your data and load it into your data destination.

It is an open-source data integration platform that syncs data from applications, APIs & databases to data warehouses lakes & DBs.

An open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.

With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake” -- without having to load or transform any data.

It is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

It provides a best-in-class, unified analytics platform that will forever be independent from underlying infrastructure.

A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.

It is an open-source alternative to BigQuery, Redshift, and Snowflake. It is a wrapper around Clickhouse that lets you input arbitrary JSON and perform analytical queries against it. It automatically creates tables and columns when new data is added.

Apache Kylin™ is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark supporting extremely large datasets, originally contributed from eBay Inc.

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.

It is open source and lets regular users import their big data, query it, search it, visualize it and build dashboards on top of it, all from their browser.

An end-to-end data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps.

It is a modern, browser-based UI, with powerful, push-down ETL/ELT functionality. With a fast setup, you are up and running in minutes.

Cask Data Application Platform (CDAP) is an open source application development platform for the Hadoop ecosystem that provides developers with data and application virtualization to accelerate application development, address a broader range of real-time and batch use cases, and deploy applications into production while satisfying enterprise requirements.

Qubole is a cloud based service that makes big data easy for analysts and data engineers.

It is a fast distributed SQL query engine for big data analytics that helps you explore your data universe. It is designed to query large data sets distributed over one or more heterogeneous data sources.

It is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. It helps you create clusters quickly, manage them easily, and save money by turning clusters off when you don't need them.

It is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

Treasure Data's Big Data as-a-Service cloud platform enables data-driven businesses to focus their precious development resources on their applications, not on mundane, time-consuming integration and operational tasks. The Treasure Data Cloud Data Warehouse service offers an affordable, quick-to-implement and easy-to-use big data option that does not require specialized IT resources, making big data analytics available to the mass market.

A fully managed, cloud-native data integration service that helps users efficiently build and manage ETL/ELT data pipelines. With a graphical interface and a broad open-source library of preconfigured connectors and transformations, and more.

Its Virtual Data Warehouse delivers performance, security and agility to exceed the demands of modern-day operational analytics.

Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations.

Get the power of big data in minutes with Alooma and Amazon Redshift. Simply build your pipelines and map your events using Alooma’s friendly mapping interface. Query, analyze, visualize, and predict now.

It syncs your data warehouse with CRM & go-to-market tools. Get your customer success, sales & marketing teams on the same page by sharing the same customer data.

Singer powers data extraction and consolidation for all of your organization’s tools: advertising platforms, web analytics, payment processors, email service providers, marketing automation, databases, and more.

It is a no-code data pipeline as a service. Start moving data from any source to your data warehouses such as Redshift, BigQuery, and Snowflake in real-time.

It is an Intelligent Platform that Interoperates with Your Data Investments. It sits between the data storage and processing environments and the visualization, statistical or machine learning tools used downstream

It is a Hitachi Group Company, data integration and business analytics company with an enterprise, Online Analytical Processing server (OLAP). Allows business users to analyze large and complex amounts of data in real-time.

It is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

It is the leading Reverse ETL platform. Sync customer data from your warehouse into tools your business teams rely on.