Apache Drill vs Google BigQuery

Apache Drill vs Google BigQuery: What are the differences?

Introduction

Apache Drill and Google BigQuery are both SQL-based tools that let developers query and analyze large datasets. They serve similar goals, but they differ in several key ways.

  1. Flexibility and Data Source Support: Apache Drill offers more flexibility in where the data can live. It queries structured and semi-structured data in place, in formats such as JSON, Parquet, and Avro, and across sources such as file systems, HBase, and Hive, without first loading it into a central store. Google BigQuery, by contrast, is primarily designed to query data loaded into its own managed storage, although it can also query external tables stored in Google Cloud Storage, Google Drive, or Bigtable.

  2. Cost Structure: The cost structures of Apache Drill and Google BigQuery differ significantly. Apache Drill is an open-source project that can be downloaded, installed, and used without license fees; you pay only for the hardware or cloud instances it runs on. Google BigQuery is part of Google Cloud Platform and uses usage-based pricing: users are charged for the amount of data their queries process (or, alternatively, for reserved query capacity) and for the storage they use.

  3. Scalability: Both Apache Drill and Google BigQuery can handle large volumes of data, but they scale in different ways. Apache Drill runs a Drillbit service on each node of a cluster, coordinated through Apache ZooKeeper, and scales horizontally by adding nodes that process data in parallel; it is often deployed alongside Hadoop but does not require it. Google BigQuery, on the other hand, is a fully managed service that scales automatically to handle massive datasets, with no manual configuration or infrastructure management.

  4. Query Language Support: Apache Drill supports ANSI SQL, making it easy for developers familiar with SQL to work with the data, and extends it with functions for querying complex nested data structures. Google BigQuery uses GoogleSQL (formerly called standard SQL), an ANSI-compliant dialect with extensions such as ARRAY and STRUCT types; it also still supports an older legacy SQL dialect.

  5. Integration with Ecosystem: Apache Drill integrates well with the Apache Hadoop ecosystem and can leverage other tools such as Apache Hive, Apache HBase, and more. This allows developers to easily combine the capabilities of these tools with Apache Drill for efficient data analysis. Google BigQuery, on the other hand, is tightly integrated with other Google Cloud Platform services, providing seamless integration with storage, compute, and analytics services offered by Google.

  6. Performance Optimization: Apache Drill gives developers fine-grained control over query execution and optimization through system- and session-level options (set with ALTER SYSTEM and ALTER SESSION), so they can tune performance to their specific requirements. Google BigQuery, being a fully managed service, optimizes query execution automatically behind the scenes. This simplifies things for users, but it limits how much control developers have over performance tuning.
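Items 1 and 4 mention Drill's support for nested, semi-structured data. In Drill SQL, a repeated (array) field is usually unpacked with the FLATTEN function, which produces one row per array element. The Python sketch below only emulates that row-level behavior on in-memory records so the idea is concrete; it is not Drill's implementation, and the `flatten` helper and the sample `orders` records are hypothetical. Emitting no rows for a missing or empty array is a choice made by this sketch, not a documented Drill guarantee.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def flatten(records: Iterable[Record], field: str, alias: str) -> Iterator[Record]:
    """Hypothetical helper emulating FLATTEN-style row semantics (not Drill code).

    Emits one output row per element of record[field]. All other top-level
    fields are copied onto every output row, and the array element is stored
    under `alias`. Records whose field is missing or empty produce no rows.
    """
    for rec in records:
        values = rec.get(field) or []
        # Columns carried over unchanged: everything except the array itself.
        base = {k: v for k, v in rec.items() if k != field}
        for value in values:
            row = dict(base)
            row[alias] = value
            yield row


# Hypothetical semi-structured input: the set of fields varies per record.
orders: List[Record] = [
    {"id": 1, "tags": ["new", "priority"]},
    {"id": 2, "tags": ["returning"], "coupon": "SPRING"},
    {"id": 3},  # no tags at all
]

if __name__ == "__main__":
    for row in flatten(orders, "tags", "tag"):
        print(row)
```

Record 2 carries a `coupon` field that the others lack; handling input whose fields vary like this, without declaring a schema up front, is the kind of semi-structured data item 1 refers to.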

In summary, Apache Drill provides more flexibility in terms of data source support, offers a cost advantage as an open-source project, and has better integration with the Apache Hadoop ecosystem. On the other hand, Google BigQuery is tightly integrated with Google Cloud Platform services, automatically scales to handle large datasets, and offers a simplified query optimization process.

Decisions about Apache Drill and Google BigQuery
Julien Lafont

A cloud data warehouse is the centerpiece of a modern data platform, so choosing the most suitable solution is fundamental.

We benchmarked BigQuery and Snowflake. Both seemed to match our goals, but they take very different approaches.

BigQuery is notably the only 100% serverless cloud data warehouse, and it requires absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires setting up (paid) reclustering processes, managing the performance allocated to each profile, etc. We could also mention Redshift, which we eliminated because it requires even more operational work.

BigQuery can therefore be run at almost zero human-resource cost. Its on-demand pricing is particularly well suited to small workloads: there is zero cost when the solution is not used, and you pay only for the queries you run. As usage grows, however, switching to slots (with monthly or per-minute commitments) drastically reduces the cost. We cut the cost of our nightly batches by a factor of 10 by using flex slots.
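The trade-off described above can be sketched as a simple cost model: on-demand cost grows with the amount of data scanned, while slot cost grows with the reserved capacity and how long it is held. This is a minimal sketch assuming those two linear formulas; the class and function names are hypothetical, and the prices in the example are made-up placeholders, not Google's published rates, which should be taken from the current BigQuery pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDemandPricing:
    # Placeholder price per TiB of data scanned (hypothetical, not a quoted rate).
    price_per_tib: float


@dataclass(frozen=True)
class SlotPricing:
    # Placeholder price per slot-hour of reserved capacity (hypothetical).
    price_per_slot_hour: float


def on_demand_cost(tib_scanned: float, pricing: OnDemandPricing) -> float:
    """On-demand model: pay only for the data the queries scan (0 when idle)."""
    return tib_scanned * pricing.price_per_tib


def slot_cost(slots: int, hours: float, pricing: SlotPricing) -> float:
    """Capacity model: pay for reserved slots for as long as they are held,
    independent of how much data the queries scan."""
    return slots * hours * pricing.price_per_slot_hour


def cheaper_option(tib_scanned: float, slots: int, hours: float,
                   on_demand: OnDemandPricing, capacity: SlotPricing) -> str:
    """Return which of the two models costs less for a single workload."""
    if slot_cost(slots, hours, capacity) < on_demand_cost(tib_scanned, on_demand):
        return "slots"
    return "on-demand"


if __name__ == "__main__":
    od = OnDemandPricing(price_per_tib=1.0)        # made-up unit price
    cap = SlotPricing(price_per_slot_hour=0.02)    # made-up unit price
    # Heavy nightly batch: 40 TiB scanned vs. 100 slots held for 2 hours.
    print(cheaper_option(40.0, 100, 2.0, od, cap))  # slots
    # Light ad-hoc workload: 0.5 TiB scanned.
    print(cheaper_option(0.5, 100, 2.0, od, cap))   # on-demand
```

With these placeholder numbers, a heavy batch that scans 40 TiB is cheaper on 100 slots held for two hours, whereas a light workload scanning 0.5 TiB stays cheaper on demand. Real decisions also depend on commitment terms and on how many slots a workload actually needs, which this model ignores.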

Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services such as Cloud Functions, Dataflow, and Data Studio.

BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will let you run queries over data stored on another cloud platform (Amazon S3, for example). It will be a major breakthrough in the history of cloud data warehouses. Omni will compensate for one of BigQuery's weaknesses: transferring data from S3 to BigQuery in near real time is not easy today. This was simpler to implement with Snowflake's Snowpipe.

We also plan to use the machine learning features built into BigQuery to accelerate the deployment of our data-science projects, an opportunity only BigQuery offers.

Pros of Apache Drill
  • NoSQL and Hadoop (4 votes)
  • Free (3 votes)
  • Lightning speed and simplicity in face of data jungle (3 votes)
  • Well documented for fast install (2 votes)
  • SQL interface to multiple datasources (1 vote)
  • Nested Data support (1 vote)
  • Read Structured and unstructured data (1 vote)
  • V1.10 released - https://drill.apache.org/ (1 vote)

Pros of Google BigQuery
  • High Performance (28 votes)
  • Easy to use (25 votes)
  • Fully managed service (22 votes)
  • Cheap Pricing (19 votes)
  • Process hundreds of GB in seconds (16 votes)
  • Big Data (12 votes)
  • Full table scans in seconds, no indexes needed (11 votes)
  • Always on, no per-hour costs (8 votes)
  • Good combination with fluentd (6 votes)
  • Machine learning (4 votes)
  • Easy to manage (1 vote)
  • Easy to learn (0 votes)

Cons of Apache Drill
  • None listed

Cons of Google BigQuery
  • You can't unit test changes in BQ data (1 vote)

What is Apache Drill?

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.
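The "distributed MPP query layer" idea can be illustrated with two-phase aggregation: each worker computes a partial result over its own slice of the data, and a final step merges the partials. The sketch below assumes the data is already split into partitions and uses Python threads as stand-ins for cluster nodes; the function names and sample data are hypothetical, and this is not how Drill's Drillbits actually plan or exchange data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Row = Dict[str, str]


def partial_count(partition: Sequence[Row], key: str) -> Counter:
    """Phase 1: a worker aggregates only its own slice of the data."""
    return Counter(row[key] for row in partition)


def distributed_count(partitions: List[Sequence[Row]], key: str,
                      workers: int = 4) -> Dict[str, int]:
    """Phase 2: merge the per-partition partial counts into a final result.

    Threads stand in for cluster nodes here; a real MPP engine would ship
    plan fragments to separate processes or machines.
    """
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda p: partial_count(p, key), partitions):
            total.update(part)  # adds counts rather than replacing them
    return dict(total)


# Hypothetical data already split into three partitions (e.g. three files).
partitions: List[Sequence[Row]] = [
    [{"country": "FR"}, {"country": "US"}],
    [{"country": "US"}, {"country": "US"}],
    [{"country": "DE"}],
]

if __name__ == "__main__":
    print(distributed_count(partitions, "country"))
```

Because counting is associative, the merge step only has to add the partial results, which is what lets this pattern scale out by adding workers.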

What is Google BigQuery?

Run super-fast SQL queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease: bulk load your data using Google Cloud Storage or stream it in. Easy access: use BigQuery through a browser tool, a command-line tool, or calls to the BigQuery REST API with client libraries for languages such as Java, PHP, or Python.

What are some alternatives to Apache Drill and Google BigQuery?
  • Presto: Distributed SQL query engine for big data.
  • Apache Spark: Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and newer workloads such as streaming, interactive queries, and machine learning.
  • Apache Calcite: An open-source framework for building databases and data management systems. It includes a SQL parser, an API for building expressions in relational algebra, and a query planning engine.
  • Apache Impala: Impala is a modern, open-source MPP SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data stored in HDFS or Apache HBase in real time, including SELECT, JOIN, and aggregate functions.
  • Druid: Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. Druid supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.