What is Google BigQuery?

Run fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data in bulk from Google Cloud Storage, or stream it in. Access BigQuery through a browser tool, a command-line tool, or calls to the BigQuery REST API, using client libraries for languages such as Java, PHP, or Python.
Google BigQuery is a tool in the Big Data as a Service category of a tech stack.
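
As a rough illustration of the REST access path described above, the sketch below only builds the URL and JSON body for a synchronous query request and sends nothing. It assumes the v2 `jobs.query` endpoint; the project ID and table name are hypothetical, and a real call would also need an OAuth 2.0 bearer token, which the official client libraries normally handle.

```python
import json
from urllib.parse import quote

# Base URL of the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, timeout_ms: int = 10_000):
    """Return (url, json_body) for a synchronous jobs.query call.

    Nothing is sent over the network here.
    """
    url = f"{BASE_URL}/projects/{quote(project_id, safe='')}/queries"
    body = {
        "query": sql,
        "useLegacySql": False,    # ask for standard SQL rather than legacy SQL
        "timeoutMs": timeout_ms,  # how long the server waits before returning
    }
    return url, json.dumps(body)


url, payload = build_query_request(
    "my-project",  # hypothetical project ID
    "SELECT COUNT(*) AS n FROM `my-project.analytics.events`",  # hypothetical table
)
print(url)
print(payload)
```

The payload would be sent as an HTTP POST with a `Content-Type: application/json` header.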

Who uses Google BigQuery?

Companies
155 companies use Google BigQuery in their tech stacks, including Sentry, Spotify, and Webedia.

Developers
39 developers use Google BigQuery.

Google BigQuery Integrations

Fastly, Fluentd, Looker, Redash, and Stitch are some of the popular tools that integrate with Google BigQuery. In all, 26 tools integrate with Google BigQuery.

Google BigQuery Reviews

Here are some stack decisions, common use cases and reviews by companies and developers who chose Google BigQuery in their tech stack.

Tim Specht
Co-Founder and CTO at Dubsmash · 14 upvotes · 6.8K views
Google BigQuery
Amazon SQS
AWS Lambda
Amazon Kinesis
Google Analytics
#BigDataAsAService
#RealTimeDataProcessing
#GeneralAnalytics
#ServerlessTaskProcessing

In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.

While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.
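
The client-side batching described here can be sketched roughly as follows. This is not Dubsmash's code: the `EventBuffer` class, its method names, and the threshold of 3 are all hypothetical, and `send` stands in for whatever call puts a record on the stream.

```python
import json
from typing import Callable, Dict, List


class EventBuffer:
    """Collects events in memory and flushes them as one JSON blob."""

    def __init__(self, send: Callable[[str], None], threshold: int = 20):
        self._send = send            # stand-in for the call that writes to the stream
        self._threshold = threshold  # hypothetical batch size
        self._events: List[Dict] = []

    def track(self, name: str, **props) -> None:
        """Record one event and flush once the threshold is reached."""
        self._events.append({"name": name, "props": props})
        if len(self._events) >= self._threshold:
            self.flush()

    def on_background(self) -> None:
        """Called when the app goes to the background."""
        self.flush()

    def flush(self) -> None:
        """Send all buffered events as a single JSON array, if there are any."""
        if not self._events:
            return
        blob = json.dumps(self._events)
        self._events = []
        self._send(blob)


sent: List[str] = []
buf = EventBuffer(send=sent.append, threshold=3)
for i in range(4):
    buf.track("video_view", index=i)
buf.on_background()  # flushes the one event still buffered
```

With a threshold of 3, the four tracked events leave the client as two blobs: one when the threshold is hit and one when the app is backgrounded.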

In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.
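
A minimal sketch of the Lambda step, under stated assumptions: the handler receives the usual Kinesis-triggered event shape (base64 data under `Records[i]["kinesis"]["data"]`), each record is assumed to hold a JSON array of events with a `name` field, and "pre-aggregation" is simplified to counting events per name. The `send_batch` callable is a hypothetical stand-in for an SQS client, so the example runs without AWS credentials; it is not the author's implementation.

```python
import base64
import json
from collections import Counter
from typing import Callable, Dict, List

SQS_MAX_BATCH = 10  # SQS accepts at most 10 messages per batch send


def decode_events(event: Dict) -> List[Dict]:
    """Flatten the client blobs carried by a Kinesis-triggered Lambda event."""
    events: List[Dict] = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        events.extend(json.loads(raw))  # assumption: each blob is a JSON array
    return events


def pre_aggregate(events: List[Dict]) -> List[Dict]:
    """Collapse events into one row per event name with a count."""
    counts = Counter(e["name"] for e in events)
    return [{"name": n, "count": c} for n, c in sorted(counts.items())]


def make_handler(send_batch: Callable[[List[str]], None]):
    """Build a Lambda-style handler; send_batch stands in for the SQS client."""
    def handler(event, context=None):
        events = decode_events(event)
        bodies = [json.dumps(row) for row in pre_aggregate(events)]
        # Send the message bodies in chunks that respect the SQS batch limit.
        for i in range(0, len(bodies), SQS_MAX_BATCH):
            send_batch(bodies[i:i + SQS_MAX_BATCH])
        return {"events": len(events), "messages": len(bodies)}
    return handler


# Local usage with a fake Kinesis event and an in-memory "queue".
blob = json.dumps([{"name": "view"}, {"name": "like"}, {"name": "view"}])
fake_event = {
    "Records": [{"kinesis": {"data": base64.b64encode(blob.encode()).decode()}}]
}
batches: List[List[str]] = []
result = make_handler(batches.append)(fake_event)
```

Injecting `send_batch` keeps the aggregation logic testable on its own; a separate consumer would then read the queue and write the rows to BigQuery.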

Nick Rockwell
CTO at The New York Times · 5 upvotes · 2.5K views
Amazon DynamoDB
Google Cloud Dataflow
Google Cloud Pub/Sub
Google BigQuery
#AnalyticsPipeline
#Analytics

We really drank the Google Kool-Aid on analytics. Everything's going into Google BigQuery, and almost everything goes straight into Google Cloud Pub/Sub and then through some processing in Google Cloud Dataflow before ending up in BigQuery. We still do too much processing and augmentation on the front end before data goes into Pub/Sub, using some stuff we pulled together with Amazon DynamoDB and so on. It's very brittle, and Dynamo throttling is one of our biggest headaches. So I want all of that to go away: have data go straight into Pub/Sub and do all our augmentation in BigQuery after it's been collected. We're working on that, and it'll happen, some time.

Snowflake
Google BigQuery

I use Google BigQuery because it makes it super easy to query and store data for analytics workloads. If you're using GCP, you're likely using BigQuery. However, data viz tools connected directly to BigQuery will run pretty slowly. Google recently announced BI Engine, which will hopefully compete well against big players like Snowflake when it comes to concurrency.

What's nice too is that it has SQL-based ML tools, and it has great GIS support!

Google BigQuery
Amazon Athena

I use Amazon Athena because, similar to Google BigQuery, you can store and query data easily. Since you can define data schemas in the Glue Data Catalog, there's also a central way to define data models.

However, I would not recommend it for batch jobs. I typically use it to check intermediary datasets in data engineering workloads. It's good for getting a look and feel of the data along its ETL journey.

Meredith Fuhrman
Software Engineer (Support Tools) at ShareThis · 1 upvote · 980 views
Google BigQuery

BigQuery allows our team to pull reports quickly using SQL-like queries against our large store of data about social sharing. We use the information throughout the company to do everything from making internal product decisions based on usage patterns to sharing certain kinds of custom reports with our publishers.

Google BigQuery

Aggregation of user events and traits across a marketing website, SaaS web application, user account provisioning backend and Salesforce CRM. Enables full-funnel analysis of campaign ROI, customer acquisition, engagement and retention at both the user and target account level.


Google BigQuery's features

  • All behind the scenes: Your queries can execute asynchronously in the background, and can be polled for status.
  • Import data with ease: Bulk load your data using Google Cloud Storage or stream it in bursts of up to 1,000 rows per second.
  • Affordable big data: The first terabyte of data processed each month is free.
  • The right interface: Separate interfaces for administration and developers will make sure that you have access to the tools you need.
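
The asynchronous execution and status polling mentioned in the first feature can be sketched as a simple loop. This is an illustration only: `wait_for_job` and its parameters are hypothetical, and `get_state` stands in for whatever call fetches the job's status. The state names PENDING, RUNNING, and DONE follow those reported by BigQuery jobs.

```python
import time
from typing import Callable


def wait_for_job(
    get_state: Callable[[], str],
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job reports DONE and return the number of polls used."""
    for attempt in range(1, max_polls + 1):
        if get_state() == "DONE":
            return attempt
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Simulated job that finishes on the fourth status check; sleeping is stubbed out.
states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
polls = wait_for_job(lambda: next(states), sleep=lambda seconds: None)
print(polls)
```

A DONE state only means the job has stopped running, so a real caller would still inspect the job's error fields before reading results.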

Google BigQuery Alternatives & Comparisons

What are some alternatives to Google BigQuery?
Google Cloud Bigtable
Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.
Amazon Redshift
Redshift makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
