
What is Google BigQuery?

Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease: bulk load your data using Google Cloud Storage or stream it in. Easy access: use a browser tool, a command-line tool, or calls to the BigQuery REST API with client libraries for languages such as Java, PHP, or Python.
Google BigQuery is a tool in the Big Data as a Service category of a tech stack.
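
As a rough illustration of the REST access path mentioned above, the sketch below builds (but does not send) a request for the BigQuery REST API's `jobs.query` method using only the Python standard library. The project ID, table name, and access token are placeholders, not values from this page; in practice most people would use an official client library instead of hand-built requests.

```python
import json
import urllib.request

# Base URL of the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com/bigquery/v2"


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a jobs.query HTTP request."""
    body = {
        "query": sql,
        "useLegacySql": False,  # ask for standard SQL rather than legacy SQL
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/projects/{project_id}/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical project, table, and token; replace with real values.
request = build_query_request(
    project_id="my-project",
    sql="SELECT event_type, COUNT(*) AS n "
        "FROM `my-project.analytics.events` GROUP BY event_type",
    access_token="ACCESS_TOKEN_PLACEHOLDER",
)
# Sending it would look like: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before any billable query runs.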

Who uses Google BigQuery?

Companies
236 companies reportedly use Google BigQuery in their tech stacks, including Spotify, imgix, and Sentry.

Developers
227 developers on StackShare have stated that they use Google BigQuery.

Google BigQuery Integrations

Chartio, Fastly, Redash, Mode, and Vero are some of the popular tools that integrate with Google BigQuery. Here's a list of all 29 tools that integrate with Google BigQuery.

Why do developers like Google BigQuery?

Here's a list of reasons why companies and developers use Google BigQuery
Google BigQuery Reviews

Here are some stack decisions, common use cases and reviews by companies and developers who chose Google BigQuery in their tech stack.

Google Cloud IoT Core
Terraform
Python
Google Cloud Deployment Manager
Google Cloud Build
Google Cloud Run
Google Cloud Bigtable
Google BigQuery
Google Cloud Storage
Google Compute Engine
GitHub

Context: I wanted to create an end-to-end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I had never touched Terraform meaningfully until working on this project, and it's one of the best explorations in my development career. The documentation and syntax are incredibly human-readable and friendly. I'm used to building infrastructure through the Google APIs via Python, but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates were a bit convoluted by first impression. I'm glad past Sung did not make this decision either.

Solution: Leveraging Google Cloud Build, Google Cloud Run, Google Cloud Bigtable, Google BigQuery, Google Cloud Storage, and Google Compute Engine, along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

Check Out My Architecture: CLICK ME

Check out the GitHub repo attached

Tim Specht
Co-Founder and CTO at Dubsmash | 14 upvotes 86.3K views
Google Analytics
Amazon Kinesis
AWS Lambda
Amazon SQS
Google BigQuery
#ServerlessTaskProcessing
#GeneralAnalytics
#RealTimeDataProcessing
#BigDataAsAService

In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.

While this does sound complicated, it's as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it's available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.
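
The client-side batching described above might look roughly like the following sketch. It is not Dubsmash's code: the `EventBuffer` class, its `threshold` default, and the `send` callback (standing in for a write to the stream) are all hypothetical names chosen for illustration.

```python
import json
from typing import Callable, Dict, List


class EventBuffer:
    """Collects events and flushes them as a single JSON blob."""

    def __init__(self, send: Callable[[str], None], threshold: int = 20) -> None:
        self._send = send            # hypothetical transport, e.g. a stream write
        self._threshold = threshold  # flush once this many events are buffered
        self._events: List[Dict] = []

    def track(self, event: Dict) -> None:
        """Buffer one event and flush if the threshold is reached."""
        self._events.append(event)
        if len(self._events) >= self._threshold:
            self.flush()

    def on_background(self) -> None:
        """Called when the app goes to the background: flush what we have."""
        self.flush()

    def flush(self) -> None:
        """Send all buffered events as one JSON array, if there are any."""
        if not self._events:
            return
        blob = json.dumps(self._events)
        self._events = []
        self._send(blob)


# Usage: collect sent blobs in a list instead of writing to a real stream.
sent: List[str] = []
buffer = EventBuffer(send=sent.append, threshold=3)
for name in ["open", "play", "share", "like"]:
    buffer.track({"event": name})
buffer.on_background()  # flushes the one remaining event
```

A production client would also need to handle failed sends, for example by keeping the events until the transport confirms delivery; that is left out here.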

In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.
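
A minimal sketch of the Lambda step, under stated assumptions: Kinesis delivers records to Lambda base64-encoded under `Records[].kinesis.data`, each record is assumed to hold a JSON array of events, and "pre-aggregate" is interpreted here as counting events per name, which the original post does not specify. The function returns message bodies rather than calling SQS, so the helper names and the batch size are illustrative only.

```python
import base64
import json
from collections import Counter
from typing import Dict, Iterable, List


def decode_events(kinesis_event: Dict) -> List[Dict]:
    """Decode the base64 payload of every Kinesis record into events."""
    events: List[Dict] = []
    for record in kinesis_event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        events.extend(json.loads(payload))  # assumes each blob is a JSON array
    return events


def pre_aggregate(events: Iterable[Dict]) -> List[Dict]:
    """Assumed aggregation: count occurrences of each event name."""
    counts = Counter(e["event"] for e in events)
    return [{"event": name, "count": n} for name, n in sorted(counts.items())]


def chunk(rows: List[Dict], size: int) -> List[List[Dict]]:
    """Split rows into batches of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def handler(event: Dict, context: object = None, batch_size: int = 2) -> List[str]:
    """Lambda-style entry point returning one message body per batch."""
    rows = pre_aggregate(decode_events(event))
    return [json.dumps(batch) for batch in chunk(rows, batch_size)]


# Usage with a fake Kinesis event containing one record.
blob = json.dumps([{"event": "play"}, {"event": "share"}, {"event": "play"}])
fake_event = {
    "Records": [{"kinesis": {"data": base64.b64encode(blob.encode()).decode()}}]
}
messages = handler(fake_event)
```

In a real deployment each returned body would be written to the queue, and a separate consumer would persist the batches to BigQuery.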


Nick Rockwell
CTO at NY Times | 5 upvotes 13.7K views
Google BigQuery
Google Cloud Pub/Sub
Google Cloud Dataflow
Amazon DynamoDB
#Analytics
#AnalyticsPipeline

We really drank the Google Kool-Aid on analytics. So, everything's going into Google BigQuery and almost everything is going straight into Google Cloud Pub/Sub and then doing some processing in Google Cloud Dataflow before ending up in BigQuery. We still do too much processing and augmentation on the front end before it goes into Pub/Sub. And that's using some kind of stuff we pulled together using Amazon DynamoDB and so on. And it's very brittle, actually. Actually, Dynamo throttling is one of our biggest headaches. So, I want all of that to go away and do all our augmentation in BigQuery after the data's been collected. And having it just go straight into Pub/Sub. So, we're working on that. And it'll happen, some time.

Amazon Athena
Google BigQuery

I use Amazon Athena because, similar to Google BigQuery, you can store and query data easily. Since you can define data schemas in the Glue data catalog, there's also a central way to define data models.

However, I would not recommend it for batch jobs. I typically use it to check intermediary datasets in data engineering workloads. It's good for getting a look and feel of the data along its ETL journey.

Google BigQuery
Snowflake

I use Google BigQuery because it makes it super easy to query and store data for analytics workloads. If you're using GCP, you're likely using BigQuery. However, data viz tools connected directly to BigQuery will run pretty slowly. They recently announced BI Engine, which will hopefully compete well against big players like Snowflake when it comes to concurrency.

What's nice too is that it has SQL-based ML tools, and it has great GIS support!

Google BigQuery
dbt

I used dbt over manually setting up Python wrappers around SQL scripts because it makes managing transformations within Google BigQuery much easier. This saves future Sung dozens of hours maintaining plumbing code to run a couple of SQL queries. Check out my tutorial in the link!

I haven't seen any other tool make it as easy to run dependent SQL DAGs directly in a data warehouse.
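
To make "dependent SQL DAGs" concrete, here is a small sketch of the underlying idea: given models and the models they depend on, compute an order in which every model runs after its dependencies. This is not how dbt is implemented; the model names are made up, and the sketch simply uses Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical models: each key maps to the set of models it depends on.
models: Dict[str, Set[str]] = {
    "stg_events": set(),
    "stg_users": set(),
    "daily_active_users": {"stg_events", "stg_users"},
    "retention_report": {"daily_active_users"},
}


def run_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order where every model comes after its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(models)
for name in order:
    # A real runner would execute the model's SQL in the warehouse here.
    print(f"run {name}")
```

The relative order of the two staging models is not guaranteed, only that both come before `daily_active_users`, which in turn comes before `retention_report`.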


Google BigQuery's Features

  • All behind the scenes: Your queries can execute asynchronously in the background and can be polled for status.
  • Import data with ease: Bulk load your data using Google Cloud Storage or stream it in bursts of up to 1,000 rows per second.
  • Affordable big data: The first terabyte of data processed each month is free.
  • The right interface: Separate interfaces for administration and developers will make sure that you have access to the tools you need.
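
The "polled for status" behaviour in the first feature can be sketched as a simple loop. The example assumes a job resource shaped like the BigQuery REST API's, where `status.state` moves through `PENDING`, `RUNNING`, and `DONE`, and `status.errorResult` is present on failure. The `get_job` callback is a hypothetical stand-in for whatever call fetches the job, so the sketch runs without network access.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_job: Callable[[], Dict],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> Dict:
    """Poll a job until its state is DONE, then return the job resource."""
    for _ in range(max_polls):
        job = get_job()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            # A finished job may still have failed; surface the error.
            if "errorResult" in status:
                message = status["errorResult"].get("message", "job failed")
                raise RuntimeError(message)
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job not done after {max_polls} polls")


# Usage with canned responses instead of real API calls.
responses = iter([
    {"status": {"state": "PENDING"}},
    {"status": {"state": "RUNNING"}},
    {"status": {"state": "DONE"}},
])
finished = wait_for_job(lambda: next(responses), poll_seconds=0)
```

A fixed polling interval keeps the sketch short; a longer-running job would usually call for a growing delay between polls.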

Google BigQuery Alternatives & Comparisons

What are some alternatives to Google BigQuery?
Google Cloud Bigtable
Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years; it's the database driving major applications such as Google Analytics and Gmail.
Amazon Redshift
It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS): no infrastructure to manage and no knobs to turn.
Google Analytics
Google Analytics lets you measure your advertising ROI as well as track your Flash, video, and social networking sites and applications.

Google BigQuery's Followers
272 developers follow Google BigQuery to keep up with related blogs and decisions.
Chandra Sekhar Chaganti
Jack Bunnage
Nic Miller
Yannick Tian
Ravi Jain
matanbordo
Nurullah Özdemir
Aerwaluo
Lawrence Fernandes
ktykogm