Amazon S3 vs Google BigQuery vs SummitDB

Overview

Amazon S3: Stacks 55.1K, Followers 40.2K, Votes 2.0K

Google BigQuery: Stacks 1.8K, Followers 1.5K, Votes 152

SummitDB: Stacks 4, Followers 13, Votes 0, GitHub Stars 1.4K, Forks 77

Amazon S3 vs Google BigQuery vs SummitDB: What are the differences?

Key Differences between Amazon S3, Google BigQuery, and SummitDB

Scalability: Amazon S3 is an object storage service that scales automatically to trillions of objects and needs very little storage management. Google BigQuery is a fully managed, serverless data warehouse; it scales by splitting large datasets into smaller parts and processing them in parallel across many machines. SummitDB is an in-memory, NoSQL key/value database that persists to disk. Because its working data lives in memory, its capacity is bounded by the memory of its nodes, so it suits far smaller datasets than S3 or BigQuery.
Querying and Analytics: Amazon S3 is built for storing and retrieving objects by key and does not provide a general-purpose query engine of its own; analytics are typically run by a separate service that reads the stored objects. Google BigQuery is an analytics platform with SQL querying that handles complex analytical queries over large datasets efficiently. SummitDB offers key lookups, custom indexes, geospatial queries, JSON documents, and user-defined JS scripting, but it is not designed for warehouse-style analytics.
Pricing Model: Amazon S3 is pay-as-you-go: you pay for the storage used, the requests made, and the data transferred out, at rates that depend on the storage class. Google BigQuery bills storage and query processing separately: on-demand queries are charged by the amount of data processed, reserved capacity (slots) is available for steady workloads, and streaming inserts are charged as well. SummitDB is open source, so there is no license fee, but you pay for and operate the machines it runs on.
Data Replication: Amazon S3 stores each object redundantly across multiple facilities within the Region you choose, which gives it high durability and availability, and it provides strong read-after-write consistency. Google BigQuery also replicates data across multiple locations for redundancy and availability. SummitDB replicates data across the nodes of a cluster using the Raft consensus algorithm and is built on a transactional, strongly consistent model; running and sizing that cluster is your responsibility.
Data Types and Query Language: Amazon S3 enforces no schema and stores any kind of data as opaque objects; it has no data types or query language of its own. Google BigQuery supports structured, semi-structured, and nested data and uses a SQL dialect that allows complex transformations and aggregations. SummitDB stores key/value pairs, including JSON documents and geospatial data. As a NoSQL store, it is queried through key lookups, custom indexes, and scripts rather than SQL.
Real-Time Processing: Amazon S3 is a storage service and has no real-time processing capabilities of its own. Google BigQuery supports streaming ingestion and can make streamed data available for querying in near real time, for example when fed from Cloud Pub/Sub. SummitDB has no built-in stream-processing features, although as an in-memory store it serves individual reads and writes with low latency.
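
The pricing difference above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not a pricing calculator: the Rates fields, the function names, and every number in the usage example are hypothetical placeholders, and real prices must be taken from each provider's current price list. It only shows that object storage cost is driven by stored and transferred volume, while warehouse cost is driven by stored volume plus the data that queries scan.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; replace with figures from the provider's price list."""
    storage_per_gb_month: float
    transfer_per_gb: float = 0.0   # object storage: data transferred out
    scan_per_tb: float = 0.0       # warehouse: data processed by queries
    free_scan_tb: float = 0.0      # warehouse: monthly free processing allowance


def object_storage_cost(rates: Rates, stored_gb: float, transferred_gb: float) -> float:
    """Monthly cost when billing follows stored volume and outbound transfer."""
    return stored_gb * rates.storage_per_gb_month + transferred_gb * rates.transfer_per_gb


def warehouse_cost(rates: Rates, stored_gb: float, scanned_tb: float) -> float:
    """Monthly cost when billing follows stored volume and data scanned by queries."""
    billable_tb = max(0.0, scanned_tb - rates.free_scan_tb)
    return stored_gb * rates.storage_per_gb_month + billable_tb * rates.scan_per_tb


if __name__ == "__main__":
    # All rates below are made-up illustration values, not real prices.
    storage_rates = Rates(storage_per_gb_month=0.02, transfer_per_gb=0.1)
    warehouse_rates = Rates(storage_per_gb_month=0.02, scan_per_tb=5.0, free_scan_tb=1.0)
    print(object_storage_cost(storage_rates, stored_gb=1000, transferred_gb=50))
    print(warehouse_cost(warehouse_rates, stored_gb=1000, scanned_tb=3))
```

With this shape, a dataset that is stored but rarely queried costs about the same in both models, while a dataset that is scanned many times a month is dominated by the query term.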

In summary, Amazon S3 is a scalable object storage service without a query engine of its own, while Google BigQuery is a highly scalable data warehouse with powerful querying and analytics features. SummitDB is an in-memory, Raft-replicated key/value database that targets much smaller datasets and does not aim to match the storage scale of S3 or the analytics capabilities of BigQuery.


Advice on Amazon S3, Google BigQuery, SummitDB

Mohammad

Aug 30, 2020

Needs advice on Backblaze B2 Cloud Storage, PHP, Laravel

Hello! I have a mobile app with nearly 100k MAU, and I want to add a cloud file storage service to my app.

My app will allow users to store their image, video, and audio files and retrieve them to their device when necessary.

I have already decided to use PHP & Laravel as my backend, and I use Contabo VPS. Now, I need an object storage service for my app, and my options are:

Amazon S3: It sounds to me like the best option, but also the most expensive. It is the closest to my users (MENA region); for the other services, I will have to go to Europe. Not sure how important this is?
DigitalOcean Spaces: Seems like my best option for price/service, but I am still not sure.
Wasabi: The best price (6 USD/month/TB) and free bandwidth, but I am not sure if it fits my needs, as I want to allow my users to preview audio and video files. They don't recommend their service for streaming videos.
Backblaze B2 Cloud Storage: Good price, but not sure about them.
There is also the self-hosted S3-compatible option, but I am not sure about that.

Any thoughts will be helpful. Also, if you think I should post in a different sub, please tell me.

180k views

Julien

CTO at Hawk

Sep 19, 2020

Decided

The cloud data warehouse is the centerpiece of a modern data platform. The choice of the most suitable solution is therefore fundamental.

Our benchmark covered BigQuery and Snowflake. Both solutions seem to match our goals, but they take very different approaches.

BigQuery is notably the only 100% serverless cloud data warehouse, which requires absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires setting up (paid) reclustering processes, managing the performance allocated to each profile, etc. We can also mention Redshift, which we eliminated because it requires even more operational work.

BigQuery can therefore be set up with almost zero cost in human resources. Its on-demand pricing is particularly well suited to small workloads: there is zero cost when the solution is not used, and you only pay for the queries you run. As usage grows, slots (with a monthly or per-minute commitment) quickly reduce the cost drastically. We cut the cost of our nightly batches by a factor of 10 by using flex slots.

Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services: Cloud Functions, Dataflow, Data Studio, etc.

BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will allow running queries over data stored on an external cloud platform (Amazon S3, for example). It will be a major breakthrough in the history of cloud data warehouses. Omni will compensate for a weakness of BigQuery: transferring data in near real time from S3 to BQ is not easy today. It was even simpler to implement via Snowflake's Snowpipe solution.

We also plan to use the machine learning features built into BigQuery to accelerate our deployment of data-science-based projects, an opportunity only offered by BigQuery.

193k views

Gabriel

CEO at NaoLogic Inc

Dec 24, 2019

Decided

We offer our customers HIPAA-compliant storage. After analyzing the market, we decided to go with Google Storage. The Node.js API is OK, but it is still not ES6 and can be very confusing to use. For each new customer, we created a separate bucket so that their data stays isolated and they do not have to worry about data loss. After 1000+ customers, we started seeing many problems with creating new buckets and with saving or retrieving files. There were many false positives: the Promise returned OK, but in reality the operation had failed.

That's why we switched to S3, which just works.

330k views

Detailed Comparison

Amazon S3

Amazon Simple Storage Service provides a fully redundant data storage infrastructure for storing and retrieving any amount of data, at any time, from anywhere on the web.

Features:
Write, read, and delete objects containing from 1 byte to 5 terabytes of data each. The number of objects you can store is unlimited.
Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
A bucket can be stored in one of several Regions. You can choose a Region to optimize for latency, minimize costs, or address regulatory requirements. Amazon S3 is currently available in the US Standard, US West (Oregon), US West (Northern California), EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), South America (Sao Paulo), and GovCloud (US) Regions. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps.
Objects stored in a Region never leave the Region unless you transfer them out. For example, objects stored in the EU (Ireland) Region never leave the EU.
Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
Options for secure data upload/download and encryption of data at rest are provided for additional data protection.
Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP. A BitTorrent protocol interface is provided to lower costs for high-scale distribution.
Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options. These options can be easily administered from the Amazon S3 Management Console.
Reliability backed with the Amazon S3 Service Level Agreement.

Statistics: Stacks 55.1K, Followers 40.2K, Votes 2.0K
Pros: Reliable (590), Scalable (492), Cheap (456), Simple & easy (329), Many SDKs (83)
Cons: Permissions take some time to get right (7), Requires a credit card (6), Takes time/work to organize buckets & folders properly (6), Complex to set up (3)
Integrations: No integrations available

Google BigQuery

Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP or Python.

Features:
All behind the scenes: Your queries can execute asynchronously in the background, and can be polled for status.
Import data with ease: Bulk load your data using Google Cloud Storage or stream it in bursts of up to 1,000 rows per second.
Affordable big data: The first terabyte of data processed each month is free.
The right interface: Separate interfaces for administration and developers will make sure that you have access to the tools you need.

Statistics: Stacks 1.8K, Followers 1.5K, Votes 152
Pros: High performance (28), Easy to use (25), Fully managed service (22), Cheap pricing (19), Process hundreds of GB in seconds (16)
Cons: You can't unit test changes in BQ data (1)
Integrations: Xplenty, Fluentd, Looker, Chartio, Treasure Data

SummitDB

SummitDB is an in-memory, NoSQL key/value database. It persists to disk, uses the Raft consensus algorithm, is ACID compliant, and built on a transactional and strongly-consistent model. It supports custom indexes, geospatial data, JSON documents, and user-defined JS scripting.

Features: none listed

Statistics: GitHub Stars 1.4K, GitHub Forks 77, Stacks 4, Followers 13, Votes 0
Pros & Cons: No community feedback yet
Integrations: No integrations available
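
SummitDB's description mentions the Raft consensus algorithm. Raft commits a write only once a majority of the cluster's nodes have acknowledged it, so the cluster size determines how many node failures can be tolerated. The Python sketch below works through that arithmetic; the function names are illustrative and are not part of SummitDB's API.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a majority in a cluster of the given size."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can fail while a majority can still commit writes."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes raises the quorum without raising the number of tolerated failures, which is why Raft clusters are usually sized with an odd number of nodes.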

What are some alternatives to Amazon S3, Google BigQuery, SummitDB?

Redis

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams.

Amazon Redshift

It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

Amazon EBS

Amazon EBS volumes are network-attached, and persist independently from the life of an instance. Amazon EBS provides highly available, highly reliable, predictable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.

Google Cloud Storage

Google Cloud Storage allows world-wide storing and retrieval of any amount of data and at any time. It provides a simple programming interface which enables developers to take advantage of Google's own reliable and fast networking infrastructure to perform data operations in a secure and cost effective manner. If expansion needs arise, developers can benefit from the scalability provided by Google's infrastructure.

Qubole

Qubole is a cloud-based service that makes big data easy for analysts and data engineers.

Hazelcast

With its various distributed data structures, distributed caching capabilities, elastic nature, memcache support, integration with Spring and Hibernate, and, more importantly, so many happy users, Hazelcast is a feature-rich, enterprise-ready, and developer-friendly in-memory data grid solution.

Amazon EMR

It is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

Azure Storage

Azure Storage provides the flexibility to store and retrieve large amounts of unstructured data, such as documents and media files, with Azure Blobs; structured NoSQL-based data with Azure Tables; and reliable messages with Azure Queues. It also offers SMB-based Azure Files for migrating on-premises applications to the cloud.

Aerospike

Aerospike is an open-source, modern database built from the ground up to push the limits of flash storage, processors and networks. It was designed to operate with predictable low latency at high throughput with uncompromising reliability – both high availability and ACID guarantees.

MemSQL

MemSQL converges transactions and analytics for sub-second data processing and reporting. Real-time businesses can build robust applications on a simple and scalable infrastructure that complements and extends existing data pipelines.

