We're moving data (<10 GB total) from an on-prem data lake to cloud storage so we can use cloud compute for resource-heavy NLP and ML operations. We're trying to decide whether we need Google BigQuery here or whether we can work directly from Google Cloud Storage with a Dataproc cluster. Any thoughts on which would be the better approach would be appreciated. Thanks!

Replies (4)
Recommends Google BigQuery

BigQuery storage costs about the same as Cloud Storage; what you pay for is the queries. If your data is clean and structured, store it directly in BigQuery, which will be much easier. If your data is messy or needs enrichment, Dataproc is the better fit.

For less than 10 GB you can use BigQuery here, as this is not considered a big data load (requests can be processed on a single PC). But if you want to process much more (like 1 TB), I'd advise using something else, as scan costs tend to be high on BigQuery.
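
To put rough numbers on the scan-cost point, here is a back-of-the-envelope sketch in Python. The price per TiB is an assumption on my part (it is not from this thread and varies by region and pricing model), and the function names are made up for illustration. It also ignores any free monthly query allowance, so treat the output as an upper-bound estimate and check the current pricing page before relying on it.

```python
# Rough estimate of BigQuery on-demand scan cost.
# ASSUMPTION: the price below is illustrative only; look up the current
# on-demand rate for your region before trusting the numbers.
ASSUMED_USD_PER_TIB = 6.25
BYTES_PER_TIB = 2**40


def scan_cost_usd(bytes_scanned, usd_per_tib=ASSUMED_USD_PER_TIB):
    """Cost of a single query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TIB * usd_per_tib


def monthly_cost_usd(bytes_per_query, queries_per_day, days=30,
                     usd_per_tib=ASSUMED_USD_PER_TIB):
    """Cost of repeating the same full scan every day for `days` days."""
    return scan_cost_usd(bytes_per_query, usd_per_tib) * queries_per_day * days


if __name__ == "__main__":
    ten_gb = 10 * 10**9   # the dataset size from the question
    one_tb = 10**12       # the larger case mentioned above

    for label, size in [("10 GB", ten_gb), ("1 TB", one_tb)]:
        per_query = scan_cost_usd(size)
        per_month = monthly_cost_usd(size, queries_per_day=100)
        print(f"{label}: ${per_query:.4f} per full scan, "
              f"${per_month:.2f} per month at 100 full scans/day")
```

Under these assumptions a full scan of 10 GB costs only a few cents, so the query cost is negligible at that size. The same query pattern over 1 TB costs about 100 times more per scan, which is where repeated full scans start to add up.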
