Google Cloud Dataflow

We really drank the Google Kool-Aid on analytics. Everything goes into Google BigQuery, and almost everything goes straight into Google Cloud Pub/Sub and gets some processing in Google Cloud Dataflow before ending up in BigQuery. We still do too much processing and augmentation on the front end before the data goes into Pub/Sub, using some stuff we pulled together with Amazon DynamoDB and so on. It's very brittle; Dynamo throttling is one of our biggest headaches. So I want all of that to go away: events should go straight into Pub/Sub, and all our augmentation should happen in BigQuery after the data's been collected. We're working on that, and it'll happen some time. #Analytics #AnalyticsPipeline
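To make the target shape concrete, here is a minimal sketch of a pass-through pipeline written with the Apache Beam Python SDK: read from Pub/Sub, parse, and append to BigQuery with no augmentation along the way. It is an illustration under assumptions, not the pipeline described above: the `name` field, the two-column schema, and the `parse_event` and `run` helpers are all hypothetical.

```python
import json


def parse_event(message: bytes) -> dict:
    """Turn one raw Pub/Sub payload into a BigQuery row without augmenting it."""
    event = json.loads(message.decode("utf-8"))
    return {
        # "name" is a hypothetical field; adjust to the real event format.
        "event_name": event.get("name"),
        # Keep the whole event as a JSON string so augmentation can be
        # done later with SQL inside BigQuery.
        "payload": json.dumps(event, sort_keys=True),
    }


def run(topic: str, table: str) -> None:
    """Stream events from a Pub/Sub topic into a BigQuery table."""
    # Imported here so parse_event can be exercised without the Beam SDK.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic=topic)
            | "ParseEvents" >> beam.Map(parse_event)
            | "WriteRows" >> beam.io.WriteToBigQuery(
                table,
                schema="event_name:STRING,payload:STRING",  # assumed schema
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )
```

Here `topic` would look like `projects/<project>/topics/<topic>` and `table` like `<project>:<dataset>.<table>`. Storing the raw payload keeps the pipeline free of lookups against external stores, which is the point of moving augmentation into BigQuery.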

Google Cloud Dataflow Discussions


Andrea Latorre

Jan 2, 2023

Needs advice on

Google Cloud Data Fusion

Google BigQuery

Google Cloud Dataflow

I currently run 50 pipelines in a Google Cloud Data Fusion version 6.4 instance. They are launched daily and move data from a MySQLServer database to Google BigQuery. The cost is becoming very high, and I was wondering whether Google Cloud Dataflow would cost less for the same number of rows transported.


Vishal Yadav

Dec 26, 2022

Needs advice on

AWS Glue

Google Cloud Dataflow

Google Cloud Data Fusion

Will Dataflow be the right replacement for AWS Glue? Are there any unforeseen limitations, such as proprietary transformations that Google Cloud Dataflow does not support, gaps in the connector ecosystem, or missing data quality and data cleansing features?

Also, how about Google Cloud Data Fusion as a replacement, in terms of no-code/low-code? Since basic use cases in Glue are supported through the UI, CDF may be the right choice.

What would be the best choice?


Sung Won Chung

Jun 5, 2019

Needs advice on

Google Cloud Dataflow

Java

I use Google Cloud Dataflow because it has great templates for plug-and-play use.

I haven't invested in the Apache Beam framework because you need to know Java to take full advantage of it. The Python API is a second-class citizen.
