Help developers discover the tools you use. Get visibility for your team's tech choices and contribute to the community's knowledge.
AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email. | Amazon Kinesis can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data. |
You can find (and use) a variety of popular AWS Data Pipeline tasks in the AWS Management Console’s template section.;Hourly analysis of Amazon S3‐based log data;Daily replication of AmazonDynamoDB data to Amazon S3;Periodic replication of on-premise JDBC database tables into RDS | Real-time Processing- Amazon Kinesis enables you to collect and analyze information in real-time, allowing you to answer questions about the current state of your data, from inventory levels to stock trade frequencies, rather than having to wait for an out-of-date report;Easy to use- You can create a new stream, set the throughput requirements, and start streaming data quickly and easily. Amazon Kinesis automatically provisions and manages the storage required to reliably and durably collect your data stream;High throughput. Elastic.- Amazon Kinesis seamlessly scales to match the data throughput rate and volume of your data, from megabytes to terabytes per hour. Amazon Kinesis will scale up or down based on your needs;Integrate with Amazon S3, Amazon Redshift, and Amazon DynamoDB- With Amazon Kinesis, you can reliably collect, process, and transform all of your data in real-time before delivering it to data stores of your choice, where it can be used by existing or new applications. Connectors enable integration with Amazon S3, Amazon Redshift, and Amazon DynamoDB;Build Kinesis Applications- Amazon Kinesis provides developers with client libraries that enable the design and operation of real-time data processing applications. Just add the Amazon Kinesis Client Library to your Java application and it will be notified when new data is available for processing;Low Cost- Amazon Kinesis is cost-efficient for workloads of any scale. You can pay as you go, and you’ll only pay for the resources you use. You can get started by provisioning low throughput streams, and only pay a low hourly rate for the throughput you need |
Statistics | |
Stacks 94 | Stacks 794 |
Followers 398 | Followers 604 |
Votes 1 | Votes 9 |
Pros & Cons | |
Pros
| Pros
Cons
|

Google Cloud Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.

AWS Snowball Edge is a 100TB data transfer device with on-board storage and compute capabilities. You can use Snowball Edge to move large amounts of data into and out of AWS, as a temporary storage tier for large local datasets, or to support local workloads in remote or offline locations.

It is an elegant and simple HTTP library for Python, built for human beings. It allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs, or to form-encode your POST data.

Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today.

It is a .NET library that can read/write Office formats without Microsoft Office installed. No COM+, no interop.

It's focus is on performance; specifically, end-user perceived latency, network and server resource usage.

It is an open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services.

BigQuery Data Transfer Service lets you focus your efforts on analyzing your data. You can setup a data transfer with a few clicks. Your analytics team can lay the foundation for a data warehouse without writing a single line of code.

A cloud-based solution engineered to fill the gaps between cloud applications. The software utilizes Intelligent 2-way Contact Sync technology to sync contacts in real-time between your favorite CRM and marketing apps.

It offers the industry leading data synchronization tool. Trusted by millions of users and thousands of companies across the globe. Resilient, fast and scalable p2p file sync software for enterprises and individuals.