Process and move data between different AWS compute and storage services

What is AWS Data Pipeline?

AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email.
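The data source / activity / schedule decomposition above can be sketched as a pipeline definition. The following Python dict mirrors the JSON pipeline-definition format Data Pipeline consumes, using the hourly EMR log-analysis example; the object ids, bucket path, and field values are illustrative, not taken from a real template:

```python
# A minimal, illustrative pipeline definition for the hourly EMR example:
# a Schedule, an S3 data node (that hour's logs), and an EMR activity.
# Ids and the S3 path are invented for illustration.
pipeline_definition = {
    "objects": [
        {
            "id": "HourlySchedule",
            "type": "Schedule",
            "period": "1 hour",
            "startDateTime": "2023-01-01T00:00:00",
        },
        {
            "id": "HourlyLogs",
            "type": "S3DataNode",
            "directoryPath": "s3://example-bucket/logs/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH')}",
            "schedule": {"ref": "HourlySchedule"},
        },
        {
            "id": "AnalyzeLogs",
            "type": "EmrActivity",
            "input": {"ref": "HourlyLogs"},
            "schedule": {"ref": "HourlySchedule"},
        },
    ]
}

# Every non-schedule object points back at the schedule that drives it.
schedule_refs = {
    obj["schedule"]["ref"]
    for obj in pipeline_definition["objects"]
    if "schedule" in obj
}
```

Wiring each activity and data node to a `Schedule` object by reference is what makes the pipeline "data-driven": Data Pipeline materializes one run of the activity per scheduled interval.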

AWS Data Pipeline is a tool in the Data Transfer category of a tech stack.

Who Uses AWS Data Pipeline?

16 companies use AWS Data Pipeline, including Coursera, Rumble, and L2, Inc.


AWS Data Pipeline's Features

  • You can find (and use) a variety of popular AWS Data Pipeline task templates in the AWS Management Console's template section, for example:
  • Hourly analysis of Amazon S3-based log data
  • Daily replication of Amazon DynamoDB data to Amazon S3
  • Periodic replication of on-premises JDBC database tables into Amazon RDS
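Tasks like those in the list above are deployed through the same three-call API flow: create the pipeline, put its definition, and activate it. A minimal boto3 sketch, assuming AWS credentials and a region are configured; the ids and the `to_pipeline_objects` helper are invented for illustration (the helper is not part of the SDK, it just builds the `fields` shape that `put_pipeline_definition` expects):

```python
def to_pipeline_objects(objects):
    """Convert plain {id, **fields} dicts into the
    {'id', 'name', 'fields': [{'key', 'stringValue'|'refValue'}]}
    shape that datapipeline.put_pipeline_definition expects.
    Values of the form {'ref': ...} become refValue entries."""
    out = []
    for obj in objects:
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue
            if isinstance(value, dict) and "ref" in value:
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(value)})
        out.append({"id": obj["id"], "name": obj.get("name", obj["id"]), "fields": fields})
    return out


def deploy(definition_objects, name, unique_id):
    """Illustrative create / define / activate flow.
    Requires real AWS credentials, so it is defined but not run here."""
    import boto3  # assumed available in a deployment environment
    dp = boto3.client("datapipeline")
    pipeline_id = dp.create_pipeline(name=name, uniqueId=unique_id)["pipelineId"]
    dp.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=to_pipeline_objects(definition_objects),
    )
    dp.activate_pipeline(pipelineId=pipeline_id)
    return pipeline_id
```

For instance, the daily DynamoDB-to-S3 replication could be expressed as a `Schedule` object with `period` of `1 day` plus an activity referencing it, then passed to `deploy(...)`.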

AWS Data Pipeline's alternatives

  • AWS Glue - Fully managed extract, transform, and load (ETL) service
  • Airflow - A platform to programmatically author, schedule, and monitor data pipelines, by Airbnb
  • AWS Step Functions - Build Distributed Applications Using Visual Workflows
  • AWS Import/Export - Transfer your data directly onto and off of storage devices using Amazon’s internal network and bypassing the Internet
  • Google BigQuery Data Transfer Service - Automate data movement from SaaS applications to Google BigQuery on a scheduled, managed basis
