Need advice about which tool to choose?Ask the StackShare community!

Pig

59
109
+ 1
5
Talend

169
228
+ 1
0
Add tool

Pig vs Talend: What are the differences?

Developers describe Pig as "Platform for analyzing large data sets". Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce. . On the other hand, Talend is detailed as "A single, unified suite for all integration needs". It is an open source software integration platform helps you in effortlessly turning data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms.

Pig and Talend can be primarily classified as "Big Data" tools.

Pig is an open source tool with 583 GitHub stars and 449 GitHub forks. Here's a link to Pig's open source repository on GitHub.

Advice on Pig and Talend
karunakaran karthikeyan
Needs advice
on
DremioDremio
and
TalendTalend

I am trying to build a data lake by pulling data from multiple data sources ( custom-built tools, excel files, CSV files, etc) and use the data lake to generate dashboards.

My question is which is the best tool to do the following:

  1. Create pipelines to ingest the data from multiple sources into the data lake
  2. Help me in aggregating and filtering data available in the data lake.
  3. Create new reports by combining different data elements from the data lake.

I need to use only open-source tools for this activity.

I appreciate your valuable inputs and suggestions. Thanks in Advance.

See more
Replies (1)
Rod Beecham
Partnering Lead at Zetaris · | 3 upvotes · 50.9K views
Recommends
DremioDremio

Hi Karunakaran. I obviously have an interest here, as I work for the company, but the problem you are describing is one that Zetaris can solve. Talend is a good ETL product, and Dremio is a good data virtualization product, but the problem you are describing best fits a tool that can combine the five styles of data integration (bulk/batch data movement, data replication/data synchronization, message-oriented movement of data, data virtualization, and stream data integration). I may be wrong, but Zetaris is, to the best of my knowledge, the only product in the world that can do this. Zetaris is not a dashboarding tool - you would need to combine us with Tableau or Qlik or PowerBI (or whatever) - but Zetaris can consolidate data from any source and any location (structured, unstructured, on-prem or in the cloud) in real time to allow clients a consolidated view of whatever they want whenever they want it. Please take a look at www.zetaris.com for more information. I don't want to do a "hard sell", here, so I'll say no more! Warmest regards, Rod Beecham.

See more
Get Advice from developers at your company using StackShare Enterprise. Sign up for StackShare Enterprise.
Learn More
Pros of Pig
Pros of Talend
  • 2
    Finer-grained control on parallelization
  • 1
    Proven at Petabyte scale
  • 1
    Open-source
  • 1
    Join optimizations for highly skewed data
    Be the first to leave a pro

    Sign up to add or upvote prosMake informed product decisions

    - No public GitHub repository available -

    What is Pig?

    Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.

    What is Talend?

    It is an open source software integration platform helps you in effortlessly turning data into business insights. It uses native code generation that lets you run your data pipelines seamlessly across all cloud providers and get optimized performance on all platforms.

    Need advice about which tool to choose?Ask the StackShare community!

    Jobs that mention Pig and Talend as a desired skillset
    CBRE
    United States of America Texas Richardson
    CBRE
    United Kingdom of Great Britain and Northern Ireland England Feltham
    What companies use Pig?
    What companies use Talend?
    See which teams inside your own company are using Pig or Talend.
    Sign up for StackShare EnterpriseLearn More

    Sign up to get full access to all the companiesMake informed product decisions

    What tools integrate with Pig?
    What tools integrate with Talend?
    What are some alternatives to Pig and Talend?
    Capybara
    Capybara helps you test web applications by simulating how a real user would interact with your app. It is agnostic about the driver running your tests and comes with Rack::Test and Selenium support built in. WebKit is supported through an external gem.
    Apache Spark
    Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
    Splunk
    It provides the leading platform for Operational Intelligence. Customers use it to search, monitor, analyze and visualize machine data.
    Amazon Athena
    Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
    Apache Flink
    Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.
    See all alternatives