Apache NiFi

A reliable system to process and distribute data

What is Apache NiFi?

An easy to use, powerful, and reliable system to process and distribute data. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
Apache NiFi is a tool in the Message Queue category of a tech stack.

Who uses Apache NiFi?

Companies
28 companies reportedly use Apache NiFi in their tech stacks, including Hepsiburada, ViaVarejo, and Postclick.

Developers
205 developers on StackShare have stated that they use Apache NiFi.

Apache NiFi Integrations

MongoDB, Amazon S3, Kafka, Amazon SQS, and Linux are some of the popular tools among the 9 that integrate with Apache NiFi.

Pros of Apache NiFi

  • Visual Data Flows using Directed Acyclic Graphs (DAGs) (14 upvotes)
  • Free (Open Source) (7 upvotes)
  • Simple-to-use (5 upvotes)
  • Reactive with back-pressure (4 upvotes)
  • Scalable horizontally as well as vertically (4 upvotes)
  • Bi-directional channels (3 upvotes)
  • Fast prototyping (3 upvotes)
  • Data provenance (2 upvotes)
  • Built-in graphical user interface (2 upvotes)
  • End-to-end security between all nodes (2 upvotes)
  • Can handle messages up to gigabytes in size (2 upvotes)
  • HBase support (1 upvote)
  • Kudu support (1 upvote)
  • Hive support (1 upvote)
  • Slack integration (1 upvote)
  • Support for custom Processor in Java (1 upvote)
  • Lots of articles (1 upvote)
  • Lots of documentation (1 upvote)

Decisions about Apache NiFi

Here are some stack decisions, common use cases and reviews by companies and developers who chose Apache NiFi in their tech stack.

I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?
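
The dependency chain in this example (an ETL job, then an SQL task, then a Jira task) is a small directed acyclic graph, which is the model orchestrators such as Airflow use for task ordering. A minimal sketch in plain Python of how such an ordering can be derived; it uses only the standard library, not the Airflow or NiFi APIs, and the task names are hypothetical placeholders:

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
# The task names are hypothetical placeholders for the jobs in the question.
dependencies = {
    "informatica_etl": set(),
    "sql_check": {"informatica_etl"},
    "jira_task": {"sql_check"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A real orchestrator adds scheduling, retries, and connectors to the external systems on top of this ordering, so the sketch only illustrates the dependency model, not a working pipeline.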

John Calandra
Data Manager at The Garrett Group · 6 upvotes · 52.7K views

There is a question coming... I am using Oracle VirtualBox to spawn 3 Ubuntu Linux virtual machines (VMs). VM1 is being used as a data lake - just a place to store flat files. VM2 hosts Apache NiFi. VM3 hosts PostgreSQL. I have built a NiFi pipeline that reads flat files on VM1, pipes the data over to VM3, and inserts it into the PostgreSQL database. I left this setup alone for a while, and then something hiccupped on VM3, and I had to rebuild it. Now I cannot make a remote connection to PostgreSQL on VM3. I was using pgAdmin3 on VM3, but it kept throwing errors - I found out it went end-of-life in 2018 and uninstalled it. pgAdmin4 is out, but for some reason, I cannot get the APT utility to find/install it. I am trying to figure out the pgAdmin4 install problem, and I am also looking for a good alternative to pgAdmin4 that I can use to diagnose the remote database connection problem. Does anyone have any suggestions? Thanks in advance.
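
One way to narrow down a remote connection problem like this without pgAdmin is to first check whether the PostgreSQL port on VM3 is reachable from VM2 at all. A minimal sketch, assuming PostgreSQL listens on its default port 5432; the function name and the address in the usage comment are hypothetical:

```python
import socket


def port_is_open(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage from VM2 (the address below is a placeholder for VM3):
# print(port_is_open("192.168.56.103"))
```

If the check returns False, likely places to look are the `listen_addresses` setting in `postgresql.conf`, the host rules in `pg_hba.conf`, and any firewall on VM3. If it returns True, the port is reachable and a command-line client such as `psql` can be used to test the login itself.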

Apache NiFi's Features

  • Web-based user interface
  • Highly configurable
  • Data Provenance
  • Designed for extension
  • Secure

Apache NiFi Alternatives & Comparisons

What are some alternatives to Apache NiFi?
Kafka
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
Apache Storm
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Logstash
Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). If you store them in Elasticsearch, you can view and analyze them with Kibana.
Apache Camel
An open source Java framework that focuses on making integration easier and more accessible to developers.
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Apache NiFi's Followers
483 developers follow Apache NiFi to keep up with related blogs and decisions.