Apache NiFi

Data Manager at The Garrett Group
Needs advice on Linux, pgAdmin, and PostgreSQL

There is a question coming... I am using Oracle VirtualBox to run three Ubuntu Linux virtual machines (VMs). VM1 is used as a data lake, just a place to store flat files. VM2 hosts Apache NiFi. VM3 hosts PostgreSQL. I have built a NiFi pipeline that reads flat files on VM1 and inserts the data into the PostgreSQL database on VM3. I left this setup alone for a while, then something hiccupped on VM3 and I had to rebuild it. Now I cannot make a remote connection to PostgreSQL on VM3. I was using pgAdmin3 on VM3, but it kept throwing errors. I found out it went end-of-life in 2018 and uninstalled it. pgAdmin4 is out, but for some reason I cannot get APT to find or install it. I am trying to figure out the pgAdmin4 install problem, and I am also looking for a good alternative to pgAdmin4 that I can use to diagnose the remote database connection problem. Does anyone have any suggestions? Thanks in advance.
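
Before settling on a GUI client, the remote connection problem described above can be narrowed down with a plain TCP check run from VM2. The sketch below is only an illustration: it assumes PostgreSQL is on its default port 5432, and `can_reach` and the IP address in the comment are hypothetical names made up for this example. A rebuilt server usually comes back with default settings, where `listen_addresses` in `postgresql.conf` is `localhost`, so a "connection refused" result from another VM would be consistent with that.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 3.0):
    """Try to open a TCP connection and return (ok, explanation)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "port is open; the server accepts TCP connections"
    except ConnectionRefusedError:
        # Host is reachable but nothing listens on this address/port.
        # For PostgreSQL, listen_addresses in postgresql.conf is a likely suspect.
        return False, "connection refused"
    except socket.timeout:
        # No answer at all: often a firewall rule or a wrong IP address.
        return False, "timed out"
    except OSError as exc:
        # Anything else, for example an unroutable network.
        return False, f"network error: {exc}"


# Example call (the address is a placeholder for VM3, not from the original post):
# print(can_reach("192.168.56.103"))
```

If the port turns out to be open but logins still fail, the TCP check has done its job and the next place to look is `pg_hba.conf`, which decides which client addresses may authenticate. That rejection happens after the TCP connection is made, so this sketch cannot detect it.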

8 upvotes·374.2K views
Replies (1)
Recommends phpPgAdmin

If you want an alternative to pgAdmin, there is phpPgAdmin, which is also open source. It is to PostgreSQL what phpMyAdmin is to MariaDB and MySQL. I have not used it myself, as I run pgAdmin4 in a Docker container.

The difference is that if you like to edit SQL, pgAdmin is the better choice, as it provides syntax highlighting whereas phpPgAdmin does not. Hope this is useful enough.

EDIT: Otherwise, a good idea is to read up on the differences between the two, though I believe it comes down to personal preference.

5 upvotes·805 views
Needs advice on Airflow and Apache NiFi

I am looking for the best tool to orchestrate #ETL workflows in non-Hadoop environments, mainly for regression testing use cases. Would Airflow or Apache NiFi be a good fit for this purpose?

For example, I want to run an Informatica ETL job and then run an SQL task as a dependency, followed by another task from Jira. What tool is best suited to set up such a pipeline?

4 upvotes·667.7K views
Replies (2)
Recommends Airflow

I have been using Airflow for more than 2 years now and haven't thought about moving to any other platform. Coming back to your requirements, Airflow fits pretty well.

  1. It has an excellent way to manage dependent tasks using a DAG (Directed Acyclic Graph). You create a DAG with tasks and declare which task depends on which, and Airflow takes care of running each task, or not running it if its parent task fails.
  2. Integrations: the Airflow community has implemented integrations with various cloud services, Hadoop, Spark, and Jira. Though it doesn't have a built-in integration for Informatica, you can run your own service in Airflow as a task (which can handle all Informatica-related operations).
  3. It's very easy to find, monitor, and manage jobs and pipelines, as Airflow provides a great consolidated UI.
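
The dependency behavior from the first point can be illustrated without installing Airflow. The sketch below is plain Python built on the standard library's `graphlib`, not Airflow's API: `run_pipeline` and the task names are hypothetical, and the real Informatica, SQL, and Jira calls are replaced by placeholder functions. It only shows the idea that a task is skipped when something upstream of it did not succeed.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run tasks in dependency order.

    tasks: task name -> zero-argument callable
    deps:  task name -> set of upstream task names
    Returns task name -> "success", "failed", or "upstream_failed".
    """
    status = {}
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(deps).static_order():
        if any(status[up] != "success" for up in deps.get(name, ())):
            status[name] = "upstream_failed"  # parent did not succeed: do not run
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


# Placeholder tasks mirroring the question: Informatica job -> SQL check -> Jira update.
tasks = {
    "run_informatica_job": lambda: None,
    "run_sql_check": lambda: None,
    "update_jira": lambda: None,
}
deps = {
    "run_sql_check": {"run_informatica_job"},
    "update_jira": {"run_sql_check"},
}
result = run_pipeline(tasks, deps)
```

In Airflow itself the same shape would be expressed as operators inside a DAG with dependencies declared between them, and the scheduler and UI take over the bookkeeping that `status` does here.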
5 upvotes·20.9K views
Sales Executive at Astronomer
Recommends Airflow

Hey Sathya! With Airflow, you can create custom hooks and operators to trigger various types of jobs. Some may already exist for Informatica, but I am not sure. I would be happy to connect and discuss further if you are interested: josh@astronomer.io

20.7K views