Avatar of akarsh3007
Recommends
Airflow

I have been using Airflow for more than 2 years now and haven't thought about moving to any other platform. Coming back to your requirements, Airflow fits pretty well. 1. It has an excellent way to manage dependent tasks using DAG (Direct Acyclic Graph), You can create a DAG with tasks and manage which task is dependent on which and Airflow takes care of running it or not running a task in case the parent task fails. 2. Integrations - The airflow community has implemented various integration to different cloud services, to Hadoop, spark a and as well as Jira. Though it doesn't have in-built integration for Informatica you can also run your own service in Airflow as a task (which can handle all Informatica related operations).

  1. It's very easy to find/monitor and manage Jobs/Pipelines as Airflow provides a great consolidated UI.
READ MORE
5 upvotes11.3K views
Recommends
Druid

Druid is amazing for this use case and is a cloud-native solution that can be deployed on any cloud infrastructure or on Kubernetes. - Easy to scale horizontally - Column Oriented Database - SQL to query data - Streaming and Batch Ingestion - Native search indexes It has feature to work as TimeSeriesDB, Datawarehouse, and has Time-optimized partitioning.

READ MORE
4 upvotes35.1K views
Recommends
Druid

Druid Could be an amazing solution for your use case, My understanding, and the assumption is you are looking to export your data from MariaDB for Analytical workload. It can be used for time series database as well as a data warehouse and can be scaled horizontally once your data increases. It's pretty easy to set up on any environment (Cloud, Kubernetes, or Self-hosted nix system). Some important features which make it a perfect solution for your use case. 1. It can do streaming ingestion (Kafka, Kinesis) as well as batch ingestion (Files from Local & Cloud Storage or Databases like MySQL, Postgres). In your case MariaDB (which has the same drivers to MySQL) 2. Columnar Database, So you can query just the fields which are required, and that runs your query faster automatically. 3. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. 4. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures 5. Gives ana amazing centralized UI to manage data sources, query, tasks.

READ MORE
4 upvotes1 comment30K views
pionell
pionell
September 23rd 2020 at 7:58AM

thanks, looking into this!

Reply
Recommends
Travis CI

I use Travis CI because of various reasons - 1. Cloud based system so no dedicated server required, and you do not need to administrate it. 2. Easy YAML configuration. 3. Supports Major Programming Languages. 4. Support of build matrix 6. Supports AWS, Azure, Docker, Heroku, Google Cloud, Github Pages, PyPi and lot more. 7. Slack Notifications.

READ MORE
3 upvotes114.8K views
Recommends
Redis

Redis can do the same jobs as Memcached, ECache or any other im-memory caching tool and can do them better. - Redis can act as a cache as well. It can store key/value pairs too. In Redis, they can even be up to 512MB. - Redis is better documented than Memcached and ECached. - Supports many data types. (Strings, Hashes, Lists, Sets, Sorted Sets, Geo, Bitmap and HyperLogLog) - Redis provides pub/sub as well. - Redis can be scaled easily horizontally if needed using its own tool Redis Sentinel. - Wide variety of client libraries in almost every language.

READ MORE
3 upvotes2.7K views