A lightweight ETL framework with a focus on transparency and complexity reduction.
- Data integration pipelines as code: pipelines, tasks and commands are created using declarative Python code.
- PostgreSQL as a data processing engine.
- Extensive web ui. The web browser as the main tool for inspecting, running and debugging pipelines.
- GNU make semantics. Nodes depend on the completion of upstream nodes. No data dependencies or data flows.
- No in-app data processing: command line tools as the main tool for interacting with databases and data.
- Single machine pipeline execution based on Python's multiprocessing. No need for distributed task queues. Easy debugging and and output logging.
- Cost based priority queues: nodes with higher cost (based on recorded run times) are run first.