CEO, Founder at DQOps

We chose a mix of Java and Python to build an open source data observability tool. The application works as a standalone command-line tool with a rich shell interface that even supports command completion. The Java ecosystem is more mature when it comes to connecting to various databases through JDBC, and picocli with jline3 let us build a very dynamic shell with command completion. The data quality checks to execute are defined in YAML files, backed by YAML (in fact JSON) schema files. These YAML files can be edited in Visual Studio Code (and other code editors) with code completion, which is possible because the whole data model is defined as plain Java classes from which we generate the YAML/JSON schema. There is still a place for Python because it is very popular in the database space. We simply start a Python interpreter in the background from the Java code. Python is used to evaluate validation rules (defined as Python functions) and to render SQL queries from Jinja2 templates.
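To make the "rules as Python functions" idea concrete, here is a minimal sketch of what the Python side of such a setup could look like. It is not the actual DQOps code: the rule name `min_count`, the `RULES` registry, and the line-delimited JSON protocol between the Java parent process and the background interpreter are all assumptions made for illustration.

```python
import io
import json
from typing import Callable, Dict


def min_count(actual_value: float, min_value: float) -> bool:
    """Hypothetical rule: passes when the measured value reaches min_value."""
    return actual_value >= min_value


# Hypothetical registry that maps rule names to rule functions.
RULES: Dict[str, Callable[..., bool]] = {"min_count": min_count}


def handle_request(line: str) -> str:
    """Evaluate one JSON-encoded request and return a JSON-encoded reply."""
    request = json.loads(line)
    rule = RULES[request["rule"]]
    passed = rule(request["actual_value"], **request.get("parameters", {}))
    return json.dumps({"rule": request["rule"], "passed": passed})


def serve(stdin, stdout) -> None:
    """Answer one request per input line until the parent closes the pipe."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_request(line) + "\n")
            stdout.flush()  # the parent process waits for each reply


# Simulate the parent process with in-memory streams instead of real pipes.
requests = io.StringIO(
    '{"rule": "min_count", "actual_value": 120, "parameters": {"min_value": 100}}\n'
)
replies = io.StringIO()
serve(requests, replies)
print(replies.getvalue().strip())  # {"rule": "min_count", "passed": true}
```

In a real deployment the parent would pass the interpreter's standard input and output to `serve` instead of the in-memory streams used here. Keeping one interpreter alive avoids paying the Python startup cost for every rule evaluation.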

Getting started with DQO.ai - dqo.ai (dqo.ai)

Piotr Czarnas
