StackShare

Discover and share technology stacks from companies around the world.

© 2025 StackShare. All rights reserved.

lakeFS
By lakeFS

#345 in Databases
Discussions: 0
Followers: 3

What is lakeFS?

lakeFS is an open-source data version control system for data lakes. It provides a “Git for data” platform that lets you apply software engineering best practices to your data lake, including branching and merging, CI/CD, and production-like dev/test environments.

lakeFS is a tool in the Databases category of a tech stack.

Key Features

  • Zero-copy version management
  • Any data format: structured, unstructured, open table, etc.
  • Scales to petabytes and millions of objects with negligible performance impact
  • Seamless integration with your entire data stack
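To make the idea of zero-copy versioning concrete, here is a minimal conceptual sketch in Python. It is a toy model only, not lakeFS's implementation or API: the `ToyRepo` class and its `put`, `get`, `create_branch`, and `merge` methods are hypothetical names invented for illustration. The point it shows is that a branch can be a mapping from paths to content hashes, so creating a branch copies metadata but never the stored bytes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model (hypothetical, not the lakeFS API): each branch maps
    paths to content hashes, and each distinct blob is stored once."""
    blobs: dict = field(default_factory=dict)
    branches: dict = field(default_factory=lambda: {"main": {}})

    def put(self, branch: str, path: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # store the bytes only once
        self.branches[branch][path] = digest  # the branch records a pointer

    def get(self, branch: str, path: str) -> bytes:
        return self.blobs[self.branches[branch][path]]

    def create_branch(self, name: str, source: str = "main") -> None:
        # Copies only the path -> hash mapping; no object data is duplicated.
        self.branches[name] = dict(self.branches[source])

    def merge(self, source: str, dest: str) -> None:
        # Naive merge: the source wins on every path it contains.
        # No conflict detection and no handling of deletions.
        self.branches[dest].update(self.branches[source])


repo = ToyRepo()
repo.put("main", "events/day1.csv", b"a,b\n1,2\n")

repo.create_branch("experiment")
blobs_after_branch = len(repo.blobs)  # branching added no new blobs

repo.put("experiment", "events/day1.csv", b"a,b\n1,3\n")
main_before_merge = repo.get("main", "events/day1.csv")

repo.merge("experiment", "main")
main_after_merge = repo.get("main", "events/day1.csv")
```

In this sketch, writes on `experiment` stay invisible to `main` until the merge, which mirrors the isolated dev/test workflow described above. A real system would also need commits, conflict detection, and garbage collection, none of which are modeled here.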

lakeFS Pros & Cons

Pros of lakeFS

  • ✓ Available on-prem
  • ✓ Big data scale
  • ✓ Cloud agnostic
  • ✓ Doesn't require local copies of the data
  • ✓ Easy integration with other tools
  • ✓ Easy to use
  • ✓ Format agnostic
  • ✓ Full reproducibility
  • ✓ Highly scalable

Cons of lakeFS

No cons listed yet.

lakeFS Alternatives & Comparisons

What are some alternatives to lakeFS?

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Splunk

Splunk provides a platform for Operational Intelligence. Customers use it to search, monitor, analyze, and visualize machine data.

Apache Flink

Apache Flink is an open-source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics in one system. Analytical programs can be written using concise APIs in Java and Scala.

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Apache Hive

Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

AWS Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

lakeFS Integrations

Cloudera Enterprise, Python, Kafka, Airflow, Dremio and 7 more are some of the popular tools that integrate with lakeFS. Here's a list of all 12 tools that integrate with lakeFS.

  • Cloudera Enterprise
  • Python
  • Kafka
  • Airflow
  • Dremio
  • DuckDB
  • Apache Hive
  • Presto
  • Amazon Athena
  • dbt
  • Amazon SageMaker
  • Trino


Adoption

On StackShare

Companies: 1
Developers: 2