
What is Delta Lake?

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads.
Delta Lake is a tool in the Big Data Tools category of a tech stack.
Delta Lake is an open-source tool with 7.6K GitHub stars and 1.7K GitHub forks. Its source repository is hosted on GitHub.

Who uses Delta Lake?

Companies
10 companies reportedly use Delta Lake in their tech stacks, including XTRM-Data, Peak-AI, and Compile Inc.

Developers
88 developers on StackShare have stated that they use Delta Lake.

Delta Lake Integrations

Amazon S3, Apache Spark, Hadoop, Databricks, and StreamSets are some of the popular tools that integrate with Delta Lake; 8 integrations are listed in total.

Decisions about Delta Lake

Here are some stack decisions, common use cases and reviews by companies and developers who chose Delta Lake in their tech stack.

We are building a cloud-based analytical app. Most of the data for the UI flows from SQL Server to Delta Lake, and then from Delta Lake to Azure Cosmos DB as JSON using Databricks, so that the API can send it to the front end. Sometimes, when transforming table rows into JSON, we get larger documents that exceed Cosmos DB's 2 MB document size limit. What is the best solution for replacing Cosmos DB?
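
Whatever store ends up replacing Cosmos DB, it may help to first measure how many documents actually cross the limit. The following is a minimal sketch, not part of the original post: it assumes the 2 MB figure quoted above, measures size as compact UTF-8 JSON (the size a given store counts may differ slightly), and uses hypothetical helper names `doc_size_bytes` and `partition_by_size`.

```python
import json

# Assumption: the 2 MB limit quoted in the post, expressed in bytes.
MAX_DOC_BYTES = 2 * 1024 * 1024


def doc_size_bytes(doc: dict) -> int:
    """Return the size of a document serialized as compact UTF-8 JSON."""
    text = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def partition_by_size(docs, limit=MAX_DOC_BYTES):
    """Split documents into (fits, oversized) according to serialized size."""
    fits, oversized = [], []
    for doc in docs:
        if doc_size_bytes(doc) <= limit:
            fits.append(doc)
        else:
            oversized.append(doc)
    return fits, oversized
```

Running `partition_by_size` over a sample of the generated JSON documents would show whether oversized documents are rare outliers (which could be split or stored elsewhere) or common enough to justify changing the serving store.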


Delta Lake's Features

  • ACID Transactions
  • Scalable Metadata Handling
  • Time Travel (data versioning)
  • Open Format
  • Unified Batch and Streaming Source and Sink
  • Schema Enforcement
  • Schema Evolution
  • 100% Compatible with Apache Spark API
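
To make three of these features concrete (ACID-style atomic commits, schema enforcement, and time travel), here is a toy in-memory sketch. It is not Delta Lake's API or implementation; the class `ToyVersionedTable` and its methods are hypothetical and only illustrate the idea of an append-only commit history in which each commit produces a new readable version.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy model only: an append-only list of commits, one per version."""
    schema: dict                                   # column name -> Python type
    _commits: list = field(default_factory=list)   # each commit is a list of rows

    def append(self, rows):
        """Validate every row first, then commit all of them or none."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise TypeError(f"column {col!r} expects {typ.__name__}")
        # Only reached if all rows passed validation (atomic commit).
        self._commits.append([dict(r) for r in rows])
        return len(self._commits) - 1  # version number of this commit

    def read(self, version=None):
        """Return the rows visible as of `version` (latest when None)."""
        if version is None:
            version = len(self._commits) - 1
        elif not 0 <= version < len(self._commits):
            raise IndexError(f"unknown version {version}")
        rows = []
        for commit in self._commits[: version + 1]:
            rows.extend(commit)
        return rows
```

In this sketch, a batch containing one badly typed row is rejected as a whole, so readers never see a partial write, and `read(version=0)` keeps returning the table as it was after the first commit regardless of later appends.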

Delta Lake Alternatives & Comparisons

What are some alternatives to Delta Lake?
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
MySQL
The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
PostgreSQL
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

Delta Lake's Followers
314 developers follow Delta Lake to keep up with related blogs and decisions.