Databricks

A unified analytics platform, powered by Apache Spark

What is Databricks?

Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the Machine Learning lifecycle from data preparation to experimentation and deployment of ML applications.
Databricks is a tool in the General Analytics category of a tech stack.

Who uses Databricks?

Companies
45 companies reportedly use Databricks in their tech stacks, including QuintoAndar, Bagelcode, and Amperity.

Developers
423 developers on StackShare have stated that they use Databricks.

Databricks Integrations

Kafka, TensorFlow, Apache Spark, Hadoop, and PyTorch are some of the popular tools that integrate with Databricks. In total, 33 tools integrate with Databricks.

Pros of Databricks

  • Best performance on large datasets (1 upvote)
  • True lakehouse architecture (1 upvote)
  • Scalability (1 upvote)
  • Databricks doesn't get access to your data (1 upvote)
  • Usage-based billing (1 upvote)
  • Security (1 upvote)
  • Data stays in your cloud account (1 upvote)
  • Multicloud (1 upvote)

Decisions about Databricks

Here are some stack decisions, common use cases and reviews by companies and developers who chose Databricks in their tech stack.

We are building a cloud-based analytics app. Most of the data for the UI flows from SQL Server to Delta Lake, and then from Delta Lake to Azure Cosmos DB as JSON using Databricks, so that the API can send it to the front end. Sometimes, when transforming table rows into JSON, we get larger documents that exceed Cosmos DB's 2 MB document size limit. What is the best solution for replacing Cosmos DB?
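
Before replacing the store, it can help to measure how many documents actually exceed the limit. The following is a minimal sketch in plain Python, not part of the original question: the helper names `doc_size_bytes` and `split_by_size` are hypothetical, the 2 MB figure is taken from the question above, and the size is computed from the compact UTF-8 JSON encoding, which may differ slightly from how the service itself accounts for document size.

```python
import json

# Limit quoted in the question above; treat it as an assumption and confirm
# the exact figure against the current Azure Cosmos DB documentation.
MAX_DOC_BYTES = 2 * 1024 * 1024


def doc_size_bytes(doc: dict) -> int:
    """Return the UTF-8 size of the document's compact JSON encoding."""
    encoded = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def split_by_size(docs, limit=MAX_DOC_BYTES):
    """Partition documents into (fits, oversized) lists by serialized size."""
    fits, oversized = [], []
    for doc in docs:
        if doc_size_bytes(doc) <= limit:
            fits.append(doc)
        else:
            oversized.append(doc)
    return fits, oversized


if __name__ == "__main__":
    # Hypothetical sample rows: one small, one with a roughly 3 MB payload.
    small = {"id": "row-1", "payload": "x" * 100}
    large = {"id": "row-2", "payload": "x" * (3 * 1024 * 1024)}
    fits, oversized = split_by_size([small, large])
    print("fits:", [d["id"] for d in fits])
    print("oversized:", [d["id"] for d in oversized])
```

If only a small share of documents is oversized, one option is to route those to a different store (for example, object storage) and keep a reference to them in the main document, rather than replacing the database outright.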

Vamshi Krishna
Data Engineer at Tata Consultancy Services · 4 upvotes · 240.7K views

I need to collect data from multiple sources and store it in a single cloud location, then clean and transform it using PySpark and push the results to other applications such as reporting tools. What would be the best solution? I can only think of Azure Data Factory + Databricks. Are there any alternatives to #AWS services + Databricks?

Databricks's Features

  • Built on Apache Spark and optimized for performance
  • Reliable and Performant Data Lakes
  • Interactive Data Science and Collaboration
  • Data Pipelines and Workflow Automation
  • End-to-End Data Security and Compliance
  • Compatible with Common Tools in the Ecosystem
  • Unparalleled Support by the Leading Committers of Apache Spark

Databricks Alternatives & Comparisons

What are some alternatives to Databricks?
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
Azure Databricks
Accelerate big data analytics and artificial intelligence (AI) solutions with Azure Databricks, a fast, easy and collaborative Apache Spark–based analytics service.
Domino
Use our cloud-hosted infrastructure to securely run your code on powerful hardware with a single command — without any changes to your code. If you have your own infrastructure, our Enterprise offering provides powerful, easy-to-use cluster management functionality behind your firewall.
Confluent
Confluent is a data streaming platform based on Apache Kafka: a full-scale streaming platform capable not only of publish-and-subscribe, but also of storing and processing data within the stream.
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Databricks's Followers
724 developers follow Databricks to keep up with related blogs and decisions.