Why developers like Dask

What is Dask?

Dask is a versatile Python library that supports a variety of workloads. It is composed of two parts:

Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.

Big Data collections, such as parallel arrays, dataframes, and lists, that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
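The task-scheduling half of this description can be illustrated with a small, hypothetical sketch. It is loosely modeled on the task-graph format described in Dask's documentation (a dict mapping keys either to plain values or to tuples of a callable followed by its arguments, where an argument may name another key). The `compute_key` function is our own sequential evaluator for illustration only; it is not Dask's scheduler and does not run anything in parallel.

```python
from typing import Any, Dict, Optional

# Hypothetical task graph, loosely modeled on Dask's documented graph format:
# keys map to plain values or to task tuples of the form (callable, *args).
Graph = Dict[str, Any]


def is_task(value: Any) -> bool:
    """A task is a non-empty tuple whose first element is callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def compute_key(graph: Graph, key: str, cache: Optional[Dict[str, Any]] = None) -> Any:
    """Compute `key` by first computing every key it depends on.

    The cache ensures each key is evaluated at most once, even when several
    tasks share a dependency. Everything runs sequentially in this sketch.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = graph[key]
    if is_task(value):
        func, *args = value
        # Any string argument that names a key in the graph is treated as a
        # dependency and replaced by that key's computed result.
        resolved = [
            compute_key(graph, a, cache) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        result = func(*resolved)
    else:
        result = value
    cache[key] = result
    return result


def inc(x: int) -> int:
    return x + 1


def add(x: int, y: int) -> int:
    return x + y


# "z" depends on "y", which depends on "x".
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
}
print(compute_key(graph, "z"))  # 12
```

Representing work as a graph of small tasks, rather than as one opaque function call, is what gives a scheduler room to run independent tasks concurrently and to decide at run time where each task should execute.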

Dask is a tool in the Data Science Tools category of a tech stack.

Who uses Dask?

Companies

12 companies reportedly use Dask in their tech stacks, including Oxylabs, Corca's Tech Stack, and SMARTTechStack.

Oxylabs

Corca's Tech Stack

SMARTTechStack

Data Science

Kinderboerderij ...

Red Hat BIDS

Gitential

Sypht

Clarity AI Data ...

Developers

86 developers on StackShare have stated that they use Dask.


Dask Integrations

Python, NumPy, Pandas, PySpark, and OpenRefine are some of the popular tools that integrate with Dask. Here's a list of all 9 tools that integrate with Dask.

Python

NumPy

Pandas

PySpark

OpenRefine

TileDB

Orchest

Bumblebee

Flyte

Jobs that mention Dask as a desired skillset

Staff Software Engineer, ML Training

San Francisco, CA, US; , CA, US


Dask's Features

Supports a variety of workloads
Dynamic task scheduling
Trivial to set up and run on a laptop in a single process
Runs resiliently on clusters with 1000s of cores
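To give a rough sense of how chunked parallel collections can run on a single machine, the hypothetical sketch below splits a sequence into fixed-size chunks, reduces each chunk to a partial `(sum, count)` pair as an independent task on a thread pool, and combines the small partials into a mean. This is the general blocked-algorithm idea; `chunked_mean`, `chunk_size`, and `workers` are illustrative names, not part of Dask's API. Because of CPython's GIL, threads give little speedup for pure-Python arithmetic like this, so the point is the task structure rather than performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, Sequence, Tuple


def chunks(data: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most `chunk_size` items."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def partial_sum(chunk: Sequence[float]) -> Tuple[float, int]:
    """Reduce one chunk to a (sum, count) pair; each call is an independent task."""
    return sum(chunk), len(chunk)


def chunked_mean(data: Sequence[float], chunk_size: int = 1000, workers: int = 4) -> float:
    """Compute the mean of `data` chunk by chunk on a thread pool.

    Only the small (sum, count) partials are combined at the end. In this
    sketch `data` is an in-memory sequence; a real out-of-core system would
    instead load each chunk lazily (for example, from disk).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(data) == 0:
        raise ValueError("mean of empty data is undefined")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks(data, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(chunked_mean(range(1, 10_001), chunk_size=1000))  # 5000.5
```

Combining `(sum, count)` pairs rather than per-chunk means keeps the result exact when the last chunk is shorter than the others.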

Dask Alternatives & Comparisons

What are some alternatives to Dask?

Apache Spark

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Pandas

A flexible and powerful data analysis and manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

PySpark

PySpark brings Apache Spark and Python together. It is a Python API for Spark that lets you combine the simplicity of Python with the power of Apache Spark to tame Big Data.

Celery

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

Airflow

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command-line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.
