What are some alternatives to Kubeflow?

What is Kubeflow and what are its top alternatives?

The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable, and scalable by providing a straightforward way to spin up best-of-breed OSS solutions.

Kubeflow is a tool in the Machine Learning Tools category of a tech stack.

Kubeflow is an open source tool. Its source code is hosted in an open source repository on GitHub.

Top Alternatives to Kubeflow

TensorFlow
TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. ...
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...
MLflow
MLflow is an open source platform for managing the end-to-end machine learning lifecycle. ...
Airflow
Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed. ...
Polyaxon
An enterprise-grade open source platform for building, training, and monitoring large scale deep learning applications. ...
Argo
Argo is an open source container-native workflow engine for getting work done on Kubernetes. Argo is implemented as a Kubernetes CRD (Custom Resource Definition). ...
Kubernetes
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the user's declared intentions. ...
Pachyderm
Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations. ...
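
The Airflow entry above describes workflows as directed acyclic graphs of tasks that are executed while following the specified dependencies. The idea can be illustrated with a minimal sketch in plain Python. This is not Airflow's API: the function name run_workflow and the task names extract, transform, and load are hypothetical, and the sketch only assumes the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, only after all of its upstream tasks have run.

    tasks: dict mapping a task name to a zero-argument callable
    dependencies: dict mapping a task name to the set of its upstream task names
    Returns the task names in the order they were executed.
    (Hypothetical helper for illustration; not part of Airflow.)
    """
    # Include every task in the graph, even those with no upstream dependencies.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    order = []
    # static_order() yields a task only after all of its predecessors,
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step pipeline: extract -> transform -> load.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}
executed = run_workflow(tasks, dependencies)
```

A real scheduler would also distribute tasks across workers, retry failures, and track state between runs. The sketch covers only the dependency-ordering part.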

Kubeflow alternatives & related posts

TensorFlow

Open Source Software Library for Machine Intelligence

Stacks: 3.8K

Votes: 106

PROS OF TENSORFLOW

32
High Performance
19
Connect Research and Production
16
Deep Flexibility
12
Auto-Differentiation
11
True Portability
6
Easy to use
5
High level abstraction
5
Powerful

CONS OF TENSORFLOW

9
Hard
6
Hard to debug
2
Documentation not very helpful

Apache Spark

Fast and general engine for large-scale data processing

Stacks: 3K

Votes: 140

PROS OF APACHE SPARK

61
Open-source
48
Fast and Flexible
8
One platform for every big data problem
8
Great for distributed SQL like applications
6
Easy to install and to use
3
Works well for most Datascience usecases
2
Interactive Query
2
Machine learning library, Streaming in real time
2
In memory Computation

CONS OF APACHE SPARK

4
Speed

Related Apache Spark posts

Eric Colson

Chief Algorithms Officer at Stitch Fix · Apr 10, 2019 | 21 upvotes · 6.1M views

The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This provides our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

For more info:

Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
Our blog: https://multithreaded.stitchfix.com/blog/
Careers: https://multithreaded.stitchfix.com/careers/

#DataScience #DataStack #Data

Patrick Sun

Software Engineer at Stitch Fix · Sep 13, 2018 | 10 upvotes · 62.4K views

As a frontend engineer on the Algorithms & Analytics team at Stitch Fix, I work with data scientists to develop applications and visualizations to help our internal business partners make data-driven decisions. I envisioned a platform that would assist data scientists in the data exploration process, allowing them to visually explore and rapidly iterate through their assumptions, then share their insights with others. This would align with our team's philosophy of having engineers "deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy", and solve the pain of data exploration.

The final product, code-named Dora, is built with React, Redux.js and Victory, backed by Elasticsearch to enable fast and iterative data exploration, and uses Apache Spark to move data from our Amazon S3 data warehouse into the Elasticsearch cluster.

Building a Data Exploration Tool with React, Redux, Victory, and Elasticsearch - Stitch Fix Tech Stack | StackShare