Here are some stack decisions, common use cases, and reviews by companies and developers who chose TensorFlow for their tech stack.
Google Analytics is a great tool to analyze your traffic. To debug our software and ask questions, we love to use Postman and Stack Overflow. Google Drive helps our team to share documents. We're able to build our great products through the APIs by Google Maps, CloudFlare, Stripe, PayPal, Twilio, Let's Encrypt, and TensorFlow.
Why we built an open source, distributed training framework for TensorFlow, Keras, and PyTorch:
At Uber, we apply deep learning across our business. From self-driving research to trip forecasting and fraud prevention, deep learning enables our engineers and data scientists to create better experiences for our users.
TensorFlow has become a preferred deep learning library at Uber for a variety of reasons. To start, the framework is one of the most widely used open source frameworks for deep learning, which makes it easy to onboard new users. It also combines high performance with the ability to tinker with low-level model details; for instance, we can both use high-level APIs, such as Keras, and implement our own custom operators using NVIDIA’s CUDA toolkit.
Uber has introduced Michelangelo (https://eng.uber.com/michelangelo/), an internal ML-as-a-service platform that democratizes machine learning and makes it easy to build and deploy these systems at scale. In this article, we pull back the curtain on Horovod, an open source component of Michelangelo’s deep learning toolkit which makes it easier to start—and speed up—distributed deep learning projects with TensorFlow:
(Direct GitHub repo: https://github.com/uber/horovod)
Deep learning jobs pose a unique challenge compared with other jobs that run across multiple GPUs: they need every node to stay up and running until the job is complete, which is why Uber uses gang scheduling.
Gang scheduling (a scheduling strategy) means that for a cluster computing job to run, all of its nodes have to be ready to run at the same time. This is especially useful in deep learning training, which involves a constant exchange of updates between nodes. Uber relies on gang scheduling to run Horovod, its open source framework for running Google’s TensorFlow machine learning software across multiple nodes.
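The all-or-nothing rule behind gang scheduling can be sketched in a few lines of Python. This is only an illustration of the idea, not Uber's scheduler: the `Job` class, the `gang_schedule` function, and the job names are hypothetical, and a real cluster manager also handles priorities, preemption, and node failures.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes_needed: int  # the whole "gang" must start together


def gang_schedule(jobs, free_nodes):
    """Admit jobs all-or-nothing against a pool of free nodes.

    Returns (started, waiting, remaining_free_nodes).
    """
    started, waiting = [], []
    for job in jobs:
        if job.nodes_needed <= free_nodes:
            # Reserve every node the job needs in one step.
            free_nodes -= job.nodes_needed
            started.append(job.name)
        else:
            # Never allocate part of a gang: a half-started training job
            # would hold nodes while blocking on its missing peers.
            waiting.append(job.name)
    return started, waiting, free_nodes


jobs = [Job("train-a", 4), Job("train-b", 8), Job("train-c", 2)]
started, waiting, left = gang_schedule(jobs, free_nodes=8)
print(started, waiting, left)  # ['train-a', 'train-c'] ['train-b'] 2
```

Here `train-b` waits even though four nodes are still free after `train-a` starts, because it needs all eight at once.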
Because they needed GPU support that was available in upstream releases as well, Uber’s engineers chose Mesos containers over Docker.
The engineers at Uber used Horovod (and the TensorFlow package compatible with it) because learning the rules of the MPI library that Horovod builds on was easier than learning an entirely new system.
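The MPI concept that matters most here is the allreduce collective: every worker contributes its gradients and every worker receives the combined result. The sketch below simulates a ring-style allreduce, the pattern Horovod's documentation describes, in plain Python. It is a single-process simulation under simplifying assumptions, not Horovod's or MPI's actual implementation, and the function name `ring_allreduce_mean` is hypothetical.

```python
def ring_allreduce_mean(worker_grads):
    """Simulate a ring allreduce that averages gradients across workers.

    worker_grads: one equal-length list of floats per worker.
    Returns each worker's final buffer; all buffers hold the element-wise mean.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    bufs = [list(g) for g in worker_grads]  # private copy per worker
    # Split the gradient vector into n contiguous chunks.
    bounds = [size * i // n for i in range(n + 1)]

    def chunk(c):
        return slice(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) % n
    # to its right-hand neighbour, which adds it to its own copy.
    for s in range(n - 1):
        sends = [(r, (r - s) % n) for r in range(n)]
        payloads = [bufs[r][chunk(c)] for r, c in sends]  # snapshot first
        for (r, c), data in zip(sends, payloads):
            dst = (r + 1) % n
            bufs[dst][chunk(c)] = [a + b for a, b in zip(bufs[dst][chunk(c)], data)]

    # Worker r now holds the complete sum for chunk (r + 1) % n.
    # Phase 2, allgather: pass the completed chunks around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n) for r in range(n)]
        payloads = [bufs[r][chunk(c)] for r, c in sends]
        for (r, c), data in zip(sends, payloads):
            bufs[(r + 1) % n][chunk(c)] = data

    # Turn the sums into averages.
    return [[x / n for x in buf] for buf in bufs]


grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
averaged = ring_allreduce_mean(grads)
print(averaged[0])  # [4.0, 5.0, 6.0]
```

Each worker only ever talks to its neighbour in the ring, so the data each worker sends stays roughly constant as workers are added, which is the usual argument for this pattern over a central parameter server.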
In mid-2015, Uber began exploring ways to scale ML across the organization, avoiding ML anti-patterns while standardizing workflows and tools. This effort led to Michelangelo.
Michelangelo consists of a mix of open source systems and components built in-house. The primary open source components used are HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow.