We are trying to standardise DevOps across both ML (model selection and deployment) and regular software. We want to minimise the number of tools we have to learn. We also want a scalable solution that is easy to start small with, e.g. on a powerful laptop, and that can eventually be deployed at scale. MLflow vs Kubernetes (Kubeflow)?
MLflow
Can you please advise which one to choose, FastText or Gensim, in terms of:
- Operability with ML Ops tools such as MLflow, Kubeflow, etc.
- Performance
- Customization of Intermediate steps
- FastText and Gensim both have the same underlying libraries
- Use cases each one tries to solve
- Unsupervised Vs Supervised dimensions
- Ease of Use.
Please mention any other points that I may have missed here.
I already use DVC to track and store the datasets in my machine learning pipeline. I have also started to use MLflow to keep track of my experiments. However, I still don't know whether to use DVC for my model files or the MLflow artifact store. Or maybe the two serve different purposes, and it would be good to use both! Can anyone help, please?
I personally think that MLflow does a great job at experiment tracking, but if you've already set up DVC and are using it, it makes more sense to me to keep data, code and model in the context of the same commit, under the same roof, than to have some dangling files in another system that requires you to track down a commit in the UI and then get a link to the model manually. Artifact logging is very useful if you need to, for example, see generated photos in real time and stop training in the middle, or if you don't already have a data versioning system set up. By the way, DAGsHub lets you combine both very easily.
Hey Hamid - I'm on the DVC team and I'm glad I randomly came across this.
We actually just released an experiment tracking tool that integrates seamlessly with DVC. If you want to see how it works please reach out! mikem@iterative.ai