Chose
PythonPython
over
ScalaScala

I am working in the domain of big data and machine learning. I am helping companies with bringing their machine learning models to the production. In many projects there is a tendency to port Python, PySpark code to Scala and Scala Spark.

This yields to longer time to market and a lot of mistakes due to necessity to understand and re-write the code. Also many libraries/apis that data scientists/machine learning practitioners use are not available in jvm ecosystem.

Simply, refactoring (if necessary) and organising the code of the data scientists by following best practices of software development is less error prone and faster comparing to re-write in Scala.

Pipeline orchestration tools such as Luigi/Airflow is python native and fits well to this picture.

I have heard some arguments against Python such as, it is slow, or it is hard to maintain due to its dynamically typed language. However cost/benefit of time consumed porting python code to java/scala alone would be enough as a counter-argument. ML pipelines rarerly contains a lot of code (if that is not the case, such as complex domain and significant amount of code, then scala would be a better fit).

In terms of performance, I did not see any issues with Python. It is not the fastest runtime around but ML applications are rarely time-critical (majority of them is batch based).

I still prefer Scala for developing APIs and for applications where the domain contains complex logic.

READ LESS
15 upvotes·189.9K views
Avatar of Nicholas Tesla