PySpark vs Scala

PySpark vs Scala: What are the differences?

PySpark: The Python API for Spark. It is the collaboration of Apache Spark and Python: a Python API for Spark that lets you harness the simplicity of Python and the power of Apache Spark in order to tame Big Data.

Scala: A pure-bred object-oriented language that runs on the JVM. Scala is short for “Scalable Language”. This means that Scala grows with you. You can play with it by typing one-line expressions and observing the results. But you can also rely on it for large, mission-critical systems, as many companies, including Twitter, LinkedIn, and Intel, do. To some, Scala feels like a scripting language. Its syntax is concise and low ceremony; its types get out of the way because the compiler can infer them.

PySpark can be classified as a tool in the "Data Science Tools" category, while Scala is grouped under "Languages".

Scala is an open source tool with 11.9K GitHub stars and 2.76K GitHub forks.

According to the StackShare community, Scala has broader approval: it is mentioned in 557 company stacks and 1,895 developer stacks, compared to PySpark, which is listed in 8 company stacks and 6 developer stacks.

What is PySpark?

It is the collaboration of Apache Spark and Python: a Python API for Spark that lets you harness the simplicity of Python and the power of Apache Spark in order to tame Big Data.
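As a rough illustration of what "a Python API for Spark" looks like in practice, the sketch below counts words twice: once in plain Python and once with PySpark's RDD operations. The function names, sample lines, and the `local[*]` master setting are assumptions made for this example. The Spark variant imports `pyspark` only when called, so it needs PySpark installed to run.

```python
from collections import Counter


def word_counts_local(lines):
    """Plain-Python reference: count whitespace-separated words."""
    return dict(Counter(word for line in lines for word in line.split()))


def word_counts_spark(lines):
    """Same computation expressed with PySpark RDD operations.

    Assumes pyspark is installed; runs Spark locally in this process.
    """
    from pyspark.sql import SparkSession  # imported lazily on purpose

    spark = (
        SparkSession.builder
        .master("local[*]")            # assumption: local mode, all cores
        .appName("wordcount-sketch")   # hypothetical application name
        .getOrCreate()
    )
    try:
        pairs = (
            spark.sparkContext.parallelize(lines)
            .flatMap(lambda line: line.split())   # one record per word
            .map(lambda word: (word, 1))          # pair each word with a count
            .reduceByKey(lambda a, b: a + b)      # sum counts per word
        )
        return dict(pairs.collect())
    finally:
        spark.stop()


# Hypothetical sample input
sample_lines = ["spark makes big data simple", "python makes spark simple"]
local_counts = word_counts_local(sample_lines)
```

The two functions are meant to return the same dictionary; the Spark version only pays off when the input is too large for one machine.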

What is Scala?

Scala is short for “Scalable Language”. This means that Scala grows with you. You can play with it by typing one-line expressions and observing the results. But you can also rely on it for large, mission-critical systems, as many companies, including Twitter, LinkedIn, and Intel, do. To some, Scala feels like a scripting language. Its syntax is concise and low ceremony; its types get out of the way because the compiler can infer them.

      What are some alternatives to PySpark and Scala?
      Python
      Python is a general purpose programming language created by Guido van Rossum. It is most praised for its elegant syntax and readable code, which makes it well suited to people just beginning their programming careers.
      Apache Spark
      Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
      Pandas
      Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
      Hadoop
      The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
      NumPy
      Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
      Decisions about PySpark and Scala
      Marc Bollinger, Infra & Data Eng Manager at Lumosity · 4 upvotes · 81.6K views
      Tools: Node.js, Ruby, Kafka, Scala, Apache Storm, Heron, Redis, Pulsar

      Lumosity is home to the world's largest cognitive training database, a responsibility we take seriously. For most of the company's history, our analysis of user behavior and training data has been powered by an event stream--first a simple Node.js pub/sub app, then a heavyweight Ruby app with stronger durability. Both supported decent throughput and latency, but they lacked some major features supported by existing open-source alternatives: replaying existing messages (also lacking in most message queue-based solutions), scaling out many different readers for the same stream, the ability to leverage existing solutions for reading and writing, and possibly most importantly: the ability to hire someone externally who already had expertise.
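Two of the features named above, replaying existing messages and letting many independent readers consume the same stream, come from treating the stream as an append-only log in which each reader tracks its own position. The toy in-memory sketch below illustrates that idea only; it is not how Kafka or Lumosity's earlier apps are implemented, and the class, reader names, and event names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only in-memory log; a toy stand-in for one stream partition."""
    _events: list = field(default_factory=list)
    _offsets: dict = field(default_factory=dict)  # reader name -> next offset

    def append(self, event):
        """Add an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, reader, max_events=10):
        """Return the next batch for `reader`, advancing only that reader."""
        start = self._offsets.get(reader, 0)
        batch = self._events[start:start + max_events]
        self._offsets[reader] = start + len(batch)
        return batch

    def seek(self, reader, offset):
        """Move a reader's position, e.g. back to 0 to replay old events."""
        if not 0 <= offset <= len(self._events):
            raise ValueError("offset out of range")
        self._offsets[reader] = offset


log = EventLog()
for name in ["signup", "game_start", "game_end"]:  # hypothetical events
    log.append(name)

analytics_first = log.poll("analytics")            # reads everything so far
billing_first = log.poll("billing", max_events=1)  # independent position
log.seek("analytics", 0)                           # rewind one reader only
analytics_replay = log.poll("analytics")           # same events again
```

Because events are never removed when read, adding another reader or replaying history is just a matter of choosing an offset, which is the property a queue that deletes messages on delivery lacks.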

      We ultimately migrated to Kafka in early- to mid-2016, citing both industry trends in companies we'd talked to with similar durability and throughput needs and the extremely strong documentation and community. We pored over Kyle Kingsbury's Jepsen post (https://aphyr.com/posts/293-jepsen-Kafka), as well as Jay Kreps' follow-up (http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen), talked at length with Confluent folks and community members, and still wound up running parallel systems for quite a long time, but ultimately, we've been very, very happy. Understanding the internals and proper levers takes some commitment, but it's taken very little maintenance once configured. Since then, the Confluent Platform community has grown and grown; we've gone from doing most development using custom Scala consumers and producers to being 60/40 Kafka Streams/Connect.

      We originally looked into Storm/Heron, and we'd moved on from Redis pub/sub. Heron looks great, but we already had a programming model across services that was more akin to message consumers than one that required a topology of bolts, etc. Heron also had just come out while we were starting to migrate things, and the community momentum and direction of Kafka felt more substantial than the older Storm. If we were to start the process over again today, we might check out Pulsar, although the ecosystem is much younger.

      To find out more, read our 2017 engineering blog post about the migration!

      Alex A, Founder at PRIZ Guru · 3 upvotes · 63.3K views
      Tools: Grails, Play, Scala, Groovy, Gradle

      Some may wonder why we chose Grails. Really good question :) We spent quite some time evaluating which framework to go with, and the battle was between Play (Scala) and Grails (Groovy). We have enough experience with both and, to be honest, I am absolutely in love with Scala; however, the tipping point for us was the potential speed of development. Grails allows a much faster development pace than Play, and as of right now this is the most important parameter. We might convert later, though. Also worth mentioning: by default Grails comes with Gradle as a build tool, so why change?

      Vadim Bakaev
      Tools: Haskell, Scala

      Why am I using Haskell in my free time?

      I have 3 reasons for it. I am looking to:

      Have fun.

      Improve my functional programming skills.

      Improve my problem-solving skills.

      The laziness and mathematical abstractions behind Haskell make it a wonderful language.

      It is purely functional, which helps me write better Scala code.

      It is a highly expressive language that offers elegant ways to solve coding puzzles.

      How developers use PySpark and Scala
      datapile uses Scala

      Scala is the God of languages. A legend. The Mount Rushmore of hybrid OO/functional languages is Scala's face four times over.

      Ok, honestly, we love Scala. We love(d) Java (and its parents C and C++), and we love(d) all the languages that borrowed (cough, stole, cough) from Java over the years such as Groovy, Clojure, and C#.

      It may not be perfect (it totally is, but since programming languages don't have egos of their own, we don't want to paint it too bright), but it is awesome. It runs on the JVM, you can utilize Spring, it works great for data processing (which is sorta kinda the thing we do here, folks), and it just makes sense at all levels.

      If you don't like Scala, we feel sorry for the projects that are suffering due to your choices, meanwhile we are using Scala to write everything from JavaScript, CSS, SQL, and JSON directly within itself (go figure), so in the end no one will know the beauty of this powerhouse language (except for our engineers, of course).

      Foursquare uses Scala

      Nearly our entire server codebase is written in Scala (if you haven't heard of it, it's a programming language that is basically what you would get if Java + ML had a baby). This has worked out super well. It enables us to write concise, easy-to-deal-with code that is typechecked at compile time. It's also been a big help with recruiting.

      papaver uses Scala

      worked with scala for around 2 years. really enjoyed the language and getting back into the world of functional. unfortunately the community is heavily fragmented and the language itself broken and inconsistent. that, with the various factions involved, made it a put-off for long term investment.

      Stanislaus Madueke uses Scala

      Scala, Akka and Spray (which became Akka-Http) provided the building blocks for the menu service.
      Akka's actors and finite-state machine were a natural way to model a USSD menu (a series of stateful interactions between a subscriber and the USSD gateway).
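The idea of modelling a USSD menu as a finite-state machine can be sketched without Akka: each menu screen is a state, and each subscriber input is a transition. The plain-Python sketch below is an illustration under that assumption, not the author's Akka FSM code; the menu layout, prompts, and class name are hypothetical.

```python
# Hypothetical menu: state -> (prompt shown to subscriber, {input: next state})
MENU = {
    "main": ("1. Balance\n2. Buy data", {"1": "balance", "2": "bundles"}),
    "bundles": ("1. 1GB\n2. 5GB\n0. Back",
                {"1": "confirm", "2": "confirm", "0": "main"}),
    "balance": ("Balance lookup requested", {}),   # terminal state
    "confirm": ("Purchase requested", {}),         # terminal state
}


class UssdSession:
    """One subscriber's stateful walk through the menu."""

    def __init__(self, menu, start="main"):
        self.menu = menu
        self.state = start

    @property
    def prompt(self):
        return self.menu[self.state][0]

    @property
    def finished(self):
        # A state with no outgoing transitions ends the session
        return not self.menu[self.state][1]

    def handle(self, user_input):
        """Apply one subscriber input and return the next prompt."""
        transitions = self.menu[self.state][1]
        # Unknown input keeps the current state, so the prompt is repeated
        self.state = transitions.get(user_input, self.state)
        return self.prompt


session = UssdSession(MENU)
session.handle("2")   # main -> bundles
session.handle("9")   # invalid choice, stays on bundles
session.handle("0")   # bundles -> main
final_prompt = session.handle("1")  # main -> balance (terminal)
```

In an actor-based design each session would typically live in its own actor, so the state above would be held per subscriber rather than in a local variable.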

      Giovanni Candido da Silva uses Scala

      Scala entirely replaces the Java language for building much more expressive and powerful code on the backend, while still leveraging the Java platform's tools and frameworks. It is a mixture of old and mature with new and sexy.
