PySpark vs Scala: What are the differences?
PySpark: The Python API for Spark. It pairs Apache Spark with Python, letting you harness the simplicity of Python and the power of Apache Spark to tame Big Data.
Scala: A pure-bred object-oriented language that runs on the JVM. The name Scala is short for “Scalable Language”, meaning that Scala grows with you. You can play with it by typing one-line expressions and observing the results, but you can also rely on it for large mission-critical systems, as many companies do, including Twitter, LinkedIn, and Intel. To some, Scala feels like a scripting language. Its syntax is concise and low ceremony; its types get out of the way because the compiler can infer them.
PySpark can be classified as a tool in the "Data Science Tools" category, while Scala is grouped under "Languages".
Scala is an open source tool with 11.9K GitHub stars and 2.76K GitHub forks.
According to the StackShare community, Scala has broader approval: it is mentioned in 557 company stacks and 1895 developer stacks, compared to PySpark, which is listed in 8 company stacks and 6 developer stacks.
Lumosity is home to the world's largest cognitive training database, a responsibility we take seriously. For most of the company's history, our analysis of user behavior and training data has been powered by an event stream--first a simple Node.js pub/sub app, then a heavyweight Ruby app with stronger durability. Both supported decent throughput and latency, but they lacked some major features supported by existing open-source alternatives: replaying existing messages (also lacking in most message queue-based solutions), scaling out many different readers for the same stream, the ability to leverage existing solutions for reading and writing, and possibly most importantly: the ability to hire someone externally who already had expertise.
We ultimately migrated to Kafka in early- to mid-2016, citing both industry trends in companies we'd talked to with similar durability and throughput needs and the extremely strong documentation and community. We pored over Kyle Kingsbury's Jepsen post (https://aphyr.com/posts/293-jepsen-Kafka), as well as Jay Kreps' follow-up (http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen), talked at length with Confluent folks and community members, and still wound up running parallel systems for quite a long time, but ultimately, we've been very, very happy. Understanding the internals and proper levers takes some commitment, but it's taken very little maintenance once configured. Since then, the Confluent Platform community has grown and grown; we've gone from doing most development using custom Scala consumers and producers to being 60/40 Kafka Streams/Kafka Connect.
We originally looked into Storm/Heron, and we'd moved on from Redis pub/sub. Heron looks great, but we already had a programming model across services that was more akin to message consumers than to a topology of bolts, etc. Heron also had just come out while we were starting to migrate things, and the community momentum and direction of Kafka felt more substantial than the older Storm. If we were to start the process over again today, we might check out Pulsar, although the ecosystem is much younger.
To find out more, read our 2017 engineering blog post about the migration!
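Two of the features named in the migration story above, replaying existing messages and letting many independent readers consume the same stream, both follow from treating the stream as an append-only log in which each reader tracks its own offset. The snippet below is a minimal in-memory sketch of that idea only. It is not Kafka's API, and the names EventLog, append, poll, and seek are hypothetical, chosen for illustration.

```python
class EventLog:
    """Toy append-only log with per-reader offsets (illustrative, not Kafka)."""

    def __init__(self):
        self._events = []    # events are only ever appended, never removed
        self._offsets = {}   # reader name -> index of the next event to read

    def append(self, event):
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, reader):
        """Return every event this reader has not yet seen."""
        start = self._offsets.get(reader, 0)
        batch = self._events[start:]
        self._offsets[reader] = len(self._events)
        return batch

    def seek(self, reader, offset):
        """Move a reader back (or forward) so that it can replay events."""
        self._offsets[reader] = offset


log = EventLog()
log.append("game_started")
log.append("game_finished")

first_read = log.poll("analytics")    # both events
second_read = log.poll("analytics")   # nothing new yet
other_reader = log.poll("billing")    # an independent reader sees both events
log.seek("analytics", 0)
replayed = log.poll("analytics")      # replay from the beginning
```

Because reading never removes anything from the log, adding a reader or rewinding one does not affect any other reader, which is the property a queue that deletes messages on delivery cannot offer.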
Some may wonder why we chose Grails. Really good question :) We spent quite some time evaluating which framework to go with, and the battle was between Play (Scala) and Grails (Groovy). We have enough experience with both and, to be honest, I am absolutely in love with Scala; however, the tipping point for us was the potential speed of development. Grails allows a much faster development pace than Play, and as of right now this is the most important parameter. We might convert later, though. Also worth mentioning: by default Grails comes with Gradle as a build tool, so why change?
Why am I using Haskell in my free time?
I have 3 reasons for it. I am looking to:
Improve my functional programming skills.
Improve my problem-solving skills.
The laziness and mathematical abstractions behind Haskell make it a wonderful language.
It is purely functional, which helps me write better Scala code.
It is a highly expressive language that gives elegant ways to solve coding puzzles.
Scala is the God of languages. A legend. The Mount Rushmore of hybrid OO/functional languages is Scala's face four times over.
Ok, honestly, we love Scala. We love(d) Java (and its parents C and C++), and we love(d) all the languages that borrowed (cough, stole, cough) from Java over the years such as Groovy, Clojure, and C#.
It may not be perfect (it totally is, but since programming languages don't have egos of their own, we don't want to paint it too bright), but it is awesome. It runs on the JVM, you can utilize Spring, it works great for data processing (which is sorta kinda the thing we do here, folks), and it just makes sense at all levels.
Nearly our entire server codebase is written in Scala (if you haven't heard of it, it's a programming language that is basically what you would get if Java + ML had a baby). This has worked out super well. It enables us to write concise, easy-to-deal-with code that is typechecked at compile time. It's also been a big help with recruiting.
Worked with Scala for around 2 years. Really enjoyed the language and getting back into the world of functional programming. Unfortunately, the community is heavily fragmented and the language itself broken and inconsistent. That, together with the various factions involved, made it a put-off for long-term investment.
Scala, Akka and Spray (which became Akka HTTP) provided the building blocks for the menu service.
Akka's actors and finite-state machine were a natural way to model a USSD menu (a series of stateful interactions between a subscriber and the USSD gateway).
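To make the finite-state-machine idea concrete, here is a minimal sketch of a USSD menu modelled as a table of states and transitions. It is written in plain Python rather than Akka, so it shows only the modelling idea and not the actor-based implementation described above. The menu contents and the names MENU and UssdSession are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical menu: state name -> (prompt text, {subscriber input: next state}).
MENU = {
    "main": ("1. Balance\n2. Buy airtime", {"1": "balance", "2": "airtime"}),
    "balance": ("Balance screen. 0. Back", {"0": "main"}),
    "airtime": ("Airtime screen. 0. Back", {"0": "main"}),
}


@dataclass
class UssdSession:
    """One subscriber's position in the menu; one instance per session."""

    state: str = "main"

    def prompt(self) -> str:
        """Text to send to the subscriber for the current state."""
        return MENU[self.state][0]

    def handle(self, user_input: str) -> str:
        """Apply one subscriber input and return the next prompt."""
        transitions = MENU[self.state][1]
        # Unrecognised input leaves the session in its current state.
        self.state = transitions.get(user_input, self.state)
        return self.prompt()


session = UssdSession()
session.handle("1")   # main -> balance
session.handle("9")   # unknown input: stays on balance
session.handle("0")   # balance -> main
```

Keeping the transitions in a table means a new menu screen is a data change rather than a code change. In an actor-based design, each session's state would presumably live inside its own actor instead of a plain object.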