What is Scala?
Who uses Scala?
Why developers like Scala?
Here are some stack decisions, common use cases and reviews by companies and developers who chose Scala in their tech stack.
Lumosity is home to the world's largest cognitive training database, a responsibility we take seriously. For most of the company's history, our analysis of user behavior and training data has been powered by an event stream--first a simple Node.js pub/sub app, then a heavyweight Ruby app with stronger durability. Both supported decent throughput and latency, but they lacked some major features supported by existing open-source alternatives: replaying existing messages (also lacking in most message queue-based solutions), scaling out many different readers for the same stream, the ability to leverage existing solutions for reading and writing, and possibly most importantly: the ability to hire someone externally who already had expertise.
We ultimately migrated to Kafka in early- to mid-2016, citing both industry trends in companies we'd talked to with similar durability and throughput needs, the extremely strong documentation and community. We pored over Kyle Kingsbury's Jepsen post (https://aphyr.com/posts/293-jepsen-Kafka), as well as Jay Kreps' follow-up (http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen), talked at length with Confluent folks and community members, and still wound up running parallel systems for quite a long time, but ultimately, we've been very, very happy. Understanding the internals and proper levers takes some commitment, but it's taken very little maintenance once configured. Since then, the Confluent Platform community has grown and grown; we've gone from doing most development using custom Scala consumers and producers to being 60/40 Kafka Streams/Connects.
We originally looked into Storm / Heron , and we'd moved on from Redis pub/sub. Heron looks great, but we already had a programming model across services that was more akin to consuming a message consumers than required a topology of bolts, etc. Heron also had just come out while we were starting to migrate things, and the community momentum and direction of Kafka felt more substantial than the older Storm. If we were to start the process over again today, we might check out Pulsar , although the ecosystem is much younger.
To find out more, read our 2017 engineering blog post about the migration!
Onedot is building an automated data preparation service using probabilistic and statistical methods including artificial intelligence (AI). From the beginning, having a stable foundation while at the same time being able to iterate quickly was very important to us. Due to the nature of compute workloads we face, the decision for a functional programming paradigm and a scalable cluster model was a no-brainer. We started playing with Apache Spark very early on, when the platform was still in its infancy. As a storage backend, we first used Cassandra, but found out that it was not the optimal choice for our workloads (lots of rather smallish datasets, data pipelines with considerable complexity, etc.). In the end, we migrated dataset storage to Amazon S3 which proved to be much more adequate to our case. In the frontend, we bet on more traditional frameworks like React/Redux.js, Blueprint and a number of common npm packages of our universe. Because of the very positive experience with Scala (in particular the ability to write things very expressively, use immutability across the board, etc.) we settled with TypeScript in the frontend. In our opinion, a very good decision. Nowadays, transpiling is a common thing, so we thought why not introduce the same type-safety and mathematical rigour to the user interface?
Why I am using Haskell in my free time?
I have 3 reasons for it. I am looking for:
Improve functional programming skill.
Improve problem-solving skill.
Laziness and mathematical abstractions behind Haskell makes it a wonderful language.
It is Pure functional, it helps me to write better Scala code.
Highly expressive language gives elegant ways to solve coding puzzle.
Some may wonder why did we choose Grails ? Really good question :) We spent quite some time to evaluate what framework to go with and the battle was between Play Scala and Grails ( Groovy ). We have enough experience with both and, to be honest, I absolutely in love with Scala; however, the tipping point for us was the potential speed of development. Grails allows much faster development pace than Play , and as of right now this is the most important parameter. We might convert later though. Also, worth mentioning, by default Grails comes with Gradle as a build tool, so why change?
Slack’s data team works to “provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better and data informed decisions.” To achieve that goal, that rely on a complex data pipeline.
An in-house tool call Sqooper scrapes MySQL backups and pipe them to S3. Job queue and log data is sent to Kafka then persisted to S3 using an open source tool called Secor, which was created by Pinterest.
For compute, Amazon’s Elastic MapReduce (EMR) creates clusters preconfigured for Presto, Hive, and Spark.
Presto is then used for ad-hoc questions, validating data assumptions, exploring smaller datasets, and creating visualizations for some internal tools. Hive is used for larger data sets or longer time series data, and Spark allows teams to write efficient and robust batch and aggregation jobs. Most of the Spark pipeline is written in Scala.
Thrift binds all of these engines together with a typed schema and structured data.
Finally, the Hive Metastore serves as the ground truth for all data and its schema.
Nearly our entire server codebase is written in Scala (if you haven't heard of it, it's a programming language that is basically what you would get if Java + ML had a baby). This has worked out super well. It enables us to write concise easy to deal with code that is typechecked at compile time. It's also been a big help with recruiting. Scala