Serverless vs Apache Spark: What are the differences?

Serverless: The most widely-adopted toolkit for building serverless applications. Apache Spark: Fast and general engine for large-scale data processing.

Serverless can be classified as a tool in the "Serverless / Task Processing" category, while Apache Spark is grouped under "Big Data Tools".

"API integration" is the primary reason developers choose Serverless over its competitors, whereas "Open-source" is cited as the key factor in picking Apache Spark.

Serverless and Apache Spark are both open source tools. By star count, Serverless (30.5K GitHub stars, 3.38K forks) appears to be more popular than Apache Spark (22.3K stars), although Spark has far more forks (19.3K).

Slack, Shopify, and SendGrid are some of the popular companies that use Apache Spark, whereas Serverless is used by Droplr, Plista GmbH, and Hammerhead. Apache Spark has broader adoption, being mentioned in 263 company stacks and 111 developer stacks, compared to Serverless, which is listed in 112 company stacks and 43 developer stacks.

What is Serverless?

Build applications composed of microservices that run in response to events, auto-scale for you, and only charge you when they run. This lowers the total cost of maintaining your apps, enabling you to build more logic, faster. The Framework uses event-driven compute services such as AWS Lambda and Google Cloud Functions.
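
To make "run in response to events" concrete, here is a minimal sketch of an event-triggered function in Python. It follows the AWS Lambda-style `handler(event, context)` signature; the `name` field and the response shape are assumptions chosen for illustration, not something the Serverless Framework prescribes.

```python
import json


def handler(event, context):
    """Entry point the platform would call once per incoming event."""
    # `event` carries the trigger payload; `name` is a hypothetical field.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local invocation; no cloud account is needed to try the function.
    print(handler({"name": "StackShare"}, None))
```

Because the function holds no state between calls, the platform is free to run as many copies as the event rate requires, which is what makes per-invocation billing and automatic scaling possible.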

What is Apache Spark?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
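
The MapReduce-style batch pattern mentioned above can be sketched in plain Python. This is not Spark code and uses no Spark API; it only illustrates the map and reduce phases that an engine like Spark would distribute across a cluster. The function names are chosen for illustration.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one input line into partial word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(lines):
    """Run the map phase over every line, then fold the results together."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


if __name__ == "__main__":
    sample = ["spark runs batch jobs", "batch jobs and streaming jobs"]
    print(word_count(sample).most_common(2))
```

Each call to `map_phase` depends only on its own line, so the work can be split across machines; only the merge step needs to bring partial results together.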
    What are some alternatives to Serverless and Apache Spark?
    AWS Lambda
    AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can use AWS Lambda to extend other AWS services with custom logic, or create your own back-end services that operate at AWS scale, performance, and security.
    Terraform
    With Terraform, you describe your complete infrastructure as code, even as it spans multiple service providers. Your servers may come from AWS, your DNS may come from CloudFlare, and your database may come from Heroku. Terraform will build all these resources across all these providers in parallel.
    Zappa
    Zappa makes it super easy to deploy all Python WSGI applications on AWS Lambda + API Gateway. Think of it as "serverless" web hosting for your Python web apps. That means infinite scaling, zero downtime, zero maintenance - and at a fraction of the cost of your current deployments!
    Cloud Functions for Firebase
    Cloud Functions for Firebase lets you create functions that are triggered by Firebase products, such as changes to data in the Realtime Database, uploads to Cloud Storage, new user sign ups via Authentication, and conversion events in Analytics.
    Google Cloud Functions
    Construct applications from bite-sized business logic billed to the nearest 100 milliseconds, only while your code is running.
    Decisions about Serverless and Apache Spark
    StackShare Editors
    Presto
    Apache Spark
    Hadoop

    Around 2015, the growing use of Uber’s data exposed limitations in the ETL and Vertica-centric setup, not to mention the increasing costs. “As our company grew, scaling our data warehouse became increasingly expensive. To cut down on costs, we started deleting older, obsolete data to free up space for new data.”

    To overcome these challenges, Uber rebuilt their big data platform around Hadoop. “More specifically, we introduced a Hadoop data lake where all raw data was ingested from different online data stores only once and with no transformation during ingestion.”

    “In order for users to access data in Hadoop, we introduced Presto to enable interactive ad hoc user queries, Apache Spark to facilitate programmatic access to raw data (in both SQL and non-SQL formats), and Apache Hive to serve as the workhorse for extremely large queries.”

    StackShare Editors
    Presto
    Apache Spark
    Hadoop

    To improve platform scalability and efficiency, Uber transitioned from JSON to Parquet, and built a central schema service to manage schemas and integrate different client libraries.

    While the first generation big data platform was vulnerable to upstream data format changes, “ad hoc data ingestions jobs were replaced with a standard platform to transfer all source data in its original, nested format into the Hadoop data lake.”

    These platform changes helped Uber address the scaling challenges it was facing around that time: “On a daily basis, there were tens of terabytes of new data added to our data lake, and our Big Data platform grew to over 10,000 vcores with over 100,000 running batch jobs on any given day.”

    StackShare Editors
    Presto
    Apache Spark
    Scala
    MySQL
    Kafka

    Slack’s data team works to “provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better and data informed decisions.” To achieve that goal, they rely on a complex data pipeline.

    An in-house tool called Sqooper scrapes MySQL backups and pipes them to S3. Job queue and log data are sent to Kafka and then persisted to S3 using an open source tool called Secor, which was created by Pinterest.

    For compute, Amazon’s Elastic MapReduce (EMR) creates clusters preconfigured for Presto, Hive, and Spark.

    Presto is then used for ad-hoc questions, validating data assumptions, exploring smaller datasets, and creating visualizations for some internal tools. Hive is used for larger data sets or longer time series data, and Spark allows teams to write efficient and robust batch and aggregation jobs. Most of the Spark pipeline is written in Scala.

    Thrift binds all of these engines together with a typed schema and structured data.

    Finally, the Hive Metastore serves as the ground truth for all data and its schema.

    Nitzan Shapira
    at Epsagon · 11 upvotes · 105.5K views
    AWS Lambda
    GitHub
    Java
    Go
    Node.js
    npm
    Serverless
    Python

    At Epsagon, we use hundreds of AWS Lambda functions, most of them written in Python, and the Serverless Framework to package and deploy them. One of the issues we've encountered is the difficulty of packaging external libraries into the Lambda environment using the Serverless Framework. This limitation is probably by design, since the external code your Lambda needs can usually be included with a package manager.

    To overcome this issue, we developed a tool, also published as open source (see link below), that automatically packages these libraries using a simple npm package and a YAML configuration file. Support for Node.js, Go, and Java will be available soon.

    The GitHub repository: https://github.com/epsagon/serverless-package-external

    Michal Nowak
    Co-founder at Evojam · 7 upvotes · 61.9K views
    Azure Functions
    Firebase
    AWS Lambda
    Serverless

    In a couple of recent projects we had an opportunity to try out the new serverless approach to building web applications. The question wasn't which particular vendor to use, but rather whether we could consider serverless a viable option for building apps at all. Obviously our goal was also to get a feel for this technology and gain some hands-on experience.

    We considered AWS Lambda, Firebase from Google, and Azure Functions. Eventually we went with AWS Lambda.

    PROS
    • No servers to manage (obviously!)
    • Limited fixed costs – you pay only for used time
    • Automated scaling and balancing
    • Automatic failover (or, at this level of abstraction, no failover problem at all)
    • Security easier to provide and audit
    • Low overhead at the start (given a certain level of knowledge)
    • Short time to market
    • Easy handover - deployment coupled with code
    • Perfect choice for lean startups with fast-paced iterations
    • Augmentation for the classic cloud, server(full) approach
    CONS
    • Little know-how and few best practices available on structuring code and projects
    • Not suitable for complex business logic due to the risk of producing highly coupled code
    • Cost difficult to estimate (helpful tools: serverlesscalc.com)
    • Difficulty in migration to other platforms (vendor lock-in ⚠️)
    • Few engineers with serverless experience on the job market
    • Steep learning curve for engineers without any cloud experience

    More details are on our blog: https://evojam.com/blog/2018/12/5/should-you-go-serverless-meet-the-benefits-and-flaws-of-new-wave-of-cloud-solutions I hope it helps 🙌 and I'm curious about your experiences.

    StackShare Editors
    Apache Thrift
    Kotlin
    Presto
    HHVM (HipHop Virtual Machine)
    gRPC
    Kubernetes
    Apache Spark
    Airflow
    Terraform
    Hadoop
    Swift
    Hack
    Memcached
    Consul
    Chef
    Prometheus

    Since the beginning, Cal Henderson has been the CTO of Slack. Earlier this year, he commented on a Quora question summarizing their current stack.

    Apps
    • Web: a mix of JavaScript/ES6 and React.
    • Desktop: Electron, to ship it as a desktop application.
    • Android: a mix of Java and Kotlin.
    • iOS: written in a mix of Objective C and Swift.
    Backend
    • The core application and the API are written in PHP/Hack and run on HHVM.
    • The data is stored in MySQL using Vitess.
    • Caching is done using Memcached and MCRouter.
    • The search service takes help from SolrCloud, with various Java services.
    • The messaging system uses WebSockets with many services in Java and Go.
    • Load balancing is done using HAproxy with Consul for configuration.
    • Most services talk to each other over gRPC, with some Thrift and JSON-over-HTTP.
    • Voice and video calling service was built in Elixir.
    Data warehouse
    • Built using open source tools including Presto, Spark, Airflow, Hadoop and Kafka.
    Julien DeFrance
    Principal Software Engineer at Tophatter · 2 upvotes · 14K views
    Amazon SageMaker
    Amazon Machine Learning
    AWS Lambda
    Serverless
    #FaaS
    #GCP
    #PaaS

    Which #IaaS / #PaaS to choose? Not all #Cloud providers are created equal. As you start to use one or the other, you'll build around very specific services that don't have an equivalent elsewhere.

    Back in 2014/2015, this decision, which I made for SmartZip, was a no-brainer, and #AWS won. AWS has been a leader and over the years has demonstrated its capacity to innovate and reduce toil like no other.

    Year after year, this kept being confirmed as they rolled out new (managed) services, got into serverless with AWS Lambda / FaaS, and put domains such as #AI / #MachineLearning into the hands of every developer thanks to Amazon Machine Learning or Amazon SageMaker, for instance.

    If you compare with #GCP, for instance, it's not quite there yet. Building around these managed services, #AWS allowed me to get my developers to a whole new level: one where they know what's under the hood, where they know these services are available and can build around them, and where they care about and are responsible for the operations, security, and deployment of what they've worked on.

    Aviad Mor
    CTO & Co-Founder at Lumigo · 5 upvotes · 10.1K views
    Serverless
    CircleCI
    AWS Lambda

    Our backend is serverless, built on many AWS Lambda functions, with CI/CD using CircleCI and Serverless. This allows us to develop with awesome agility and move fast. Since we update our Lambdas daily, we needed a way to make sure we did not run into AWS's maximum limit of versions per Lambda. We use the open source tool linked below to clear them out and stay clear of the limit.
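
The post does not say how the cleanup tool decides what to remove, so the following is only a sketch of one plausible selection policy: keep the newest few published versions plus any that are explicitly protected, and prune the rest. The function name, the `keep` default, and the `protected` parameter are assumptions for illustration. With boto3, the version list could come from the Lambda client's `list_versions_by_function`, and each pruned version could be removed with `delete_function` and a `Qualifier`; those calls are left out so the sketch runs without AWS access.

```python
def versions_to_prune(versions, keep=3, protected=()):
    """Return the published version numbers that are safe to delete, oldest first.

    versions:  version identifiers as strings, e.g. ["$LATEST", "1", "2"]
    keep:      how many of the newest published versions to retain (assumed policy)
    protected: versions that must survive, e.g. ones an alias points to
    """
    # "$LATEST" is not a published version and is never deleted.
    published = sorted(int(v) for v in versions if v != "$LATEST")
    retained = set(published[-keep:]) if keep > 0 else set()
    retained.update(int(v) for v in protected)
    return [str(v) for v in published if v not in retained]


if __name__ == "__main__":
    print(versions_to_prune(["$LATEST", "1", "2", "3", "4", "5"], keep=2))
```

Keeping the selection logic separate from the API calls makes the policy easy to test before anything is actually deleted.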

    Aliadoc Team
    at aliadoc.com · 5 upvotes · 84.9K views
    Bitbucket
    Visual Studio Code
    Serverless
    Google Cloud Storage
    Google App Engine
    Cloud Functions for Firebase
    Firebase
    CloudFlare
    Create React App
    React
    #Aliadoc

    In #Aliadoc, we're exploring the crowdfunding option to get traction before launch. We are building a SaaS platform for website design customization.

    For the Admin UI and website editor we use React, and we're currently transitioning from a Create React App setup to a custom one because our needs have become more specific. We use CloudFlare as much as possible; it's a great service.

    For routing dynamic resources and proxy tasks to feed websites to the editor, we leverage CloudFlare Workers for improved responsiveness. We use Firebase for our hosting needs and user authentication, and several Cloud Functions for Firebase to interact with other services, along with Google App Engine and Google Cloud Storage. The Realtime Database is also on our radar for collaborative website editing.

    We generally hate configuration, and honestly, at this stage of our project we lack the resources for heavy sysops work. So we rely on Serverless technologies as much as we can for all server-side processing.

    Visual Studio Code definitely makes programming a much easier and more enjoyable task; we just love it. We combine it with Bitbucket for our source code control needs.

    Eric Colson
    Chief Algorithms Officer at Stitch Fix · 19 upvotes · 266.1K views
    Amazon EC2 Container Service
    Docker
    PyTorch
    R
    Python
    Presto
    Apache Spark
    Amazon S3
    PostgreSQL
    Kafka
    #Data
    #DataStack
    #DataScience
    #ML
    #Etl
    #AWS

    The algorithms and data infrastructure at Stitch Fix are housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

    Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

    At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

    Tim Nolet
    Founder, Engineer & Dishwasher at Checkly · 5 upvotes · 20.1K views
    Node.js
    Google Cloud Functions
    Azure Functions
    Amazon CloudWatch
    Serverless
    AWS Lambda

    In the last year or so, I moved all Checkly monitoring workloads to AWS Lambda. Here are some stats:

    • We run three core functions in all AWS regions. They handle API checks, browser checks and setup / teardown scripts. Check our docs to find out what that means.
    • All functions are hooked up to SNS topics but can also be triggered directly through AWS SDK calls.
    • The busiest function is a plumbing function that forwards data to our database. It is invoked anywhere between 7,000 and 10,000 times per hour with an average duration of about 179 ms.
    • We run separate dev and test versions of each function in each region.
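
A function that is subscribed to an SNS topic but can also be invoked directly has to accept two event shapes. The sketch below shows one way to normalize them in Python (the post's own functions are not shown and, judging by the tags, are likely Node.js). It relies on the SNS event layout Lambda delivers, `Records[i]["Sns"]["Message"]`; the assumption that messages are JSON and the `check_id` field are hypothetical.

```python
import json


def extract_payloads(event):
    """Return the payload dicts carried by an SNS-wrapped or a direct event."""
    records = event.get("Records")
    if records:
        # SNS delivers each message as a string under Records[i]["Sns"]["Message"];
        # this sketch assumes the publisher sent JSON.
        return [json.loads(record["Sns"]["Message"]) for record in records]
    # Direct SDK invocation: the event is already the payload.
    return [event]


def handler(event, context):
    payloads = extract_payloads(event)
    # `check_id` is a hypothetical field used only for illustration.
    return {"processed": [payload.get("check_id") for payload in payloads]}


if __name__ == "__main__":
    wrapped = {"Records": [{"Sns": {"Message": json.dumps({"check_id": "a1"})}}]}
    print(handler(wrapped, None))
    print(handler({"check_id": "b2"}, None))
```

Normalizing at the edge keeps the rest of the function unaware of how it was triggered.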

    Moving all this to AWS Lambda took some work and considerations. The blog post linked below goes into the following topics:

    • Why Lambda is an almost perfect match for SaaS. Especially when you're small.
    • Why I don't use a "big" framework around it.
    • Why distributed background jobs triggered by queues are Lambda's raison d'être.
    • Why monitoring & logging is still an issue.

    https://blog.checklyhq.com/how-i-made-aws-lambda-work-for-my-saas/

    Praveen Mooli
    Technical Leader at Taylor and Francis · 11 upvotes · 157.3K views
    MongoDB Atlas
    Amazon S3
    Amazon DynamoDB
    Amazon RDS
    Serverless
    Docker
    Terraform
    Travis CI
    GitHub
    RxJS
    Angular 2
    AWS Lambda
    Amazon SQS
    Amazon SNS
    Amazon Kinesis Firehose
    Amazon Kinesis
    Flask
    Python
    ExpressJS
    Node.js
    Spring Boot
    Java
    #Data
    #Devops
    #Webapps
    #Eventsourcingframework
    #Microservices
    #Backend

    We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with a microservices architecture because we wanted to scale. The microservice architectural style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many distributed servers. This modularity facilitates independent updates and deployments and helps avoid a single point of failure, which can help prevent large-scale outages. We also decided to use the event-driven architecture pattern, a popular distributed asynchronous pattern used to produce highly scalable applications. An event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
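
The decoupling described above can be sketched with a tiny in-process event bus in Python. This is an illustration only: the `EventBus` class and the `content.published` event name are hypothetical, and delivery here is synchronous, whereas a real deployment would hand events to a broker such as a queue or stream so that components process them asynchronously.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a single-purpose component for one event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the event to every subscriber; the publisher knows none of them."""
        for handler in self._handlers[event_type]:
            handler(payload)


if __name__ == "__main__":
    bus = EventBus()
    indexed, notified = [], []
    bus.subscribe("content.published", indexed.append)
    bus.subscribe("content.published", notified.append)
    bus.publish("content.published", {"id": 42})
    print(indexed, notified)
```

The publisher only names an event type, so new consumers can be added without touching the code that emits the event.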

    To build our #Backend capabilities we decided to use the following:
    1. #Microservices - Java with Spring Boot, Node.js with ExpressJS, and Python with Flask
    2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda
    3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas

    To build #Webapps we decided to use Angular 2 with RxJS.

    #Devops - GitHub, Travis CI, Terraform, Docker, Serverless

    How developers use Serverless and Apache Spark
    Wei Chen uses Apache Spark

    Spark is good at managing parallel data processing. We wrote a neat program to handle the terabytes of data we get every day.

    betterPT uses Serverless

    We use AWS Lambda / Serverless as a facade for our integrations with EMRs.

    Ralic Lo uses Apache Spark

    Used the Spark DataFrame API in SparkR for big data analysis.

    Kalibrr uses Apache Spark

    We use Apache Spark in computing our recommendations.

    BrainFinance uses Apache Spark

    As part of a big data machine learning stack (SMACK).

    Dotmetrics uses Apache Spark

    Big data analytics and nightly transformation jobs.

    JimmyCode uses Serverless

    Oh yeah! We run on lambdas.
