What is Snowplow?
Snowplow is a real-time event data pipeline that lets you track, contextualize, validate and model your customers’ behaviour across your entire digital estate.
Snowplow is a tool in the Custom Analytics category of a tech stack.
Snowplow is an open source tool with 6.5K GitHub stars and 1.2K GitHub forks.
Who uses Snowplow?
44 companies reportedly use Snowplow in their tech stacks, including N26, Pier, and Data.
77 developers on StackShare have stated that they use Snowplow.
PostgreSQL, Amazon S3, Elasticsearch, Kafka, and Microsoft Azure are some of the popular tools that integrate with Snowplow; in total, 21 tools integrate with it.
Pros of Snowplow
- Can track any type of digital event
- Completely open source
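"Any type of digital event" works because Snowplow lets you define your own event schemas: a custom event is just a JSON payload paired with a schema reference that the pipeline validates it against. A minimal sketch (the vendor, event name, and fields below are made up for illustration):

```python
import json

# A Snowplow-style self-describing (custom) event: arbitrary data plus
# a schema URI the pipeline validates it against. Vendor, event name,
# and fields here are hypothetical.
custom_event = {
    "schema": "iglu:com.example/song_played/jsonschema/1-0-0",
    "data": {
        "track_id": "T123",
        "duration_seconds": 214,
    },
}

print(json.dumps(custom_event, indent=2))
```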
Decisions about Snowplow
Here are some stack decisions, common use cases and reviews by companies and developers who chose Snowplow in their tech stack.
Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:
- Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
- Storage -> Amazon S3 seems like the cheapest option. We probably won't need much capacity, even at full scale. Our current storage is a secure Box folder holding ~4GB across several batches of test data, code, presentations, and planning docs.
- Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would those fit in?
- Partitions -> I've seen Cassandra and YARN mentioned, but I have no experience with either.
- Processing -> We want to use SAS if at all possible. What will work with SAS code?
- Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
- I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
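One common way to organize the "experimental" partitions described above is a Hive-style partitioned key layout, which catalog tools such as AWS Glue can crawl directly. A minimal sketch, where the zone names, the `source=` partner field, and the filenames are all hypothetical:

```python
from datetime import date

def lake_key(zone: str, source: str, dt: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (e.g. for S3).

    zone:   lifecycle stage of the data ("raw", "harmonized", "mart")
    source: submitting partner (hypothetical field)
    dt:     batch date, used for date partitioning
    """
    return f"{zone}/source={source}/dt={dt.isoformat()}/{filename}"

# Example: where a raw upload from a partner might land
key = lake_key("raw", "partner_a", date(2024, 1, 15), "claims.csv")
print(key)  # raw/source=partner_a/dt=2024-01-15/claims.csv
```

Because the partition values live in the key itself, downstream jobs can prune by partner or date without scanning every object.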
Snowplow's key features:
- Track rich events from your websites, mobile apps, server-side systems, third-party systems and any type of connected device, so that you have a record of what happened, when, and to whom
- Load your data into your data warehouse of choice to power sophisticated analytics
- Process your data, including validating, enriching and modeling it
- Your data is available in real time via Amazon Kinesis, Google Pub/Sub and BigQuery to power real-time applications and reports
- Your data pipeline is running in your cloud environment giving you full ownership and control of your data
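The tracking step above boils down to a tracker sending name–value pairs to a collector endpoint. A rough sketch of what a page-view payload looks like in the style of the Snowplow tracker protocol; the field values here are illustrative, and a real deployment would use one of the official tracker SDKs rather than hand-rolling this:

```python
import time
import uuid
from urllib.parse import urlencode

def page_view_payload(app_id: str, page_url: str, page_title: str) -> dict:
    """Assemble a minimal page-view event in the style of the Snowplow
    tracker protocol (the abbreviated field names come from that
    protocol; the exact required fields depend on your pipeline)."""
    return {
        "e": "pv",                            # event type: page view
        "url": page_url,                      # page URL
        "page": page_title,                   # page title
        "aid": app_id,                        # application id
        "p": "web",                           # platform
        "eid": str(uuid.uuid4()),             # unique event id
        "dtm": str(int(time.time() * 1000)),  # device timestamp (ms)
    }

payload = page_view_payload("shop", "https://example.com/checkout", "Checkout")
# A tracker would POST this (or GET it, query-encoded) to the collector:
print(urlencode(payload))
```

Downstream, the pipeline validates and enriches each payload before loading it into your warehouse or streaming it on via Kinesis or Pub/Sub.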
Snowplow Alternatives & Comparisons
What are some alternatives to Snowplow?
Google Analytics lets you measure your advertising ROI as well as track your Flash, video, and social networking sites and applications.
Segment is a single hub for customer data. Collect your data in one place, then send it to more than 100 third-party tools, internal systems, or Amazon Redshift with the flip of a switch.
Mixpanel helps companies build better products through data. With our powerful, self-serve product analytics solution, teams can easily analyze how and why people engage, convert, and retain to improve their user experience.
Heap automatically captures every user action in your app and lets you measure it all. Clicks, taps, swipes, form submissions, page views, and more. Track events and segment users instantly. No pushing code. No waiting for data to trickle in.