
What is Snowplow?

Snowplow is a real-time event data pipeline that lets you track, contextualize, validate and model your customers’ behaviour across your entire digital estate.
Snowplow is a tool in the Custom Analytics category of a tech stack.
Snowplow is an open source tool with 6.9K GitHub stars and 1.2K GitHub forks.

Who uses Snowplow?

Companies
45 companies reportedly use Snowplow in their tech stacks, including N26, Pier, and Data.

Developers
80 developers on StackShare have stated that they use Snowplow.

Snowplow Integrations

PostgreSQL, Amazon S3, Elasticsearch, Kafka, and Microsoft Azure are some of the popular tools that integrate with Snowplow. In total, 21 tools integrate with Snowplow.
Pros of Snowplow

  • Can track any type of digital event (7 votes)
  • First-party tracking (5 votes)
  • Data quality (5 votes)
  • Real-time streams (4 votes)
  • Completely open source (4 votes)
  • Redshift integration (4 votes)
  • Snowflake integration (3 votes)
  • BigQuery integration (3 votes)

Decisions about Snowplow

Here are some stack decisions, common use cases and reviews by companies and developers who chose Snowplow in their tech stack.

Needs advice on Amazon S3, Dremio, and Snowflake

Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

  1. Ingestion->Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
  2. Storage->Amazon S3 seems like the cheapest. We probably won't need much space, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
  3. Data Catalog->AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would they fit in?
  4. Partitions-> I've seen Cassandra and YARN mentioned, but have no experience with either
  5. Processing-> We want to use SAS if at all possible. What will work with SAS code?
  6. Pipeline/Automation->The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
  7. I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
  8. An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

I'm the only full-time tech person on this project, but my background is mostly OOP, HTML, JavaScript, and some SQL. Most of this is outside my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

Snowplow's Features

  • Track rich events from your websites, mobile apps, server-side systems, third party systems and any type of connected device, so that you have a record of what happened, when, and to whom
  • Load your data into your data warehouse of choice to power sophisticated analytics
  • Process your data, including validating, enriching and modeling it
  • Your data is available in real time via Amazon Kinesis, Google Pub/Sub and BigQuery to power real-time applications and reports
  • Your data pipeline is running in your cloud environment giving you full ownership and control of your data
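
To make the "validate and enrich" step in the feature list more concrete, here is a minimal Python sketch of the general idea. It does not use Snowplow's trackers or APIs: the schema format, the field names (`event_name`, `user_id`, `page_url`), and the `validate`, `enrich` and `process` helpers are all hypothetical, and routing failed events to a separate "bad" list is an assumption of this sketch rather than a description of how Snowplow itself works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from urllib.parse import urlparse

# Hypothetical schema: required field name -> expected Python type.
PAGE_VIEW_SCHEMA: dict[str, type] = {
    "event_name": str,
    "user_id": str,
    "page_url": str,
}


@dataclass
class PipelineResult:
    """Events that passed validation, and events that failed with reasons."""
    good: list[dict[str, Any]] = field(default_factory=list)
    bad: list[dict[str, Any]] = field(default_factory=list)


def validate(event: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of validation errors (empty when the event is valid)."""
    errors = []
    for name, expected in schema.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            errors.append(f"wrong type for {name}")
    return errors


def enrich(event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the event with two derived fields added."""
    enriched = dict(event)
    enriched["processed_at"] = datetime.now(timezone.utc).isoformat()
    # Toy enrichment: derive the host name from the page URL.
    enriched["page_host"] = urlparse(event["page_url"]).netloc
    return enriched


def process(events: list[dict[str, Any]], schema: dict[str, type]) -> PipelineResult:
    """Validate each event; enrich the valid ones, set aside the rest."""
    result = PipelineResult()
    for event in events:
        errors = validate(event, schema)
        if errors:
            result.bad.append({"event": event, "errors": errors})
        else:
            result.good.append(enrich(event))
    return result


events = [
    {"event_name": "page_view", "user_id": "u1",
     "page_url": "https://example.com/pricing"},
    {"event_name": "page_view", "page_url": "https://example.com/"},
]
result = process(events, PAGE_VIEW_SCHEMA)
print(len(result.good), "valid;", len(result.bad), "rejected")
```

Keeping rejected events alongside their error messages, instead of dropping them, is one simple way to make data quality problems visible downstream.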

Snowplow Alternatives & Comparisons

What are some alternatives to Snowplow?
Google Analytics
Google Analytics lets you measure your advertising ROI as well as track your Flash, video, and social networking sites and applications.
Segment
Segment is a single hub for customer data. Collect your data in one place, then send it to more than 100 third-party tools, internal systems, or Amazon Redshift with the flip of a switch.
Mixpanel
Mixpanel helps companies build better products through data. With our powerful, self-serve product analytics solution, teams can easily analyze how and why people engage, convert, and retain to improve their user experience.
Piwik
Matomo (formerly Piwik) is a full-featured PHP/MySQL software program that you download and install on your own web server. At the end of the five-minute installation process, you are given a JavaScript tracking code.
Heap
Heap automatically captures every user action in your app and lets you measure it all. Clicks, taps, swipes, form submissions, page views, and more. Track events and segment users instantly. No pushing code. No waiting for data to trickle in.

Snowplow's Followers
174 developers follow Snowplow to keep up with related blogs and decisions.