What is Snowplow?
Snowplow is a real-time event data pipeline that lets you track, contextualize, validate and model your customers’ behaviour across your entire digital estate.
Snowplow is a tool in the Custom Analytics category of a tech stack.
Snowplow is an open source tool with 6.5K GitHub stars and 1.2K GitHub forks.
Who uses Snowplow?
44 companies reportedly use Snowplow in their tech stacks, including N26, Pier, and Data.
77 developers on StackShare have stated that they use Snowplow.
PostgreSQL, Amazon S3, Elasticsearch, Kafka, and Microsoft Azure are some of the popular tools that integrate with Snowplow; in total, 21 tools integrate with it.
Pros of Snowplow
- Can track any type of digital event
- Completely open source
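"Any type of digital event" works because Snowplow lets you define your own event schemas: a custom event is just a JSON payload paired with a schema reference that the pipeline validates it against. A minimal sketch (the vendor, event name, and fields below are made up for illustration):

```python
import json

# A Snowplow-style self-describing (custom) event: arbitrary data plus
# a schema URI the pipeline validates it against. Vendor, event name,
# and fields here are hypothetical.
custom_event = {
    "schema": "iglu:com.example/song_played/jsonschema/1-0-0",
    "data": {
        "track_id": "T123",
        "duration_seconds": 214,
    },
}

print(json.dumps(custom_event, indent=2))
```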
Decisions about Snowplow
Here are some stack decisions, common use cases and reviews by companies and developers who chose Snowplow in their tech stack.
Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:
- Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
- Storage -> Amazon S3 seems like the cheapest option. We probably won't need much capacity, even at full scale. Our current storage is a secure Box folder holding ~4GB across several batches of test data, code, presentations, and planning docs.
- Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would those fit in?
- Partitions -> I've seen Cassandra and YARN mentioned, but I have no experience with either.
- Processing -> We want to use SAS if at all possible. What will work with SAS code?
- Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.
- I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
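One common way to organize the "experimental" partitions described above is a Hive-style partitioned key layout, which catalog tools such as AWS Glue can crawl directly. A minimal sketch, where the zone names, the `source=` partner field, and the filenames are all hypothetical:

```python
from datetime import date

def lake_key(zone: str, source: str, dt: date, filename: str) -> str:
    """Build a Hive-style partitioned object key (e.g. for S3).

    zone:   lifecycle stage of the data ("raw", "harmonized", "mart")
    source: submitting partner (hypothetical field)
    dt:     batch date, used for date partitioning
    """
    return f"{zone}/source={source}/dt={dt.isoformat()}/{filename}"

# Example: where a raw upload from a partner might land
key = lake_key("raw", "partner_a", date(2024, 1, 15), "claims.csv")
print(key)  # raw/source=partner_a/dt=2024-01-15/claims.csv
```

Because the partition values live in the key itself, downstream jobs can prune by partner or date without scanning every object.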
Snowplow's key features:
- Track rich events from your websites, mobile apps, server-side systems, third-party systems and any type of connected device, so that you have a record of what happened, when, and to whom
- Load your data into your data warehouse of choice to power sophisticated analytics
- Process your data, including validating, enriching and modeling it
- Your data is available in real time via Amazon Kinesis, Google Pub/Sub and BigQuery to power real-time applications and reports
- Your data pipeline is running in your cloud environment giving you full ownership and control of your data
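The tracking step above boils down to a tracker sending name–value pairs to a collector endpoint. A rough sketch of what a page-view payload looks like in the style of the Snowplow tracker protocol; the field values here are illustrative, and a real deployment would use one of the official tracker SDKs rather than hand-rolling this:

```python
import time
import uuid
from urllib.parse import urlencode

def page_view_payload(app_id: str, page_url: str, page_title: str) -> dict:
    """Assemble a minimal page-view event in the style of the Snowplow
    tracker protocol (the abbreviated field names come from that
    protocol; the exact required fields depend on your pipeline)."""
    return {
        "e": "pv",                            # event type: page view
        "url": page_url,                      # page URL
        "page": page_title,                   # page title
        "aid": app_id,                        # application id
        "p": "web",                           # platform
        "eid": str(uuid.uuid4()),             # unique event id
        "dtm": str(int(time.time() * 1000)),  # device timestamp (ms)
    }

payload = page_view_payload("shop", "https://example.com/checkout", "Checkout")
# A tracker would POST this (or GET it, query-encoded) to the collector:
print(urlencode(payload))
```

Downstream, the pipeline validates and enriches each payload before loading it into your warehouse or streaming it on via Kinesis or Pub/Sub.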
Snowplow Alternatives & Comparisons
What are some alternatives to Snowplow?
Google Analytics lets you measure your advertising ROI as well as track your Flash, video, and social networking sites and applications.
Segment is a single hub for customer data. Collect your data in one place, then send it to more than 100 third-party tools, internal systems, or Amazon Redshift with the flip of a switch.
Mixpanel helps companies build better products through data. With our powerful, self-serve product analytics solution, teams can easily analyze how and why people engage, convert, and retain to improve their user experience.
Heap automatically captures every user action in your app and lets you measure it all. Clicks, taps, swipes, form submissions, page views, and more. Track events and segment users instantly. No pushing code. No waiting for data to trickle in.