Tim Specht

Co-Founder and CTO at Dubsmash

Decision at Dubsmash about Memcached, Algolia, Elasticsearch, SearchAsAService

Memcached
Algolia
Elasticsearch
#SearchAsAService

Although we were using Elasticsearch in the beginning to power our in-app search, we moved this part of our processing over to Algolia a couple of months ago; this has proven to be a fantastic choice, letting us build search-related features with more confidence and speed.

Elasticsearch is nowadays only used for search in internal tooling; hosting and running it reliably took up too much of our time in the past, and fine-tuning results to reach a great user experience was never easy for us either. With Algolia we can change ranking methods flexibly on the fly and instead focus our time on fine-tuning the experience within our app.
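Changing ranking methods on the fly, as described above, amounts to pushing a new settings object to the index. A minimal sketch using Algolia's Python client, where the app ID, API key, index name, and custom-ranking attributes are placeholder assumptions (the network calls are commented out so the snippet runs offline):

```python
# Hypothetical sketch; real code would use Algolia's Python client:
# from algoliasearch.search_client import SearchClient
# client = SearchClient.create("YOUR_APP_ID", "YOUR_ADMIN_KEY")
# index = client.init_index("quotes")  # index name is an assumption

new_settings = {
    # Algolia's default ranking criteria, reordered or extended as needed;
    # because ranking is just data, it can be tweaked without a redeploy.
    "ranking": ["typo", "geo", "words", "filters", "proximity",
                "attribute", "exact", "custom"],
    # Tie-breaking attributes (invented here for illustration):
    "customRanking": ["desc(play_count)", "desc(created_at)"],
}

# index.set_settings(new_settings)  # would push the change live immediately
```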

Memcached sits in front of most of our API endpoints to cache responses, speeding up response times and reducing server costs on our side.
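A response cache like this sits between the endpoint handler and the client: build a deterministic key from the endpoint and its parameters, return the cached body on a hit, and only run the handler on a miss. This sketch uses an in-memory stand-in for the memcached client (a real deployment would use e.g. pymemcache, which has the same get/set shape); the endpoint and field names are invented for illustration:

```python
import hashlib
import json

class FakeCache:
    """In-memory stand-in for a memcached client (get/set with a TTL)."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ttl=60):
        self.store[key] = value  # a real client would honour the TTL

cache = FakeCache()
compute_calls = 0

def cache_key(endpoint, params):
    # Deterministic key from the endpoint plus sorted query params.
    blob = json.dumps([endpoint, sorted(params.items())])
    return hashlib.md5(blob.encode()).hexdigest()

def cached_response(endpoint, params, handler, ttl=60):
    key = cache_key(endpoint, params)
    hit = cache.get(key)
    if hit is not None:
        return hit                     # serve straight from the cache
    response = handler(params)         # expensive work only on a miss
    cache.set(key, response, ttl=ttl)
    return response

def trending_handler(params):
    # Hypothetical expensive endpoint handler.
    global compute_calls
    compute_calls += 1
    return {"results": ["dub1", "dub2"], "lang": params["lang"]}

first = cached_response("/v1/trending", {"lang": "de"}, trending_handler)
second = cached_response("/v1/trending", {"lang": "de"}, trending_handler)
```

The second call returns the cached body without invoking the handler again.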



Decision at Dubsmash about Kubernetes, Amazon EC2, Heroku, Python, ContainerTools, PlatformAsAService

Kubernetes
Amazon EC2
Heroku
Python
#ContainerTools
#PlatformAsAService

Since we deployed our very first lines of Python code more than two years ago, we have been happy users of Heroku. It lets us focus on building features rather than maintaining infrastructure, offers super-easy scaling, and the support team is always happy to help (in the rare case you need them).

We have toyed with the idea of moving our computational needs over to plain Amazon EC2 instances or a container-management solution like Kubernetes a couple of times, but the added cost of maintaining such an architecture, together with the ease of use of Heroku, has kept us from moving forward so far.

Running independent services for different needs of our features gives us the flexibility to choose whatever data storage is best for the given task.



Decision at Dubsmash about Google BigQuery, Amazon SQS, AWS Lambda, Amazon Kinesis, Google Analytics, GeneralAnalytics, BigDataAsAService, RealTimeDataProcessing, ServerlessTaskProcessing

Google BigQuery
Amazon SQS
AWS Lambda
Amazon Kinesis
Google Analytics
#GeneralAnalytics
#BigDataAsAService
#RealTimeDataProcessing
#ServerlessTaskProcessing

In order to measure and track user behaviour on our platform accurately, we quickly moved from our initial Google Analytics setup to a custom-built solution due to resource and pricing concerns.

While this may sound complicated, it's as easy as clients sending JSON blobs of events to Amazon Kinesis, from where we use AWS Lambda and Amazon SQS to batch and process incoming events before ingesting them into Google BigQuery. Before handing data to the pipeline, our mobile clients aggregate events internally and, once a certain threshold is reached or the app goes to the background, send them as a single JSON blob into the stream. Once events are stored in BigQuery (usually within a second of the client sending the data), we can use almost-standard SQL to query the data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours.
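The client-side aggregation described above can be sketched as a small buffer that flushes either when a threshold is hit or when the app goes to the background. The threshold value and event shape are assumptions for illustration, and a real client would hand the blob to Kinesis (e.g. via a `put_record` call) rather than to a local list:

```python
import json

FLUSH_THRESHOLD = 20  # hypothetical; the real threshold isn't stated

class EventBuffer:
    """Collect events client-side and flush them as one JSON blob."""
    def __init__(self, send):
        self.events = []
        self.send = send  # in production: a Kinesis put_record wrapper

    def track(self, name, **props):
        self.events.append({"event": name, **props})
        if len(self.events) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Also called when the app goes to the background.
        if self.events:
            self.send(json.dumps(self.events))
            self.events = []

sent = []
buf = EventBuffer(sent.append)
for i in range(25):
    buf.track("dub_played", dub_id=i)  # 20th event triggers a flush
buf.flush()                            # simulate the app backgrounding
```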

In the past we had workers that continuously read from the stream, validated and post-processed the data, and then enqueued it for other workers to write to BigQuery. We then implemented the Lambda-based approach so that Lambda functions are automatically triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we read them and persist the events to BigQuery. While this approach had a couple of bumps in the road, such as finding the right batch sizes and re-triggering functions asynchronously to keep up with the stream, we finally got it running reliably and are very happy with this solution today.
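A rough sketch of the Lambda stage, assuming the standard Kinesis trigger format (base64-encoded record data) and a simple per-event-name pre-aggregation. Dubsmash's actual aggregation logic and SQS wiring are not documented, so `sqs_send` here is a stand-in for a real `sqs.send_message` call:

```python
import base64
import json

def handler(event, sqs_send=None):
    # Kinesis delivers base64-encoded records to the Lambda function;
    # decode each blob, pre-aggregate per event name, forward one message.
    counts = {}
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        for ev in payload:
            counts[ev["event"]] = counts.get(ev["event"], 0) + 1
    message = json.dumps(counts)
    if sqs_send:
        sqs_send(message)  # in production: sqs.send_message(...)
    return counts

def enc(events):
    # Build a fake Kinesis record wrapping a client JSON blob.
    data = base64.b64encode(json.dumps(events).encode()).decode()
    return {"kinesis": {"data": data}}

fake_event = {"Records": [
    enc([{"event": "dub_played"}, {"event": "dub_played"}]),
    enc([{"event": "app_open"}]),
]}
result = handler(fake_event)
```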



Decision at Dubsmash about Amazon RDS for Aurora, Redis, Amazon DynamoDB, Amazon RDS, Heroku, PostgreSQL, Databases, PlatformAsAService, NosqlDatabaseAsAService, SqlDatabaseAsAService

Amazon RDS for Aurora
Redis
Amazon DynamoDB
Amazon RDS
Heroku
PostgreSQL
#Databases
#PlatformAsAService
#NosqlDatabaseAsAService
#SqlDatabaseAsAService

Over the years we have added a wide variety of storage systems to our stack, including PostgreSQL (some hosted by Heroku, some on Amazon RDS) for relational data, Amazon DynamoDB for non-relational data like recommendations and user connections, and Redis to hold pre-aggregated data that speeds up API endpoints.

Since we started running Postgres ourselves on RDS instead of relying solely on Heroku's managed offering, we have gained additional flexibility in scaling our application while reducing costs at the same time.

We are also heavily testing Amazon RDS for Aurora in its Postgres-compatible version and will give the new Aurora Serverless release a try!



Decision at Dubsmash about Amazon CloudFront, Amazon S3, CloudStorage, ContentDeliveryNetwork, AssetsAndMedia

Amazon CloudFront
Amazon S3
#CloudStorage
#ContentDeliveryNetwork
#AssetsAndMedia

In the early days of features like My Dubs, which let users upload their Dubs onto our platform, uploads went directly through our API, which then stored the files in Amazon S3.

We quickly saw that this approach was seriously hurting our API performance. Since users usually have slow internet connections on their phones, uploading the file took up a huge share of request processing time on our end, forcing us to spin up far more machines than we actually needed. We have since moved to a multi-step, handshake-like upload process that uses signed URLs vended to the clients on request, so they can upload their files directly to S3. The files are then distributed, cached, and served back to other clients through Amazon CloudFront.
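The signed-URL handshake can be sketched as below. In production this would use S3's own presigned URLs (e.g. boto3's `generate_presigned_url('put_object', ...)`); to keep the sketch self-contained and runnable offline, a plain HMAC stands in for S3's signature scheme, and the bucket name, key layout, and expiry are assumptions:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-signing-key"  # placeholder; S3 uses SigV4 in reality

def vend_upload_url(user_id, filename, expires_in=300):
    """Step 1 (server): the client asks for an upload slot and gets back a
    time-limited signed URL, so the file bytes never pass through the API."""
    key = f"dubs/{user_id}/{filename}"
    expires = int(time.time()) + expires_in
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return (f"https://example-bucket.s3.amazonaws.com/{key}"
            f"?Expires={expires}&Signature={sig}")

url = vend_upload_url(42, "my-dub.mp4")
# Step 2 (client): PUT the file bytes directly to `url`.
# Step 3: other clients fetch it via the CloudFront distribution.
```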



Decision at Dubsmash about AWS Lambda, ApplicationHosting

AWS Lambda
#ApplicationHosting

Whenever we need to notify a user of something happening on our platform, be it a personal push notification from one user to another, a new Dub, or a notification going out to millions of users at the same time that new content is available, we rely on AWS Lambda. When we started implementing this feature two years ago we were lucky enough to get early access to the Lambda beta, and we are still happy with how things run there, especially given all the easy-to-set-up integrations with other AWS services.

Lambda enables us to send out millions of pushes within a couple of minutes by acting as a multiplexer in front of SNS. We call a first Lambda function with a batch of up to 300 push notifications; it then invokes a subsequent Lambda function for every 20 pushes, which in turn calls SNS to actually send them out.
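The two-tier fan-out described above boils down to simple batching. In this sketch `fake_invoke` and `published.append` stand in for `lambda_client.invoke` and `sns.publish`; the 300 and 20 batch sizes come straight from the text:

```python
def chunk(items, size):
    # Split a batch into fixed-size sub-batches for the next Lambda tier.
    return [items[i:i + size] for i in range(0, len(items), size)]

def tier_one(pushes, invoke):
    # First Lambda: takes up to 300 pushes, fans out in groups of 20.
    for sub in chunk(pushes, 20):
        invoke(sub)      # in production: lambda_client.invoke(...)

def tier_two(pushes, publish):
    # Second Lambda: one SNS publish per push notification.
    for push in pushes:
        publish(push)    # in production: sns.publish(TargetArn=..., ...)

published = []
invocations = []

def fake_invoke(sub):
    invocations.append(len(sub))
    tier_two(sub, published.append)

tier_one([{"device": i} for i in range(300)], fake_invoke)
```

One top-level call thus spreads 300 pushes across 15 second-tier invocations of 20 pushes each.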

This multi-tier process of sending push notifications enables us to adjust our sending volume quickly while keeping costs and maintenance overhead on our side to a bare minimum.



Decision at Dubsmash about Docker Compose, Docker, ContainerTools

Docker Compose
Docker
#ContainerTools

On the backend side we started using Docker almost 2 years ago. Looking back, this was absolutely the right decision, as running things manually with so many services and so few engineers wouldn’t have been possible at all.

While in the beginning we used it mostly to ease local development, we quickly started running our entire CI & CD pipeline on top of it as well. This not only sped things up drastically on local machines, using Docker Compose to spin up different services and dependencies and making sure they can talk to each other, but also gave us reliable builds on our build infrastructure and let us easily debug problems using the baked images should anything go wrong. Adopting Docker was a slight change in the beginning, but we ultimately found that it forces you to think through how your services are composed and structured, and thus improves the way you structure your systems.
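A minimal sketch of the local-development setup the paragraph describes, using Docker Compose to wire an application container to its dependencies. The service names, image tags, ports, and environment variables are assumptions, not Dubsmash's actual configuration:

```yaml
version: "3"
services:
  api:                       # hypothetical application service
    build: .
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgres://postgres@db:5432/postgres
      CACHE_URL: memcached://cache:11211
    depends_on:              # start dependencies first
      - db
      - cache
  db:
    image: postgres:10
  cache:
    image: memcached:1.5
```

A single `docker-compose up` then brings up the whole stack with the services able to reach each other by name.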



Decision at Dubsmash about Pushwoosh, Google Analytics, WebPushNotifications, Analytics, Communications, GeneralAnalytics

Pushwoosh
Google Analytics
#WebPushNotifications
#Analytics
#Communications
#GeneralAnalytics

We used Google Analytics to track user and market growth, and Pushwoosh to send out push notifications by hand to promote new content. Even though we didn't localize our pushes at all, we added custom tags to devices when they registered with the service so we could easily target certain markets (e.g. send a push to German users only), which was totally sufficient at the time.


Decision at Dubsmash about Amazon S3, DataStores, CloudStorage

Amazon S3
#DataStores
#CloudStorage

In the beginning, Dubsmash simply downloaded a JSON file from Amazon S3 containing the Quote metadata. This file was updated and re-uploaded to S3 by hand every time new content was available: we would simply put in the URL of the sound file and the name of the Quote, and upload the file again.
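For illustration, the catalogue and client-side parsing might have looked something like this; the actual field names were never published, so this shape (and the URLs in it) are assumptions:

```python
import json

# Hand-maintained metadata file as it might have lived on S3.
quotes_json = json.dumps({
    "quotes": [
        {"name": "Excuse me!",
         "sound_url": "https://example-bucket.s3.amazonaws.com/sounds/excuse-me.aac"},
        {"name": "Nope!",
         "sound_url": "https://example-bucket.s3.amazonaws.com/sounds/nope.aac"},
    ]
})

# On the client: download the file (stubbed here) and parse the catalogue.
catalogue = json.loads(quotes_json)
names = [q["name"] for q in catalogue["quotes"]]
```

Because the file is static, S3 alone handles all the read traffic and no custom API is needed.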

We chose this really simple mechanism to avoid having to bootstrap a custom API to distribute content to the clients. This turned out to be a great business decision as well, since we didn't need to worry about scaling issues at all in the beginning; it became an even better call a couple of weeks after the initial launch.



Decision at Dubsmash about Stream, ActivityFeedsAsAService

Stream
#ActivityFeedsAsAService

Dubsmash's very small engineering team has always made a point of spending its resources on solving product questions rather than managing and running the underlying infrastructure.

We recently started using Stream to build activity feeds in various shapes and forms. With Stream we can rapidly iterate on features like news feeds, trending feeds, and more while making sure everything runs smoothly and snappily in the background. Thanks to their advanced ranking algorithms and their recent transition from Python to Go, we can change our feed ranking on the fly and gauge the user impact immediately!

