Amazon S3

Decision at Raygun about AWS Elastic Load Balancing (ELB), Amazon EC2, nginx, Amazon RDS, Amazon S3, WebServers, LoadBalancerReverseProxy, CloudHosting, CloudStorage

CmdrKeen, Co-founder & CEO at Raygun

We chose AWS because, at the time, it was really the only cloud provider to choose from.

We tend to use their basic building blocks (EC2, ELB, Amazon S3, Amazon RDS) rather than vendor-specific components like databases and queuing. We did this deliberately so that we could offer multi-cloud support or move to another cloud provider if its offering was better for our customers.

We’ve used c3.large nodes first for the Node.js deployment and then for the .NET Core deployment. Both sit as backends behind an nginx instance and are managed using scaling groups in Amazon EC2, behind a standard AWS Elastic Load Balancing (ELB) load balancer.

While we’re satisfied with AWS, we do review our decision each year and have looked at Azure and Google Cloud offerings.

#CloudHosting #WebServers #CloudStorage #LoadBalancerReverseProxy

19 upvotes·418 views

Decision at Uploadcare about PostgreSQL, Amazon DynamoDB, Amazon S3, Redis, Python, Google App Engine

dmitry-mukhin

Uploadcare has built an infinitely scalable infrastructure by leveraging AWS. Building on top of AWS allows us to process 350M daily requests for file uploads, manipulations, and deliveries. When we started in 2011, the only cloud alternative to AWS was Google App Engine, which was a no-go for the rather complex solution we wanted to build. We also didn’t want to buy any hardware or use colocation facilities.

Our stack handles receiving files, communicating with external file sources, managing file storage, managing user and file data, processing files, file caching and delivery, and managing user interface dashboards.

At its core, Uploadcare runs on Python. The EuroPython 2011 conference in Florence really inspired us, and that, coupled with the fact that Python was general enough to solve all of our challenges, informed this decision. Additionally, we had prior experience working in Python.

We chose to build the main application with Django because of its feature completeness and large footprint within the Python ecosystem.

All communication within our ecosystem occurs via several HTTP APIs, Redis, Amazon S3, and Amazon DynamoDB. We decided on this architecture so that our system could scale in terms of both storage and database throughput. This way we only need Django running on top of our database cluster. We use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling.

15 upvotes·634 views

Decision at Stitch about Go, Clojure, JavaScript, Python, Kubernetes, AWS OpsWorks, Amazon EC2, Amazon Redshift, Amazon S3, Amazon RDS

jakestein, CEO at Stitch

Stitch is run entirely on AWS. All of our transactional databases are run with Amazon RDS, and we rely on Amazon S3 for data persistence in various stages of our pipeline. Our product integrates with Amazon Redshift as a data destination, and we also use Redshift as an internal data warehouse (powered by Stitch, of course).

The majority of our services run on stateless Amazon EC2 instances that are managed by AWS OpsWorks. We recently introduced Kubernetes into our infrastructure to run the scheduled jobs that execute Singer code to extract data from various sources. Although we tend to be wary of shiny new toys, Kubernetes has proven to be a good fit for this problem, and its stability, strong community and helpful tooling have made it easy for us to incorporate into our operations.

While we continue to be happy with Clojure for our internal services, we felt that its relatively narrow adoption could impede Singer's growth. We chose Python both because it is well suited to the task and because it seems to have reached critical mass among data engineers. That said, the Singer spec is language agnostic, and integrations and libraries have been developed in JavaScript, Go, and Clojure.

13 upvotes·931 views

Decision at Dubsmash about Amazon CloudFront, Amazon S3, CloudStorage, ContentDeliveryNetwork, AssetsAndMedia

tspecht, Co-Founder and CTO at Dubsmash

In the early days, for features like My Dubs, which enables users to upload their Dubs onto our platform, uploads went directly through our API, which then stored the files in Amazon S3.

We quickly saw that this approach was severely degrading our API performance. Since users usually have slower internet connections on their phones, uploading the file took up a huge percentage of the processing time on our end, forcing us to spin up far more machines than we actually needed. We have since moved to a multi-step, handshake-like upload process: the API vends signed URLs to clients upon request so they can upload the files directly to S3. These files are then distributed, cached, and served back to other clients through Amazon CloudFront.
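
The handshake described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration of the general idea of a signed, expiring upload URL. It is not the AWS Signature Version 4 scheme that real S3 presigned URLs use (an AWS SDK would normally generate those), and the host name, secret, and query parameter names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values for illustration only.
SECRET_KEY = b"server-side-secret"
UPLOAD_HOST = "https://uploads.example.com"


def _signature(method: str, key: str, expires: int) -> str:
    """HMAC-SHA256 over the method, object key, and expiry timestamp."""
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_upload_url(key: str, ttl_seconds: int = 300, now=None) -> str:
    """Step 1 (API): hand the client a short-lived URL for one object key."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature("PUT", key, expires)})
    return f"{UPLOAD_HOST}/{key}?{query}"


def verify_upload_url(url: str, now=None) -> bool:
    """Step 2 (storage side): accept the PUT only if the signature is valid and fresh."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires:
        return False
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(signature, _signature("PUT", key, expires))
```

The point of the design is that the API only does a cheap signing step; the slow transfer from the phone goes straight to storage and never occupies an API worker.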

#AssetsAndMedia #ContentDeliveryNetwork #CloudStorage

13 upvotes·201 views

Decision at Dubsmash about Amazon S3, DataStores, CloudStorage

tspecht, Co-Founder and CTO at Dubsmash

Dubsmash in the beginning was simply downloading a JSON file from Amazon S3 containing the Quote metadata. This file was updated & uploaded to Amazon S3 by hand every time we had new content available; we would simply put in the URL to the sound file, the name of the Quote, and re-upload the file.
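
As a rough illustration of how small such a setup can be, here is a sketch of reading and updating a metadata file of this kind. The post does not give the real layout, so the field names `name` and `sound_url` and the example bucket URL are assumptions.

```python
import json

# Hypothetical contents of the hand-edited file; the real field names are not
# given in the post.
QUOTES_JSON = """
[
  {"name": "Example Quote", "sound_url": "https://example-bucket.s3.amazonaws.com/sounds/example.m4a"}
]
"""


def parse_quotes(raw: str) -> list[dict]:
    """Parse the metadata file and keep only entries with both required fields."""
    quotes = json.loads(raw)
    return [q for q in quotes if q.get("name") and q.get("sound_url")]


def add_quote(raw: str, name: str, sound_url: str) -> str:
    """Return the updated file body, ready to be re-uploaded."""
    quotes = json.loads(raw)
    quotes.append({"name": name, "sound_url": sound_url})
    return json.dumps(quotes, indent=2)
```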

We chose this really simple mechanism to avoid having to bootstrap a custom API to distribute the content to the clients. This turned out to be a great business decision as well, since we didn’t need to worry at all about scaling issues in the beginning; it looked like an even better call a couple of weeks after the initial launch.

#CloudStorage #DataStores

11 upvotes·230 views

Decision at Stitch Fix about Apache Spark, Victory, Amazon S3, Elasticsearch, Redux.js, React

psunnn, Software Engineer at Stitch Fix

As a frontend engineer on the Algorithms & Analytics team at Stitch Fix, I work with data scientists to develop applications and visualizations to help our internal business partners make data-driven decisions. I envisioned a platform that would assist data scientists in the data exploration process, allowing them to visually explore and rapidly iterate through their assumptions, then share their insights with others. This would align with our team's philosophy of having engineers "deploy platforms, services, abstractions, and frameworks that allow the data scientists to conceive of, develop, and deploy their ideas with autonomy", and solve the pain of data exploration.

The final product, code-named Dora, is built with React, Redux.js and Victory, backed by Elasticsearch to enable fast and iterative data exploration, and uses Apache Spark to move data from our Amazon S3 data warehouse into the Elasticsearch cluster.

9 upvotes·1.8K views

Decision at StackShare about Redis, CircleCI, Webpack, Amazon CloudFront, Amazon S3, GitHub, Heroku, Rails, Node.js, Apollo, Glamorous, React, Microservices, StackDecisionsLaunch, SSR, FrontEndRepoSplit

ruswerner, Lead Engineer at StackShare

StackShare Feed is built entirely with React, Glamorous, and Apollo. One of our objectives with the public launch of the Feed was to enable a server-side rendered (SSR) experience for our organic search traffic. When you visit the StackShare Feed and you aren't logged in, you are delivered the Trending feed experience. We use an in-house Node.js rendering microservice to generate this HTML. This microservice needs to run and serve requests independently of our Rails web app. Up until recently, we had a mono-repo with our Rails and React code living happily together and all served from the same web process. In order to deploy our SSR app into a Heroku environment, we needed to split out our front-end application into a separate repo in GitHub. This decision was driven mostly by limitations imposed by Heroku, specifically that processes can't communicate with each other. A new SSR app was created in Heroku and linked directly to the frontend repo so it stays in sync with changes.

Related to this, we needed a way to "deploy" our frontend changes to various server environments without building and releasing the entire Ruby application. We built a hybrid Amazon S3 and Amazon CloudFront solution to host our Webpack bundles. A new CircleCI script builds the bundles and uploads them to S3. The final step in our rollout is to update some keys in Redis so our Rails app knows which bundles to serve. The results of these efforts were significant. Our frontend team now moves independently of our backend team, our build and release process takes only a few minutes, we are now using an edge CDN to serve JS assets, and we have pre-rendered React pages!
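
The upload-then-repoint rollout described above can be sketched like this. It is only an illustration of the ordering (upload first, update pointers last): plain dictionaries stand in for S3 and Redis, and the content-hashed file names and the `frontend:<env>:<bundle>` key format are assumptions, not StackShare's actual scheme.

```python
import hashlib


def bundle_key(name: str, content: bytes) -> str:
    """Content-hashed object key, so each build gets its own immutable file."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, ext = name.rsplit(".", 1)
    return f"bundles/{stem}.{digest}.{ext}"


def release(bundles: dict[str, bytes], storage: dict, pointers: dict, env: str) -> None:
    """Upload every bundle first, then flip the pointers the web app reads."""
    uploaded = {}
    for name, content in bundles.items():
        key = bundle_key(name, content)
        storage[key] = content  # stand-in for the S3 upload
        uploaded[name] = key
    for name, key in uploaded.items():
        pointers[f"frontend:{env}:{name}"] = key  # stand-in for a Redis SET
```

Because old bundles stay in storage, rolling back in this sketch would only mean pointing the keys at the previous object names.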

#StackDecisionsLaunch #SSR #Microservices #FrontEndRepoSplit

8 upvotes·4.7K views

Decision at Stitch Fix about Amazon EC2 Container Service, Elasticsearch, Amazon S3

psunnn, Software Engineer at Stitch Fix

To load data from our Amazon S3 data warehouse into the Elasticsearch cluster, I developed a Spark application that uses PySpark to extract data from S3, partition it, and then batch-send each partition to Elasticsearch to increase parallelism. The Spark job enables fielddata: true for low-cardinality text columns to allow sub-aggregations by those columns, and it prevents data duplication by adding a unique _id field to each row in the dataframe.
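
The batching and de-duplication ideas can be illustrated without Spark. The sketch below is plain Python, not the PySpark job itself, and it assumes the `_id` is derived by hashing a set of key columns; the post does not say how the real job computes it. The index and column names are hypothetical.

```python
import hashlib
import json
from itertools import islice


def row_id(row: dict, key_columns: list[str]) -> str:
    """Deterministic _id: re-loading the same row overwrites instead of duplicating."""
    raw = "|".join(str(row[c]) for c in key_columns)
    return hashlib.sha1(raw.encode()).hexdigest()


def batches(rows, size: int):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_body(batch: list[dict], index: str, key_columns: list[str]) -> str:
    """Newline-delimited JSON in the shape the Elasticsearch bulk API expects."""
    lines = []
    for row in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": row_id(row, key_columns)}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"
```

In a Spark job, each partition would run a loop like this independently, which is where the parallelism comes from.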

The job can then be run by data scientists in Flotilla, an internal data platform tool for running jobs on Amazon EC2 Container Service, with environment variables specifying which schema and table to load.
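
A job parameterized this way might read its target as in the sketch below. The variable names `SCHEMA` and `TABLE` are assumptions; the post does not name the actual environment variables.

```python
import os


def load_target(environ=os.environ) -> tuple[str, str]:
    """Read the schema and table to load; fail fast if either is missing."""
    missing = [name for name in ("SCHEMA", "TABLE") if not environ.get(name)]
    if missing:
        raise SystemExit(f"missing environment variables: {', '.join(missing)}")
    return environ["SCHEMA"], environ["TABLE"]
```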

7 upvotes·163 views

Decision about Amazon S3, Amazon Kinesis Firehose, AWS Lambda

Glenn Gillen (glenngillen)

I'm currently building out a Twitter analysis tool that uses AWS Lambda to stream data into Amazon Kinesis Firehose, which in turn saves the result to Amazon S3. The plan is to have Amazon S3 operate as both a data store and a quasi-messaging bus, with any post-processing work (e.g., notifications of new tweets going into Slack) fanning out from there. I went with this approach because I can get things up and running quickly and pay only for what I use, rather than having lots of worker nodes sitting around waiting for work. Amazon Kinesis Firehose also makes it easy to add a different or additional data store in the future.
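
One way the fan-out could look is a function triggered by S3 object-created notifications. The sketch below parses the standard S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the `subscribers` parameter is a hypothetical hook added here so the example runs standalone, and a real deployment would call Slack or another service there.

```python
from urllib.parse import unquote_plus


def new_objects(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handler(event, context, subscribers=()):
    """Lambda-style entry point: pass each new object to every subscriber."""
    delivered = 0
    for bucket, key in new_objects(event):
        for notify in subscribers:
            notify(bucket, key)
            delivered += 1
    return {"delivered": delivered}
```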

4 upvotes·340 views

Decision at Onedot about npm, Blueprint, Amazon S3, Apache Spark, Cassandra, TypeScript, Scala, Redux.js, React

onedotadmin, CTO at Onedot

Onedot is building an automated data preparation service using probabilistic and statistical methods, including artificial intelligence (AI). From the beginning, having a stable foundation while still being able to iterate quickly was very important to us. Given the nature of the compute workloads we face, choosing a functional programming paradigm and a scalable cluster model was a no-brainer. We started playing with Apache Spark very early on, when the platform was still in its infancy. As a storage backend, we first used Cassandra, but found that it was not the optimal choice for our workloads (lots of rather small datasets, data pipelines with considerable complexity, etc.). In the end, we migrated dataset storage to Amazon S3, which proved to be much better suited to our case. In the frontend, we bet on more traditional frameworks like React/Redux.js, Blueprint, and a number of common npm packages. Because of our very positive experience with Scala (in particular the ability to write things very expressively, use immutability across the board, etc.), we settled on TypeScript in the frontend. In our opinion, this was a very good decision. Nowadays, transpiling is a common thing, so we thought: why not introduce the same type safety and mathematical rigour to the user interface?

1 upvote·177 views