
Amazon S3
Sometimes #ad-blocking add-ons can cause a real headache when working with JavaScript apps. Onboarding assistants (Appcues + elevio), chat (Intercom), and product usage insights (Hotjar) have all landed on their blacklists. I guess there is a perfectly good reason for this that I just don't know.
In order to fix this, we had to set up our own content delivery service. We chose Amazon CloudFront and Amazon S3 to do the job because they have good synergy with the Heroku PaaS we are already using.
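For anyone curious what this looks like in practice, here is a minimal sketch (the file and bucket names are made up) of pushing a vendored script to an S3 bucket that sits behind CloudFront, using boto3:

```python
# Sketch: upload a vendored third-party script to S3 so CloudFront can
# serve it from our own domain (bucket/key names are hypothetical).
import boto3

s3 = boto3.client("s3")

s3.upload_file(
    "vendor/intercom-shim.js",   # local copy of the vendor script
    "assets-example-bucket",     # hypothetical S3 bucket behind CloudFront
    "js/intercom-shim.js",       # served as /js/intercom-shim.js
    ExtraArgs={
        "ContentType": "application/javascript",
        "CacheControl": "public, max-age=86400",
    },
)
```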
If you look through my decisions, you will see I recently wrote one about moving from Netlify to Buddy and Amazon S3.
I want to write another decision saying that I tried this out and actually moved back to Netlify. Buddy was great until they deleted my account and all the pipelines I had set up, without warning me, because I didn't log in for a month.
Netlify is amazing and way easier to set up, support is great, and they have so many amazing options... I did learn things about Amazon S3 by moving over, but I'm sticking with Netlify for the long run now.
Hi Johnny, Alex from Buddy here. I've just come across your post and I think it's a bit unfair.
The time between the last login and the actual termination is 90 days. Every user receives an email after two months of inactivity that the account is going to be canceled in one month unless somebody logs into it. We sent the email on April 27 (perhaps it got lost in your inbox).
However, nobody says you must be limited to either Buddy or Netlify. A couple of weeks ago we added a dedicated integration with Netlify, which means you can build and test your apps with Buddy, and use Netlify for deployments: https://buddy.works/blog/new-action-netlify-cli (wish Stackshare supported links!)
Hope this clears things up a bit.
Cheers!
We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with a microservices architecture because we wanted scale. The microservice architectural style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities. You gain modularity, extensive parallelism, and cost-effective scaling by deploying services across many distributed servers. Microservices' modularity facilitates independent updates/deployments and helps to avoid single points of failure, which can prevent large-scale outages. We also decided to use the Event-Driven Architecture pattern, a popular distributed asynchronous architecture pattern used to produce highly scalable applications. An event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.
To build our #Backend capabilities we decided to use the following:
1. #Microservices - Java with Spring Boot, Node.js with ExpressJS, and Python with Flask
2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda (see the producer sketch after this list)
3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas
To build #Webapps we decided to use Angular 2 with RxJS
#Devops - GitHub , Travis CI , Terraform , Docker , Serverless
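As a rough illustration of the #Eventsourcingframework piece above, here is a minimal boto3 producer sketch for Kinesis; the stream name and event shape are placeholders, not our actual schema:

```python
# Sketch: a producer publishing a domain event to Amazon Kinesis
# (stream name and event fields are illustrative assumptions).
import json
import boto3

kinesis = boto3.client("kinesis")

event = {
    "type": "ContentPublished",
    "contentId": "1234",
    "channel": "web",
}

kinesis.put_record(
    StreamName="content-events",              # hypothetical stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["contentId"],          # keeps events for one item ordered
)
```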
To meet employees' critical need for interactive querying, we've worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest's scale has involved resolving quite a few challenges, such as supporting deeply nested and huge Thrift schemas, slow/bad worker detection and remediation, cluster auto-scaling, graceful cluster shutdown, and impersonation support for the LDAP authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters comprise a fleet of 450 r4.8xl EC2 instances. Together, the clusters have over 100 TB of memory and 14K vCPU cores. Within Pinterest, more than 1,000 monthly active users (out of 1,600+ Pinterest employees) use Presto, running about 400K queries on these clusters per month.
Each query submitted to a Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest, and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we end up with query-submitted events that have no corresponding query-finished events. These events enable us to capture the effect of cluster crashes over time.
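As a simplified sketch (the event shape here is an assumption, not Singer's actual schema), pairing the two event types to find queries lost in a crash could look like this:

```python
# Sketch: pair query-submitted and query-finished events to spot queries
# likely killed by a cluster crash (field names are assumptions).
def unfinished_queries(events):
    """events: iterable of dicts with 'query_id' and 'state' keys."""
    submitted, finished = set(), set()
    for e in events:
        if e["state"] == "SUBMITTED":
            submitted.add(e["query_id"])
        elif e["state"] == "FINISHED":
            finished.add(e["query_id"])
    # Submitted but never finished -> likely lost in a crash.
    return submitted - finished
```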
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. The Kubernetes platform gives us the capability to add and remove workers from a Presto cluster very quickly. The best-case latency for bringing up a new worker on Kubernetes is less than a minute; however, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Another advantage of deploying on Kubernetes is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.
#BigData #AWS #DataScience #DataEngineering
ECS on AWS can reduce your cost compared to EC2 and Kubernetes. Athena may be another tool for reducing your cost by replacing Presto. It takes advantage of S3 as the storage layer and provides serverless management of your infrastructure.
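For illustration, a minimal boto3 sketch of an Athena query over S3 data (database, table, and output location are hypothetical):

```python
# Sketch: run an ad-hoc Athena query over data in S3 instead of keeping
# a standing Presto cluster (names below are made up).
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM events WHERE dt = '2020-01-01'",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution() for status
```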
Hi, I'm building a machine learning pipeline to store image bytes and image vectors in the backend.
So, when users query for random-access image data (by key), we return the image bytes and perform machine learning model operations on them.
I'm currently considering going with Amazon S3 (in the future, maybe adding a Redis caching layer) as the backend system to store the information (S3 buckets with sharded prefixes).
The latency of S3 is 100-200 ms per GET/PUT, and it has high throughput of 3,500 PUTs/sec and 5,500 GETs/sec for a given bucket/prefix. If I need to reduce latency in the future, I can add the Redis cache.
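To make the sharded-prefix idea concrete, here is a minimal sketch of what I mean (bucket name and hashing scheme are just illustrative): a short hash prefix spreads keys across partitions, so each prefix gets its own 3,500 PUT / 5,500 GET per-second budget.

```python
# Sketch: hash-sharded key prefixes for an S3-backed image store
# (bucket name is hypothetical).
import hashlib
import boto3

s3 = boto3.client("s3")
BUCKET = "image-store"  # hypothetical

def shard_key(image_id: str) -> str:
    shard = hashlib.md5(image_id.encode()).hexdigest()[:2]  # 256 shards
    return f"{shard}/{image_id}"

def put_image(image_id: str, image_bytes: bytes) -> None:
    s3.put_object(Bucket=BUCKET, Key=shard_key(image_id), Body=image_bytes)

def get_image(image_id: str) -> bytes:
    return s3.get_object(Bucket=BUCKET, Key=shard_key(image_id))["Body"].read()
```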
Also, S3 costs are far lower than HBase (on Amazon EC2 instances with a 3x replication factor).
I have not personally used HBase before, so can someone help me decide if I'm making the right choice here? I'm not aware of HBase latencies, and I have learned that the MOB feature in HBase has to be turned on if we store image bytes in one of the column families, as the average image size is 240 KB.
Our whole DevOps stack consists of the following tools:
- GitHub (incl. GitHub Pages/Markdown for documentation, GettingStarted guides, and HowTos) as our collaborative review and code management tool
- Git as the underlying revision control system
- SourceTree as Git GUI
- Visual Studio Code as IDE
- CircleCI for continuous integration (automating the development process)
- Prettier / TSLint / ESLint as code linters
- SonarQube as quality gate
- Docker as container management (incl. Docker Compose for multi-container application management)
- VirtualBox for operating system simulation tests
- Kubernetes as cluster management for Docker containers
- Heroku for deploying in test environments
- nginx as web server (preferably used as facade server in production environment)
- SSLMate (using OpenSSL) for certificate management
- Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
- PostgreSQL as preferred database system
- Redis as preferred in-memory database/store (great for caching)
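As a side note, here is a minimal cache-aside sketch of the pattern we mean by "great for caching" (connection details and the database call are placeholders):

```python
# Sketch: cache-aside with Redis in front of PostgreSQL
# (host/port and the DB lookup below are placeholders).
import redis

r = redis.Redis(host="localhost", port=6379)

def load_profile_from_db(user_id: str) -> bytes:
    # Placeholder for the real PostgreSQL query.
    return f"profile-for-{user_id}".encode()

def get_user_profile(user_id: str) -> bytes:
    key = f"profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return cached                          # cache hit
    profile = load_profile_from_db(user_id)    # cache miss: hit the DB
    r.setex(key, 300, profile)                 # cache for 5 minutes
    return profile
```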
The main reasons we chose Kubernetes over Docker Swarm relate to the following aspects:
- Key features: easy and flexible installation, a clear dashboard, great scaling operations, monitoring as an integral part, great load-balancing concepts, and monitoring of node condition with automatic compensation in the event of failure.
- Applications: An application can be deployed using a combination of pods, deployments, and services (or microservices).
- Functionality: Kubernetes has a complex installation and setup process, but it is not as limited as Docker Swarm.
- Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
- Scalability: All-in-one framework for distributed systems.
- Other benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), has a huge community among container orchestration tools, and is an open-source, modular tool that works with any OS.
So why is your deployment different for your (Heroku) test/dev and your stage/production?
When it comes to testing our web app, we do not demand great computational resources and need a very simple, convenient, and fast PaaS solution for deploying the app to our testers. In production, though, the demand for computational resources can rise very fast. With Amazon we are able to control that in a better way.
GitHub Actions has been a breeze to work with. It offers a nice CI/CD service right inside the GitHub environment itself, with tight integration into GitHub. Workflows are triggered by a variety of events, such as commits, pull requests, and comments.
At Cereo, we serve our Webpack bundle from Amazon S3 and Amazon CloudFront. We're using GitHub Actions to build bundles, upload them to S3, and invalidate the CloudFront cache. Our codebase is a monorepo with multiple apps. GitHub Actions lets us filter on changed paths so that we can limit executions. Sometimes a pull request still triggers multiple workflows, but we get to run them concurrently with GitHub Actions. In fact, GitHub Actions can run up to 20 concurrent jobs.
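Roughly, the deploy step boils down to something like the following sketch (the bucket name and distribution ID are made up):

```python
# Sketch: upload the built Webpack bundle to S3, then invalidate the
# CloudFront cache so clients pick up the new version.
import time
import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

s3.upload_file(
    "dist/app.bundle.js",
    "cereo-static-assets",            # hypothetical bucket
    "app/app.bundle.js",
    ExtraArgs={"ContentType": "application/javascript"},
)

cloudfront.create_invalidation(
    DistributionId="E1ABCDEF234567",  # hypothetical distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/app/*"]},
        "CallerReference": str(time.time()),  # must be unique per call
    },
)
```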
That being said, don't try to change your entire CI/CD solution in one afternoon. GitHub Actions is still new to the market. Start small and build your shiny new CI system up.
#CI #ContinuousIntegration #ContinuousDelivery
Hello! I have a mobile app with nearly 100k MAU, and I want to add a cloud file storage service to my app.
My app will allow users to store their image, video, and audio files and retrieve them to their device when necessary.
I have already decided to use PHP & Laravel as my backend, and I use a Contabo VPS. Now, I need an object storage service for my app, and my options are:
Amazon S3: It sounds to me like the best option but also the most expensive. It is the closest to my users (MENA region); for the other services, I would have to go to Europe. Not sure how important this is?
DigitalOcean Spaces : Seems like my best option for price/service, but I am still not sure
Wasabi: The best price ($6/TB/month) and free bandwidth, but I am not sure it fits my needs, since I want to allow my users to preview audio and video files, and they don't recommend their service for streaming video.
Backblaze B2 Cloud Storage: Good price but not sure about them.
There is also the self-hosted S3-compatible option, but I am not sure about that.
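One thing worth knowing: most of these providers speak the S3 API, so the client code barely changes between them. A minimal sketch (endpoint, bucket, and credentials are placeholders; it's in Python for brevity, and Laravel's S3 filesystem driver exposes a similar endpoint option) of pointing an S3 client at a custom endpoint and handing out presigned URLs for previews:

```python
# Sketch: the same boto3 code works against any S3-compatible store by
# setting endpoint_url (all values below are placeholders).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.eu-central-1.wasabisys.com",  # or DO Spaces, B2, MinIO...
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Time-limited download link so the app can preview a file without
# proxying the bytes through the Laravel backend.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "user-media", "Key": "audio/track.mp3"},
    ExpiresIn=3600,  # valid for one hour
)
print(url)
```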
Any thoughts will be helpful. Also, if you think I should post in a different sub, please tell me.
We would like to connect a number of video streams (about 25) from an Amazon S3 bucket containing video data to endpoints accessible to a Docker image, which, when run, will process the input video streams and emit some JSON statistics.
The 25 video streams should be synchronized. Could people share their experiences with a similar scenario and perhaps offer advice about which is better (Wowza, Amazon Kinesis Video Streams) for this kind of problem, or why they chose one technology over the other?
The video stream duration will be quite long (about 8 hours each x 25 camera sources). The 25 video streams will have no audio component. If you worked with a similar problem, what was your experience with scaling, latency, resource requirements, config, etc.?
I have different experience with processing video files that I'll describe below. It might be helpful, or at least make you think a bit differently about the problem.

What I did (part of it was a mistake): to increase the level of parallelism at the most time-consuming step, which was the video upload, I used a custom cmd tool written in Python to split the input videos into much smaller chunks (without losing their ordering - just file-name labeling with a timestamp). It then uploaded the chunks to S3. That triggered a few Lambdas, each of which first pulled a chunked video and did the processing with ffmpeg (the Lambdas were the mistake - at that time local Lambda storage was capped at 512 MB, so lots of chunks and lots of Lambdas had to be in place; also, Lambdas are hell to debug), then called Rekognition, and later used AWS Elemental MediaConvert to rebuild the full-length video.

Today I would use some sort of ECS deployment where processing is triggered by an S3 event, and scale the number of Fargate nodes depending on the number of chunks/videos. Then each processor pulls its video (not a stream) to its local storage (a local EBS drive) and works on it.

I fail to understand why you are trying to stream videos that are basically static files - or is putting the files on S3 a current limitation (while your input videos are "live" and streaming) that you're trying to remove?
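To make the chunk-and-upload step concrete, here is a minimal sketch of that approach (file and bucket names are illustrative): cut the source video into fixed-length segments with ffmpeg, then upload them to S3 so an S3 event can trigger per-chunk processing.

```python
# Sketch: segment a video with ffmpeg and upload the chunks to S3
# (file names and bucket are made up).
import subprocess
from pathlib import Path
import boto3

s3 = boto3.client("s3")
Path("chunks").mkdir(exist_ok=True)

# Segment into 60-second chunks; -c copy avoids re-encoding.
subprocess.run([
    "ffmpeg", "-i", "camera01.mp4",
    "-c", "copy", "-map", "0",
    "-f", "segment", "-segment_time", "60",
    "-reset_timestamps", "1",
    "chunks/camera01_%05d.mp4",   # zero-padded index preserves ordering
], check=True)

for chunk in sorted(Path("chunks").glob("camera01_*.mp4")):
    s3.upload_file(str(chunk), "video-ingest", f"camera01/{chunk.name}")
```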