Python


Decision at Stream about Go, Stream, Python, Yarn, Babel, Node.js, ES6, JavaScript, Languages, FrameworksFullStack

nparsons08 · Node.js Engineer & Evangelist at Stream

Winds 2.0 is an open source Podcast/RSS reader developed by Stream with a core goal to enable a wide range of developers to contribute.

We chose JavaScript because nearly every developer knows or can, at the very least, read JavaScript. With ES6 and Node.js v10.x.x, it’s become a very capable language. Async/Await is powerful and easy to use (Async/Await vs Promises). Babel allows us to experiment with next-generation JavaScript (features that are not in the official JavaScript spec yet). Yarn allows us to consistently install packages quickly (and is filled with tons of new tricks).

We’re using JavaScript for everything – both front and backend. Most of our team is experienced with Go and Python, so Node was not an obvious choice for this app.

Sure... there will be haters who refuse to acknowledge that there is anything remotely positive about JavaScript (there are even rants on Hacker News about Node.js); however, without writing completely in JavaScript, we would not have seen the results we did.

#FrameworksFullStack #Languages

29 upvotes·6.7K views

Decision at FundsCorner about Zappa, AWS Lambda, SQLAlchemy, Python, Amazon SQS, Node.js, MongoDB Stitch, PostgreSQL, MongoDB

jeyabalajis

Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job:
- The data replication must be near real-time, yet it should NOT impact the production database
- The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient

Based on the above criteria, we selected the following tools to perform the end to end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pick up the data payload & mirror the DB changes onto the target data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy. We deployed this micro-service as an AWS Lambda function with Zappa. With Zappa, deploying your services as an event-driven & horizontally scalable Lambda service is dumb-easy.
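The mirroring step described above can be sketched roughly as follows. This is a minimal illustration, not the FundsCorner service: the stdlib `sqlite3` module stands in for PostgreSQL and SQLAlchemy so the example runs on its own, the `customers` table and its columns are hypothetical, and the message body is assumed to carry a MongoDB change event with `operationType`, `documentKey` and (for inserts, updates and replaces) `fullDocument`. The `Records` / `body` layout is the shape Lambda receives from an SQS trigger.

```python
import json
import sqlite3


def make_target():
    """Create the stand-in warehouse (sqlite3 instead of PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical target table, for illustration only.
    conn.execute(
        "CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)"
    )
    return conn


def apply_change(conn, change):
    """Mirror one change event onto the target table."""
    op = change["operationType"]
    doc_id = str(change["documentKey"]["_id"])
    if op in ("insert", "update", "replace"):
        # Assumption: the trigger forwards the full document on updates too.
        doc = change["fullDocument"]
        conn.execute(
            "INSERT OR REPLACE INTO customers (id, name, city) VALUES (?, ?, ?)",
            (doc_id, doc.get("name"), doc.get("city")),
        )
    elif op == "delete":
        conn.execute("DELETE FROM customers WHERE id = ?", (doc_id,))
    else:
        raise ValueError(f"unsupported operation: {op}")


def handler(event, context, conn):
    """Process one batch of SQS records; returns the number handled."""
    for record in event["Records"]:
        apply_change(conn, json.loads(record["body"]))
    conn.commit()
    return len(event["Records"])
```

In a deployed function the connection would normally be created once at module scope rather than passed in; it is a parameter here only to keep the sketch easy to exercise locally.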

In the end, we implemented a highly scalable, near real-time Change Data Replication service that "works" and deployed it to production in a matter of a few days!

23 upvotes·18.7K views

Decision at Uploadcare about PostCSS, Preact, Ember.js, React, Python, Django

dmitry-mukhin

Simple controls over complex technologies, as we put it, wouldn't be possible without neat UIs for our user areas including start page, dashboard, settings, and docs.

Initially, there was Django. Back in 2011, considering our Python-centric approach, that was the best choice. Later, we realized we needed to iterate on our website more quickly. And this led us to detaching Django from our front end. That was when we decided to build an SPA.

For building user interfaces, we're currently using React as it provided the fastest rendering back when we were building our toolkit. It’s worth mentioning Uploadcare is not a front-end-focused SPA: we aren’t running at high levels of complexity. If it were, we’d go with Ember.js.

However, there's a chance we will shift to the faster Preact, with its motto of using as little code as possible, and because it makes more use of browser APIs. One of our future tasks for our front end is to configure our Webpack bundler to split up the code for different site sections. For styles, we use PostCSS along with its plugins such as cssnano which minifies all the code.

All that allows us to provide a great user experience and quickly implement changes where they are needed with as little code as possible.

20 upvotes·10.9K views

Decision at Uploadcare about AWS Elastic Load Balancing (ELB), Amazon EC2, Python, Tornado

dmitry-mukhin

The 350M API requests we handle daily include many processing tasks such as image enhancements, resizing, filtering, face recognition, and GIF to video conversions.

Tornado is the web framework we currently use, and aiohttp is the one we intend to move to in production in the near future. Both tools can handle huge numbers of requests, but aiohttp is preferable because it uses asyncio, which is Python-native. Since Python is at the heart of our service, we initially used PIL, followed by Pillow. We kind of still do. When we figured out that resizing was the most taxing processing operation, Alex, our engineer, created a fork named Pillow-SIMD and implemented a good number of optimizations in it to make it 15 times faster than ImageMagick.
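To illustrate why an asyncio-based server copes well with many simultaneous requests, here is a small stdlib-only sketch. It does not use aiohttp or Tornado; `handle_request`, `serve_batch` and `timed_batch` are hypothetical names, and `asyncio.sleep` stands in for waiting on the network or disk.

```python
import asyncio
import time


async def handle_request(request_id, io_delay=0.05):
    """Pretend to serve one request that spends its time waiting on I/O."""
    # While this coroutine waits, the event loop runs other requests.
    await asyncio.sleep(io_delay)
    return f"request {request_id} done"


async def serve_batch(n):
    """Handle n requests concurrently on a single event loop."""
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


def timed_batch(n):
    """Run a batch and report (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(serve_batch(n))
    return results, time.perf_counter() - start
```

Fifty such requests overlap their waits, so the batch should finish in roughly the time of one request rather than fifty. CPU-heavy work such as image resizing does not benefit in the same way and would block the loop unless it is moved to a worker process or thread.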

Thanks to the optimizations, Uploadcare now needs six times fewer servers to process images. Here, by servers I also mean separate Amazon EC2 instances handling processing and the first layer of caching. The processing instances are also paired with AWS Elastic Load Balancing (ELB) which helps ingest files to the CDN.

20 upvotes·1.9K views

Decision at Stream about Go, Cassandra, Python, Databases, DataStores

tschellenbach

After years of optimizing our existing feed technology, we decided to make a larger leap with Stream 2.0. While the first iteration of Stream was powered by Python and Cassandra, for version 2.0 of our infrastructure we switched to Go.

The main reason why we switched from Python to Go is performance. Certain features of Stream such as aggregation, ranking and serialization were very difficult to speed up using Python.

We’ve been using Go since March 2017 and it’s been a great experience so far. Go has greatly increased the productivity of our development team. Not only has it improved the speed at which we develop, it’s also 30x faster for many components of Stream. Initially we struggled a bit with package management for Go. However, using Dep together with the VG package contributed to creating a great workflow.

Go as a language is heavily focused on performance. The built-in PPROF tool is amazing for finding performance issues. Uber’s Go-Torch library is great for visualizing data from PPROF and will be bundled in PPROF in Go 1.10.

The performance of Go greatly influenced our architecture in a positive way. With Python we often found ourselves delegating logic to the database layer purely for performance reasons. The high performance of Go gave us more flexibility in terms of architecture. This led to a huge simplification of our infrastructure and a dramatic improvement of latency. For instance, we saw a 10 to 1 reduction in web-server count thanks to the lower memory and CPU usage for the same number of requests.

#DataStores #Databases

19 upvotes·5.2K views

Decision at Sentry about Rust, Python

jtcunning · Operations Engineer at Sentry

Sentry's event processing pipeline, which is responsible for handling all of the ingested event data that makes it through to our offline task processing, is written primarily in Python.

For particularly intense code paths, like our source map processing pipeline, we have begun re-writing those bits in Rust. Rust’s lack of garbage collection makes it a particularly convenient language for embedding in Python. It allows us to easily build a Python extension where all memory is managed from the Python side (if the Python wrapper gets collected by the Python GC we clean up the Rust object as well).
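The ownership pattern described above, where the Python wrapper controls the lifetime of the native object, can be sketched like this. It is an illustration only, not Sentry's code: `FakeNativeLib` is a hypothetical stand-in for a compiled Rust library exposing "new" and "free" functions, and the names `sourcemap_new` and `sourcemap_free` are invented for the example. The real piece is `weakref.finalize`, which runs a callback when the wrapper is garbage collected.

```python
import weakref


class FakeNativeLib:
    """Stand-in for a compiled library that hands out opaque handles."""

    def __init__(self):
        self._next_handle = 1
        self.live = set()  # handles that have been allocated but not freed

    def sourcemap_new(self, data):
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def sourcemap_free(self, handle):
        self.live.discard(handle)


class SourceMap:
    """Python wrapper that owns exactly one native handle."""

    def __init__(self, lib, data):
        self._handle = lib.sourcemap_new(data)
        # The callback must not reference `self`, or the wrapper would
        # never become collectable; it only captures the lib and handle.
        self._finalizer = weakref.finalize(
            self, lib.sourcemap_free, self._handle
        )

    @property
    def handle(self):
        return self._handle

    def close(self):
        """Free the native object early; safe to call more than once."""
        self._finalizer()
```

Because the native side has no collector of its own, there is a single owner to reason about: once the wrapper goes away, the finalizer frees the native object exactly once.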

18 upvotes·1 comment·4.9K views

Decision at Thumbtack about C, Go, Rust, Python

marcoalmeida

One important decision for delivering a platform-independent solution with a low memory footprint and minimal dependencies was the choice of programming language. We considered a few, from Python (there was already a reasonably large Python code base at Thumbtack) to Go (we were taking our first steps with it), and even Rust (too immature at the time).

We ended up writing it in C. It was easy to meet all requirements with only one external dependency for implementing the web server, clearly no challenges running it on any of the Linux distributions we were maintaining, and arguably the implementation with the smallest memory footprint given the choices above.

15 upvotes·8.4K views

Decision at Uploadcare about PostgreSQL, Amazon DynamoDB, Amazon S3, Redis, Python, Google App Engine

dmitry-mukhin

Uploadcare has built an infinitely scalable infrastructure by leveraging AWS. Building on top of AWS allows us to process 350M daily requests for file uploads, manipulations, and deliveries. When we started in 2011 the only cloud alternative to AWS was Google App Engine which was a no-go for a rather complex solution we wanted to build. We also didn’t want to buy any hardware or use co-locations.

Our stack handles receiving files, communicating with external file sources, managing file storage, managing user and file data, processing files, file caching and delivery, and managing user interface dashboards.

At its core, Uploadcare runs on Python. The EuroPython 2011 conference in Florence really inspired us; that, coupled with the fact that Python was general enough to solve all of our challenges, informed this decision. Additionally, we had prior experience working in Python.

We chose to build the main application with Django because of its feature completeness and large footprint within the Python ecosystem.

All the communications within our ecosystem occur via several HTTP APIs, Redis, Amazon S3, and Amazon DynamoDB. We decided on this architecture so that our system could be scalable in terms of storage and database throughput. This way we only need Django running on top of our database cluster. We use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling.

15 upvotes·2.9K views

Decision at Dubsmash about Kubernetes, Amazon EC2, Heroku, Python, ContainerTools, PlatformAsAService

tspecht · Co-Founder and CTO at Dubsmash

Since we deployed our very first lines of Python code more than 2 years ago, we have been happy users of Heroku. It lets us focus on building features rather than maintaining infrastructure, has super-easy scaling capabilities, and the support team is always happy to help (in the rare case you need them).

We played with the thought of moving our computational needs over to barebone Amazon EC2 instances or a container-management solution like Kubernetes a couple of times, but the added costs of maintaining this architecture and the ease-of-use of Heroku have kept us from moving forward so far.

Running independent services for different needs of our features gives us the flexibility to choose whatever data storage is best for the given task.

#PlatformAsAService #ContainerTools

14 upvotes·3.5K views

Decision at Stitch about Go, Clojure, JavaScript, Python, Kubernetes, AWS OpsWorks, Amazon EC2, Amazon Redshift, Amazon S3, Amazon RDS

jakestein · CEO at Stitch

Stitch is run entirely on AWS. All of our transactional databases are run with Amazon RDS, and we rely on Amazon S3 for data persistence in various stages of our pipeline. Our product integrates with Amazon Redshift as a data destination, and we also use Redshift as an internal data warehouse (powered by Stitch, of course).

The majority of our services run on stateless Amazon EC2 instances that are managed by AWS OpsWorks. We recently introduced Kubernetes into our infrastructure to run the scheduled jobs that execute Singer code to extract data from various sources. Although we tend to be wary of shiny new toys, Kubernetes has proven to be a good fit for this problem, and its stability, strong community and helpful tooling have made it easy for us to incorporate into our operations.

While we continue to be happy with Clojure for our internal services, we felt that its relatively narrow adoption could impede Singer's growth. We chose Python both because it is well suited to the task and because it seems to have reached critical mass among data engineers. All that being said, the Singer spec is language agnostic, and integrations and libraries have been developed in JavaScript, Go, and Clojure.
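The language-agnostic part is possible because a Singer tap communicates by writing JSON messages, one per line, to standard output: `SCHEMA`, `RECORD` and `STATE`. A minimal sketch of a tap written without any Singer helper library might look like the following; the `users` stream, its fields, and the `last_id` bookmark are hypothetical, and the state value is whatever JSON the tap chooses to keep.

```python
import json
import sys


def schema_message(stream, schema, key_properties):
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }


def record_message(stream, record):
    return {"type": "RECORD", "stream": stream, "record": record}


def state_message(value):
    return {"type": "STATE", "value": value}


def run_tap(rows, out=sys.stdout):
    """Emit a schema, one record per row, then a state bookmark."""
    # Hypothetical stream definition, for illustration only.
    schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
        },
    }
    messages = [schema_message("users", schema, ["id"])]
    messages += [record_message("users", row) for row in rows]
    # The bookmark lets a later run resume after the last row seen.
    last_id = max((row["id"] for row in rows), default=None)
    messages.append(state_message({"last_id": last_id}))
    for message in messages:
        out.write(json.dumps(message) + "\n")
```

Anything that can print these lines can act as a tap, and anything that can read them can act as a target, which is why implementations exist in several languages.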

13 upvotes·6.7K views