Python

Python

Application and Data / Languages & Frameworks / Languages
Avatar of nparsons08
Director of Developer Marketing at Stream ·

Winds 2.0 is an open source Podcast/RSS reader developed by Stream with a core goal to enable a wide range of developers to contribute.

We chose JavaScript because nearly every developer knows or can, at the very least, read JavaScript. With ES6 and Node.js v10.x.x, it’s become a very capable language. Async/Await is powerful and easy to use (Async/Await vs Promises). Babel allows us to experiment with next-generation JavaScript (features that are not in the official JavaScript spec yet). Yarn allows us to consistently install packages quickly (and is filled with tons of new tricks)

We’re using JavaScript for everything – both front and backend. Most of our team is experienced with Go and Python, so Node was not an obvious choice for this app.

Sure... there will be haters who refuse to acknowledge that there is anything remotely positive about JavaScript (there are even rants on Hacker News about Node.js); however, without writing completely in JavaScript, we would not have seen the results we did.

#FrameworksFullStack #Languages

READ MORE
How Stream Built a Modern RSS Reader With JavaScript - Stream Tech Stack | StackShare (stackshare.io)
34 upvotes·429.4K views
Avatar of jeyabalajis
CTO at FundsCorner ·

Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job: - The data replication must be near real-time, yet it should NOT impact the production database - The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient

Based on the above criteria, we selected the following tools to perform the end to end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using stitch triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pickup the data payload & mirror the DB changes on to the target Data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy . We deployed this micro-service as AWS Lambda with Zappa. With Zappa, deploying your services as event-driven & horizontally scalable Lambda service is dumb-easy.

In the end, we got to implement a highly scalable near realtime Change Data Replication service that "works" and deployed to production in a matter of few days!

READ MORE
24 upvotes·583.6K views

Context: I wanted to create an end to end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I never touched Terraform meaningfully until working on this project, and it's one of the best explorations in my development career. The documentation and syntax is incredibly human-readable and friendly. I'm used to building infrastructure through the google apis via Python , but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates were a bit convoluted by first impression. I'm glad past Sung did not make this decision either.

Solution: Leveraging Google Cloud Build Google Cloud Run Google Cloud Bigtable Google BigQuery Google Cloud Storage Google Compute Engine along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

Check Out My Architecture: CLICK ME

Check out the GitHub repo attached

READ MORE
GitHub - sungchun12/iot-python-webapp: Live, real-time dashboard in a serverless docker web app, and deployed via terraform with a built-in CICD trigger-See Mock Website (github.com)
23 upvotes·2 comments·89.1K views
Avatar of dmitry-mukhin
CTO at Uploadcare ·

Simple controls over complex technologies, as we put it, wouldn't be possible without neat UIs for our user areas including start page, dashboard, settings, and docs.

Initially, there was Django. Back in 2011, considering our Python-centric approach, that was the best choice. Later, we realized we needed to iterate on our website more quickly. And this led us to detaching Django from our front end. That was when we decided to build an SPA.

For building user interfaces, we're currently using React as it provided the fastest rendering back when we were building our toolkit. It’s worth mentioning Uploadcare is not a front-end-focused SPA: we aren’t running at high levels of complexity. If it were, we’d go with Ember.js.

However, there's a chance we will shift to the faster Preact, with its motto of using as little code as possible, and because it makes more use of browser APIs. One of our future tasks for our front end is to configure our Webpack bundler to split up the code for different site sections. For styles, we use PostCSS along with its plugins such as cssnano which minifies all the code.

All that allows us to provide a great user experience and quickly implement changes where they are needed with as little code as possible.

READ MORE
How Uploadcare Built a Stack That Handles 350M File API Requests Per Day - Uploadcare Tech Stack | StackShare (stackshare.io)
22 upvotes·385.2K views
Avatar of conor
Tech Brand Mgr, Office of CTO at Uber ·

How Uber developed the open source, end-to-end distributed tracing Jaeger , now a CNCF project:

Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.

Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:

https://eng.uber.com/distributed-tracing/

(GitHub Pages : https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)

Bindings/Operator: Python Java Node.js Go C++ Kubernetes JavaScript OpenShift C# Apache Spark

READ MORE
Evolving Distributed Tracing at Uber Engineering | Uber Engineering Blog (eng.uber.com)
20 upvotes·2 comments·1.1M views
Avatar of dmitry-mukhin
CTO at Uploadcare ·

The 350M API requests we handle daily include many processing tasks such as image enhancements, resizing, filtering, face recognition, and GIF to video conversions.

Tornado is the one we currently use and aiohttp is the one we intend to implement in production in the near future. Both tools support handling huge amounts of requests but aiohttp is preferable as it uses asyncio which is Python-native. Since Python is in the heart of our service, we initially used PIL followed by Pillow. We kind of still do. When we figured resizing was the most taxing processing operation, Alex, our engineer, created the fork named Pillow-SIMD and implemented a good number of optimizations into it to make it 15 times faster than ImageMagick

Thanks to the optimizations, Uploadcare now needs six times fewer servers to process images. Here, by servers I also mean separate Amazon EC2 instances handling processing and the first layer of caching. The processing instances are also paired with AWS Elastic Load Balancing (ELB) which helps ingest files to the CDN.

READ MORE
How Uploadcare Built a Stack That Handles 350M File API Requests Per Day - Uploadcare Tech Stack | StackShare (stackshare.io)
20 upvotes·79.5K views

I love Python and JavaScript . You can do the same JavaScript async operations in Python by using asyncio. This is particularly useful when you need to do socket programming in Python. With streaming sockets, data can be sent or received at any time. In case your Python program is in the middle of executing some code, other threads can handle the new socket data. Libraries like asyncio implement multiple threads, so your Python program can work in an asynchronous fashion. PubNub makes bi-directional data streaming between devices even easier.

READ MORE
Socket Programming with Python and PubNub - PubNub Tech Stack (stackshare.io)
20 upvotes·2 comments·73.7K views
Avatar of ecolson
Chief Algorithms Officer at Stitch Fix ·

The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.

Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

For more info:

#DataScience #DataStack #Data

READ MORE
Stitch Fix Algorithms Tour (algorithms-tour.stitchfix.com)
19 upvotes·481.5K views
Avatar of jtcunning
Operations Engineer at Sentry ·
Shared insights
on
PythonPythonRustRust
at

Sentry's event processing pipeline, which is responsible for handling all of the ingested event data that makes it through to our offline task processing, is written primarily in Python.

For particularly intense code paths, like our source map processing pipeline, we have begun re-writing those bits in Rust. Rust’s lack of garbage collection makes it a particularly convenient language for embedding in Python. It allows us to easily build a Python extension where all memory is managed from the Python side (if the Python wrapper gets collected by the Python GC we clean up the Rust object as well).

READ MORE
How Sentry Receives 20 Billion Events Per Month While Preparing to Handle Twice That - Sentry Tech Stack | StackShare (stackshare.io)
18 upvotes·1 comment·55.7K views
Avatar of mehilba
Co-Founder and COO at Magalix ·

We are hardcore Kubernetes users and contributors. We loved the automation it provides. However, as our team grew and added more clusters and microservices, capacity and resources management becomes a massive pain to us. We started suffering from a lot of outages and unexpected behavior as we promote our code from dev to production environments. Luckily we were working on our AI-powered tools to understand different dependencies, predict usage, and calculate the right resources and configurations that should be applied to our infrastructure and microservices. We dogfooded our agent (http://github.com/magalixcorp/magalix-agent) and were able to stabilize as the #autopilot continuously recovered any miscalculations we made or because of unexpected changes in workloads. We are open sourcing our agent in a few days. Check it out and let us know what you think! We run workloads on Microsoft Azure Google Kubernetes Engine and Amazon EC2 and we're all about Go and Python!

READ MORE
Our experience with an autopilot controlled infrastructure! (medium.com)
16 upvotes·2 comments·82.9K views