What is Python and what are its top alternatives?
Python alternatives & related posts
related Java posts
How Uber developed the open source, end-to-end distributed tracing Jaeger , now a CNCF project:
Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.
Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:
related R Language posts
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
But wowza, things have changed. Tooling is just way, way better. I'm primarily web-oriented, and using React and Apollo together the past few years really opened my eyes to building rich apps. And I deeply apologize for using the phrase rich apps; I don't think I've ever said such Enterprisey words before.
But yeah, things are different now. I still love Rails, and still use it for a lot of apps I build. But it's that silly rich apps phrase that's the problem. Users have way more comprehensive expectations than they did even five years ago, and the JS community does a good job at building tools and tech that tackle the problems of making heavy, complicated UI and frontend work.
Winds 2.0 is an open source Podcast/RSS reader developed by Stream with a core goal to enable a wide range of developers to contribute.
related Scala posts
Lumosity is home to the world's largest cognitive training database, a responsibility we take seriously. For most of the company's history, our analysis of user behavior and training data has been powered by an event stream--first a simple Node.js pub/sub app, then a heavyweight Ruby app with stronger durability. Both supported decent throughput and latency, but they lacked some major features supported by existing open-source alternatives: replaying existing messages (also lacking in most message queue-based solutions), scaling out many different readers for the same stream, the ability to leverage existing solutions for reading and writing, and possibly most importantly: the ability to hire someone externally who already had expertise.
We ultimately migrated to Kafka in early- to mid-2016, citing both industry trends in companies we'd talked to with similar durability and throughput needs, the extremely strong documentation and community. We pored over Kyle Kingsbury's Jepsen post (https://aphyr.com/posts/293-jepsen-Kafka), as well as Jay Kreps' follow-up (http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen), talked at length with Confluent folks and community members, and still wound up running parallel systems for quite a long time, but ultimately, we've been very, very happy. Understanding the internals and proper levers takes some commitment, but it's taken very little maintenance once configured. Since then, the Confluent Platform community has grown and grown; we've gone from doing most development using custom Scala consumers and producers to being 60/40 Kafka Streams/Connects.
We originally looked into Storm / Heron , and we'd moved on from Redis pub/sub. Heron looks great, but we already had a programming model across services that was more akin to consuming a message consumers than required a topology of bolts, etc. Heron also had just come out while we were starting to migrate things, and the community momentum and direction of Kafka felt more substantial than the older Storm. If we were to start the process over again today, we might check out Pulsar , although the ecosystem is much younger.
To find out more, read our 2017 engineering blog post about the migration!
Onedot is building an automated data preparation service using probabilistic and statistical methods including artificial intelligence (AI). From the beginning, having a stable foundation while at the same time being able to iterate quickly was very important to us. Due to the nature of compute workloads we face, the decision for a functional programming paradigm and a scalable cluster model was a no-brainer. We started playing with Apache Spark very early on, when the platform was still in its infancy. As a storage backend, we first used Cassandra, but found out that it was not the optimal choice for our workloads (lots of rather smallish datasets, data pipelines with considerable complexity, etc.). In the end, we migrated dataset storage to Amazon S3 which proved to be much more adequate to our case. In the frontend, we bet on more traditional frameworks like React/Redux.js, Blueprint and a number of common npm packages of our universe. Because of the very positive experience with Scala (in particular the ability to write things very expressively, use immutability across the board, etc.) we settled with TypeScript in the frontend. In our opinion, a very good decision. Nowadays, transpiling is a common thing, so we thought why not introduce the same type-safety and mathematical rigour to the user interface?
related Anaconda posts
I am going to learn machine learning and self host an online IDE, the tool that i may use is Python, Anaconda, various python library and etc. which tools should i go for? this may include Java development, web development. Now i have 1 more candidate which are visual studio code online (code server). i will host on google cloud
Jupyter Anaconda Pandas IPython
A great way to prototype your data analytic modules. The use of the package is simple and user-friendly and the migration from ipython to python is fairly simple: a lot of cleaning, but no more.
The negative aspect comes when you want to streamline your productive system or does CI with your anaconda environment: - most tools don't accept conda environments (as smoothly as pip requirements) - the conda environments (even with miniconda) have quite an overhead
related Perl posts
In addition to our fancy Docker setup, we have captured and sanitized production logs for the behavior of our legacy Perl MTA, and we can test that the log output from the new Go version behaves the same way as the old version. These tests are set up to allow us to switch between the legacy and new version of the MTA and ensure that both systems behave in a legacy-compatible way. Not only can we ensure that we operate against a variety of issues we've seen over time from inboxes, but we know that the newest version of our MTA continues to cover all the same expected behaviors of the legacy version. #CodeCollaborationVersionControl #ContinuousIntegration
related PHP posts
When I joined NYT there was already broad dissatisfaction with the LAMP (Linux Apache HTTP Server MySQL PHP) Stack and the front end framework, in particular. So, I wasn't passing judgment on it. I mean, LAMP's fine, you can do good work in LAMP. It's a little dated at this point, but it's not ... I didn't want to rip it out for its own sake, but everyone else was like, "We don't like this, it's really inflexible." And I remember from being outside the company when that was called MIT FIVE when it had launched. And been observing it from the outside, and I was like, you guys took so long to do that and you did it so carefully, and yet you're not happy with your decisions. Why is that? That was more the impetus. If we're going to do this again, how are we going to do it in a way that we're gonna get a better result?
So we're moving quickly away from LAMP, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences.
React is now talking to GraphQL as a primary API. There's a Node.js back end, to the front end, which is mainly for server-side rendering, as well.
Behind there, the main repository for the GraphQL server is a big table repository, that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline.
Our whole Node.js backend stack consists of the following tools:
- Lerna as a tool for multi package and multi repository management
- npm as package manager
- NestJS as Node.js framework
- TypeScript as programming language
- ExpressJS as web server
- Swagger UI for visualizing and interacting with the API’s resources
- Postman as a tool for API development
- TypeORM as object relational mapping layer
- JSON Web Token for access token management
The main reason we have chosen Node.js over PHP is related to the following artifacts:
- Flexibility: Node.js sets very few strict dependencies, rules and guidelines and thus grants a high degree of flexibility in application development. There are no strict conventions so that the appropriate architecture, design structures, modules and features can be freely selected for the development.
related Ruby posts
I needed to choose a full stack of tools for cross platform mobile application design & development. After much research and trying different tools, these are what I came up with that work for me today:
For the client coding I chose Framework7 because of its performance, easy learning curve, and very well designed, beautiful UI widgets. I think it's perfect for solo development or small teams. I didn't like React Native. It felt heavy to me and rigid. Framework7 allows the use of #CSS3, which I think is the best technology to come out of the #WWW movement. No other tech has been able to allow designers and developers to develop such flexible, high performance, customisable user interface elements that are highly responsive and hardware accelerated before. Now #CSS3 includes variables and flexboxes it is truly a powerful language and there is no longer a need for preprocessors such as #SCSS / #Sass / #less. React Native contains a very limited interpretation of #CSS3 which I found very frustrating after using #CSS3 for some years already and knowing its powerful features. The other very nice feature of Framework7 is that you can even build for the browser if you want your app to be available for desktop web browsers. The latest release also includes the ability to build for #Electron so you can have MacOS, Windows and Linux desktop apps. This is not possible with React Native yet.
Framework7 runs on top of Apache Cordova. Cordova and webviews have been slated as being slow in the past. Having a game developer background I found the tweeks to make it run as smooth as silk. One of those tweeks is to use WKWebView. Another important one was using srcset on images.
I configured TypeScript to work with the latest version of Framework7. I consider TypeScript to be one of the best creations to come out of Microsoft in some time. They must have an amazing team working on it. It's very powerful and flexible. It helps you catch a lot of bugs and also provides code completion in supporting IDEs. So for my IDE I use Visual Studio Code which is a blazingly fast and silky smooth editor that integrates seamlessly with TypeScript for the ultimate type checking setup (both products are produced by Microsoft).
I use some Ruby scripts to process images with ImageMagick and pngquant to optimise for size and even auto insert responsive image code into the HTML5. Ruby is the ultimate cross platform scripting language. Even as your scripts become large, Ruby allows you to refactor your code easily and make it Object Oriented if necessary. I find it the quickest and easiest way to maintain certain aspects of my build process.
For the user interface design and prototyping I use Figma. Figma has an almost identical user interface to #Sketch but has the added advantage of being cross platform (MacOS and Windows). Its real-time collaboration features are outstanding and I use them a often as I work mostly on remote projects. Clients can collaborate in real-time and see changes I make as I make them. The clickable prototyping features in Figma are also very well designed and mean I can send clickable prototypes to clients to try user interface updates as they are made and get immediate feedback. I'm currently also evaluating the latest version of #AdobeXD as an alternative to Figma as it has the very cool auto-animate feature. It doesn't have real-time collaboration yet, but I heard it is proposed for 2019.
For the UI icons I use Font Awesome Pro. They have the largest selection and best looking icons you can find on the internet with several variations in styles so you can find most of the icons you want for standard projects.
For the backend I was using the #GraphCool Framework. As I later found out, #GraphQL still has some way to go in order to provide the full power of a mature graph query language so later in my project I ripped out #GraphCool and replaced it with CouchDB and Pouchdb. Primarily so I could provide good offline app support. CouchDB with Pouchdb is very flexible and efficient combination and overcomes some of the restrictions I found in #GraphQL and hence #GraphCool also. The most impressive and important feature of CouchDB is its replication. You can configure it in various ways for backups, fault tolerance, caching or conditional merging of databases. CouchDB and Pouchdb even supports storing, retrieving and serving binary or image data or other mime types. This removes a level of complexity usually present in database implementations where binary or image data is usually referenced through an #HTML5 link. With CouchDB and Pouchdb apps can operate offline and sync later, very efficiently, when the network connection is good.
I use PhoneGap when testing the app. It auto-reloads your app when its code is changed and you can also install it on Android phones to preview your app instantly. iOS is a bit more tricky cause of Apple's policies so it's not available on the App Store, but you can build it and install it yourself to your device.
So that's my latest mobile stack. What tools do you use? Have you tried these ones?