What is Python?
What is R Language?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
Sign up to add, upvote and see more consMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
I use Visual Studio Code because at this time is a mature software and I can do practically everything using it.
It's free and open source: The project is hosted on GitHub and it’s free to download, fork, modify and contribute to the project.
Multi-platform: You can download binaries for different platforms, included Windows (x64), MacOS and Linux (
LightWeight: It runs smoothly in different devices. It has an average memory and CPU usage. Starts almost immediately and it’s very stable.
.properties, XML and JSON files.
Integrated tools: Includes an integrated terminal, debugger, problem list and console output inspector. The project navigator sidebar is simple and powerful: you can manage your files and folders with ease. The command palette helps you find commands by text. The search widget has a powerful auto-complete feature to search and find your files.
Extensible and configurable: There are many extensions available for every language supported, including syntax highlighters, IntelliSense and code completion, and debuggers. There are also extension to manage application configuration and architecture like Docker and Jenkins.
Integrated with Git: You can visually manage your project repositories, pull, commit and push your changes, and easy conflict resolution.( there is support for SVN (Subversion) users by plugin)
Multiple systems means there is a requirement to cart data across them.
Started off with Talend scripts. This was great as what we initially had were PHP/Python script - allowed for a more systematic approach to ETL.
But ended up with a massive repository of scripts, complex crontab entries and regular failures due to memory issues.
Using Stitch or similar services is a better approach: - no need to worry about the infrastructure needed for the ETL processes - a more formal mapping of data from source to destination as opposed to script developer doing his/her voodoo magic - lot of common sources and destination integrations are already builtin and out of the box
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
Possible pros for Python / Django: - easy syntax, easier to learn for me as a beginner - fast development, earlier release - libraries for mathematical and scientific computation
Which software would you use in my case? Are my arguments for Python/NodeJS right? Which kind of database would you use?
Thank you for your answer!
Go is a high performance language with simple syntax / semantics. Although it is not as expressive as some other languages, it's still a great language for backend development.
Python is expressive and battery-included, and pre-installed in most linux distros, making it a great language for scripting.
PostgreSQL: Rock-solid RDBMS with NoSQL support.
NATS: fast message queue and easy to deploy / maintain.
Docker makes deployment painless.
Git essential tool for collaboration and source management.
I'm #Fullstack here and work with Vue.js, React and Node.js in some projects but also C# for other clients. Also started learning Python. And all this with just one tool!: #Vscode I have used Atom and Sublime Text in the past and they are very good too, but for me now is just vscode. I think the combination of vscode with the free available extensions that the community is creating makes a powerful tool and that's why vscode became the most popular IDE for software development. You can match it to your own needs in a couple of minutes. Did I mention you can style it your way? Amazing tool!