C vs R: What are the differences?
Developers describe C as "One of the most widely used programming languages of all time". . On the other hand, R is detailed as "A language and environment for statistical computing and graphics". R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
C and R belong to "Languages" category of the tech stack.
"Performance" is the primary reason why developers consider C over the competitors, whereas "Data analysis " was stated as the key factor in picking R.
AdRoll, Twitch, and Redis Labs are some of the popular companies that use C, whereas R is used by AdRoll, Instacart, and Verba. C has a broader approval, being mentioned in 64 company stacks & 251 developers stacks; compared to R, which is listed in 128 company stacks and 97 developer stacks.
What is C?
What is R Language?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
Sign up to add, upvote and see more consMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
One important decision for delivering a platform independent solution with low memory footprint and minimal dependencies was the choice of the programming language. We considered a few from Python (there was already a reasonably large Python code base at Thumbtack), to Go (we were taking our first steps with it), and even Rust (too immature at the time).
We ended up writing it in C. It was easy to meet all requirements with only one external dependency for implementing the web server, clearly no challenges running it on any of the Linux distributions we were maintaining, and arguably the implementation with the smallest memory footprint given the choices above.
Why Uber developed H3, our open source grid system to make geospatial data visualization and exploration easier and more efficient:
We decided to create H3 to combine the benefits of a hexagonal global grid system with a hierarchical indexing system. A global grid system usually requires at least two things: a map projection and a grid laid on top of the map. For map projection, we chose to use gnomonic projections centered on icosahedron faces. This projects from Earth as a sphere to an icosahedron, a twenty-sided platonic solid. The H3 grid is constructed by laying out 122 base cells over the Earth, with ten cells per face. H3 supports sixteen resolutions: https://eng.uber.com/h3/
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka, and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (s3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for adhoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated systems. That requires serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying to Amazon ECS. This provides our data scientist a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
At FlowStack we write most of our backend in Go. Go is a well thought out language, with all the right compromises for speedy development of speedy and robust software. It's tooling is part of what makes Go such a great language. Testing and benchmarking is built into the language, in a way that makes it easy to ensure correctness and high performance. In most cases you can get more performance out of Rust and C or C++, but getting everything right is more cumbersome.
What are my other choices for a vectorized statistics language. Professor was pushing SAS Jump (or was that SPSS) with a menu-driven point and click approach. (Reproducibility can still be accomplished, you publish the script generated by all your clicks.) But I want to type everything, great online tutorials for R. I think I made the right pick.
been programming in c for over a decade, since learning it in college. still use it for various low level projects. used it recently to develop an embedded application for a custom board.
The core of the arcapos applications is written in C, so are most of the Lua modules (bindings to various hardware or protocols).
Connect to database, data analytics, draw diagram. Machine Learning application, and also used Spark-R for big data processing.
The Sqreen PHP agent is both a PHP extension, built in C, and a daemon built in Python.
Visualisation of air quality in various rooms by RShiny (hosted free on shinyapps.io)