Hack vs R: What are the differences?
Hack: A programming language for HHVM that interoperates seamlessly with PHP. Hack provides instantaneous type checking via a local server that watches the filesystem. The type checker typically runs in less than 200 milliseconds, so it is easy to integrate into your development workflow without introducing a noticeable delay.
R: A language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
Hack and R are both categorized as "Languages" tools.
"Interoperates seamlessly with PHP" is the primary reason developers choose Hack over its competitors, whereas "Data analysis" was cited as the key factor in picking R.
AdRoll, Instacart, and Verba are some of the popular companies that use R, whereas Hack is used by Facebook, Slack, and Wizters. R has broader adoption: it is mentioned in 128 company stacks and 97 developer stacks, compared with 8 company stacks and 3 developer stacks for Hack.
In 2016, Slack began migrating from PHP 5 to Hack. The team cited several well-known challenges inherent to PHP, including surprise type conversions, inconsistent reference semantics, inconsistencies in the standard library, and the fact that "PHP tries very, very hard to keep the request running, even if it has done something deeply strange."
To overcome these challenges while preserving what made PHP valuable to them, Slack turned to Hack, a gradually typed language derived from PHP. Hack runs on the HipHop Virtual Machine (HHVM), an open-source just-in-time (JIT) runtime for PHP.
Cal Henderson has been Slack's CTO since the beginning. Earlier this year, he answered a Quora question with a summary of Slack's current stack.
Apps
- Desktop: Electron is used to ship the app as a desktop application.
- Android: a mix of Java and Kotlin.
- iOS: a mix of Objective-C and Swift.
Backend
- The core application and the API are written in PHP/Hack and run on HHVM.
- The data is stored in MySQL using Vitess.
- Caching is done using Memcached and MCRouter.
- The search service uses SolrCloud, along with various Java services.
- The messaging system uses WebSockets with many services in Java and Go.
- Load balancing is done using HAProxy, with Consul for configuration.
- Most services talk to each other over gRPC, with some Thrift and JSON-over-HTTP.
- The voice and video calling service was built in Elixir.
- The data warehouse is built using open-source tools including Presto, Spark, Airflow, Hadoop, and Kafka.
The algorithms and data infrastructure at Stitch Fix are housed in AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL databases. We store data in an Amazon S3-based data warehouse. Apache Spark on YARN is our tool of choice for data movement and ETL. Because our storage layer (S3) is decoupled from our processing layer, we can scale our compute environment very elastically. We run several semi-permanent, autoscaling YARN clusters to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.
Beyond data movement and ETL, most ML-centric jobs (e.g. model training and execution) run in a similarly elastic environment, as containers running Python and R code on Amazon EC2 Container Service (ECS) clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into our systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for production deployment using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open-source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click path from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
What are my other choices for a vectorized statistics language? My professor was pushing SAS JMP (or was that SPSS?), with a menu-driven, point-and-click approach. (Reproducibility can still be achieved: you publish the script generated by all your clicks.) But I want to type everything, and there are great online tutorials for R. I think I made the right pick.
Connecting to databases, data analytics, and drawing diagrams. Also machine learning applications, plus SparkR for big data processing.
Visualisation of air quality in various rooms using R Shiny (hosted for free on shinyapps.io).