What is Apache Calcite and what are its top alternatives?
Top Alternatives to Apache Calcite
- Presto
Distributed SQL Query Engine for Big Data
- Apache Drill
Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. ...
- Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...
- Node.js
Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. ...
- Django
Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. ...
- ASP.NET
.NET is a developer platform made up of tools, programming languages, and libraries for building many different types of applications. ...
- Laravel
It is a web application framework with expressive, elegant syntax. It attempts to take the pain out of development by easing common tasks used in the majority of web projects, such as authentication, routing, sessions, and caching. ...
- Android SDK
Android provides a rich application framework that allows you to build innovative apps and games for mobile devices in a Java language environment. ...
Apache Calcite alternatives & related posts
Pros of Presto
- Works directly on files in S3 (no ETL) (18)
- Open-source (13)
- Join multiple databases (12)
- Scalable (10)
- Gets ready in minutes (7)
- MPP (6)
related Presto posts
To give employees the interactive querying they critically need, we've worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest's scale has involved resolving quite a few challenges, such as supporting deeply nested and huge Thrift schemas, slow or bad worker detection and remediation, cluster auto-scaling, graceful cluster shutdown, and impersonation support for the LDAP authenticator.
Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.
We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters consist of a fleet of 450 r4.8xl EC2 instances. Together, the Presto clusters have over 100 TB of memory and 14K vCPU cores. Within Pinterest, more than 1,000 monthly active users (out of 1,600+ Pinterest employees in total) use Presto, running about 400K queries on these clusters per month.
Each query submitted to a Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest, and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query-submitted events without corresponding query-finished events. These events enable us to capture the effect of cluster crashes over time.
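The crash-detection idea in this post (a submitted event with no matching finished event) can be sketched in a few lines of Python. This is an illustrative sketch only: the event fields `query_id` and `event` and the function name `find_unfinished_queries` are assumptions made for this example, not Pinterest's actual schema or code.

```python
from typing import Iterable, Set


def find_unfinished_queries(events: Iterable[dict]) -> Set[str]:
    """Return IDs of queries that were submitted but never finished.

    Each event is assumed (hypothetically) to look like
    {"query_id": "q1", "event": "submitted"} or
    {"query_id": "q1", "event": "finished"}.
    """
    submitted: Set[str] = set()
    finished: Set[str] = set()
    for e in events:
        if e["event"] == "submitted":
            submitted.add(e["query_id"])
        elif e["event"] == "finished":
            finished.add(e["query_id"])
    # Whatever is left was still in flight, e.g. when a cluster crashed.
    return submitted - finished


# Hypothetical sample log: q2 never reports a finished event.
sample_events = [
    {"query_id": "q1", "event": "submitted"},
    {"query_id": "q2", "event": "submitted"},
    {"query_id": "q1", "event": "finished"},
]
unfinished = find_unfinished_queries(sample_events)
```

Counting the unfinished IDs per time window would give a rough measure of how many queries each crash affected.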
Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. The Kubernetes platform lets us add and remove workers from a Presto cluster very quickly. The best-case latency for bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Another advantage of deploying on the Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.
#BigData #AWS #DataScience #DataEngineering
The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.
Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).
At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into our systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.
For more info:
- Our Algorithms Tour: https://algorithms-tour.stitchfix.com/
- Our blog: https://multithreaded.stitchfix.com/blog/
- Careers: https://multithreaded.stitchfix.com/careers/
#DataScience #DataStack #Data
Pros of Apache Drill
- NoSQL and Hadoop (4)
- Free (3)
- Lightning speed and simplicity in face of data jungle (3)
- Well documented for fast install (2)
- SQL interface to multiple datasources (1)
- Nested Data support (1)
- Read Structured and unstructured data (1)
- V1.10 released - https://drill.apache.org/ (1)
related Apache Drill posts
Pros of Apache Spark
- Open-source (60)
- Fast and Flexible (48)
- Great for distributed SQL like applications (8)
- One platform for every big data problem (8)
- Easy to install and to use (6)
- Works well for most Datascience usecases (3)
- In memory Computation (2)
- Interactive Query (2)
- Machine learning library, streaming in real time (2)
Cons of Apache Spark
- Speed (3)
related Apache Spark posts
Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:
Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink by leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:
https://eng.uber.com/marmaray-hadoop-ingestion-open-source/
(Direct GitHub repo: https://github.com/uber/marmaray)
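As a rough illustration of the any-source-to-any-sink idea described in this post, here is a minimal Python sketch. It is not Marmaray's API: `Source`, `Sink`, `ListSource`, `MemorySink`, and `run_pipeline` are hypothetical names invented for this example, and everything runs in-process rather than on Apache Spark.

```python
from typing import Iterable, Iterator, List, Protocol


class Source(Protocol):
    """Anything that can produce records (hypothetical plug-in interface)."""

    def read(self) -> Iterator[dict]: ...


class Sink(Protocol):
    """Anything that can accept records (hypothetical plug-in interface)."""

    def write(self, records: Iterable[dict]) -> int: ...


class ListSource:
    """Toy source that serves records from an in-memory list."""

    def __init__(self, rows: List[dict]) -> None:
        self.rows = rows

    def read(self) -> Iterator[dict]:
        return iter(self.rows)


class MemorySink:
    """Toy sink that stores records in memory and reports how many it took."""

    def __init__(self) -> None:
        self.stored: List[dict] = []

    def write(self, records: Iterable[dict]) -> int:
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before


def run_pipeline(source: Source, sink: Sink) -> int:
    """Move every record from the source to the sink; return the count written."""
    return sink.write(source.read())


sink = MemorySink()
written = run_pipeline(ListSource([{"id": 1}, {"id": 2}]), sink)
```

Because the pipeline depends only on the two small interfaces, a new source or sink can be added without touching `run_pipeline`, which is the property a plug-in design aims for.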
Node.js
Pros of Node.js
- Npm (1.4K)
- Javascript (1.3K)
- Great libraries (1.1K)
- High-performance (1K)
- Open source (802)
- Great for apis (485)
- Asynchronous (475)
- Great community (420)
- Great for realtime apps (390)
- Great for command line utilities (296)
- Websockets (82)
- Node Modules (82)
- Uber Simple (69)
- Great modularity (59)
- Allows us to reuse code in the frontend (58)
- Easy to start (42)
- Great for Data Streaming (35)
- Realtime (32)
- Awesome (28)
- Non blocking IO (25)
- Can be used as a proxy (18)
- High performance, open source, scalable (17)
- Non-blocking and modular (16)
- Easy and Fun (15)
- Easy and powerful (14)
- Same lang as AngularJS (13)
- Future of BackEnd (13)
- Fullstack (12)
- Fast (11)
- Cross platform (10)
- Scalability (10)
- Simple (9)
- Mean Stack (8)
- Great for webapps (7)
- Easy concurrency (7)
- Typescript (6)
- React (6)
- Fast, simple code and async (6)
- Friendly (6)
- Great speed (5)
- Easy to use and fast and goes well with JSONdb's (5)
- Scalable (5)
- It's amazingly fast and scalable (5)
- Control everything (5)
- Fast development (5)
- Isomorphic coolness (4)
- Easy to use (4)
- It's fast (4)
- Great community (3)
- Scales, fast, simple, great community, npm, express (3)
- TypeScript Support (3)
- Sooper easy for the Backend connectivity (3)
- Not Python (3)
- One language, end-to-end (3)
- Easy (3)
- Easy to learn (3)
- Less boilerplate code (3)
- Performant and fast prototyping (3)
- Blazing fast (3)
- Event Driven (2)
- Lovely (2)
- Npm i ape-updating (2)
- Great for apis (1)
- Node (0)
Cons of Node.js
- Bound to a single CPU (46)
- New framework every day (44)
- Lots of terrible examples on the internet (38)
- Asynchronous programming is the worst (31)
- Callback (23)
- Javascript (18)
- Dependency based on GitHub (11)
- Dependency hell (11)
- Low computational power (10)
- Very very Slow (7)
- Can block whole server easily (7)
- Callback functions may not fire on expected sequence (6)
- Unneeded over complication (3)
- Unstable (3)
- Breaking updates (3)
- No standard approach (2)
- Bad transitive dependency management (1)
- Can't read server session (1)
related Node.js posts
When I joined NYT there was already broad dissatisfaction with the LAMP (Linux Apache HTTP Server MySQL PHP) Stack and the front end framework, in particular. So, I wasn't passing judgment on it. I mean, LAMP's fine, you can do good work in LAMP. It's a little dated at this point, but it's not ... I didn't want to rip it out for its own sake, but everyone else was like, "We don't like this, it's really inflexible." And I remember from being outside the company when that was called NYT5 when it had launched. And been observing it from the outside, and I was like, you guys took so long to do that and you did it so carefully, and yet you're not happy with your decisions. Why is that? That was more the impetus. If we're going to do this again, how are we going to do it in a way that we're gonna get a better result?
So we're moving quickly away from LAMP, I would say. So, right now, the new front end is React based and using Apollo. And we've been in a long, protracted, gradual rollout of the core experiences.
React is now talking to GraphQL as a primary API. There's a Node.js back end, to the front end, which is mainly for server-side rendering, as well.
Behind there, the main repository for the GraphQL server is a big table repository, that we call Bodega because it's a convenience store. And that reads off of a Kafka pipeline.
How Uber developed the open source, end-to-end distributed tracing system Jaeger, now a CNCF project:
Distributed tracing is quickly becoming a must-have component in the tools that organizations use to monitor their complex, microservice-based architectures. At Uber, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016, integrated into hundreds of microservices and now recording thousands of traces every second.
Here is the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from pull to push architecture, and how distributed tracing will continue to evolve:
https://eng.uber.com/distributed-tracing/
(GitHub Pages: https://www.jaegertracing.io/, GitHub: https://github.com/jaegertracing/jaeger)
Bindings/Operator: Python, Java, Node.js, Go, C++, Kubernetes, JavaScript, OpenShift, C#, Apache Spark
Pros of Django
- Rapid development (660)
- Open source (480)
- Great community (415)
- Easy to learn (371)
- Mvc (270)
- Beautiful code (225)
- Elegant (217)
- Free (200)
- Great packages (197)
- Great libraries (186)
- Restful (74)
- Comes with auth and crud admin panel (73)
- Powerful (72)
- Great documentation (69)
- Great for web (64)
- Python (52)
- Great orm (39)
- Great for api (37)
- All included (28)
- Fast (25)
- Web Apps (23)
- Clean (21)
- Used by top startups (20)
- Easy setup (19)
- Sexy (17)
- Convention over configuration (14)
- ORM (14)
- Allows for very rapid development with great libraries (13)
- The Django community (12)
- Great MVC and templating engine (10)
- King of backend world (10)
- Full stack (8)
- Batteries included (7)
- It's elegant and practical (7)
- Fast prototyping (6)
- Very quick to get something up and running (6)
- Cross-Platform (6)
- Have not found anything that it can't do (6)
- Mvt (6)
- Zero code burden to change databases (5)
- Easy to develop end to end AI Models (5)
- Easy Structure, useful inbuilt library (5)
- Map (4)
- Easy to change database manager (4)
- Easy (4)
- Great performance (4)
- Many libraries (4)
- Python community (4)
- Modular (4)
- Easy to use (4)
- Just the right level of abstraction (3)
- Scaffold (3)
- Full-Text Search (3)
- Scalable (1)
- Node js (1)
- Rails (0)
- Fastapi (0)
Cons of Django
- Underpowered templating (26)
- Autoreload restarts whole server (22)
- Underpowered ORM (22)
- URL dispatcher ignores HTTP method (15)
- Internal subcomponents coupling (10)
- Not nodejs (8)
- Configuration hell (8)
- Admin (7)
- Not as clean and nice documentation like Laravel (5)
- Python (3)
- Not typed (3)
- Bloated admin panel included (3)
- Overwhelming folder structure (2)
- Ineffective Multithreading (2)
- Not type safe (1)
related Django posts
Simple controls over complex technologies, as we put it, wouldn't be possible without neat UIs for our user areas including start page, dashboard, settings, and docs.
Initially, there was Django. Back in 2011, considering our Python-centric approach, that was the best choice. Later, we realized we needed to iterate on our website more quickly. And this led us to detaching Django from our front end. That was when we decided to build an SPA.
For building user interfaces, we're currently using React as it provided the fastest rendering back when we were building our toolkit. It’s worth mentioning Uploadcare is not a front-end-focused SPA: we aren’t running at high levels of complexity. If it were, we’d go with Ember.js.
However, there's a chance we will shift to the faster Preact, with its motto of using as little code as possible, and because it makes more use of browser APIs. One of our future tasks for our front end is to configure our Webpack bundler to split up the code for different site sections. For styles, we use PostCSS along with its plugins such as cssnano which minifies all the code.
All that allows us to provide a great user experience and quickly implement changes where they are needed with as little code as possible.
Hey, so I developed a basic application with Python. But to use it, you need a Python interpreter. I want to add a GUI to make it more appealing. What should I choose to develop a GUI? I have very basic skills in front end development (CSS, JavaScript). I am fluent in Python. I'm looking for a tool that is easy to use and doesn't require too much code knowledge. I have recently tried out Flask, but it is kinda complicated. Should I stick with it, move to Django, or is there another nice framework to use?
ASP.NET
Pros of ASP.NET
- Great mvc (20)
- Easy to learn (12)
- C# (5)
Cons of ASP.NET
- C# (1)
- Entity framework is very slow (1)
- Not highly flexible for advanced developers (1)
related ASP.NET posts
Finding the most effective dev stack for a solo developer. Over the past year, I've been looking at many tech stacks that would be 'best' for me, as a solo, indie developer, to deliver a desktop app (Windows & Mac) plus mobile - iOS mainly. Initially, Xamarin started to stand out. Using .NET Core as the run-time, Xamarin as the native API provider and Xamarin Forms for the UI seemed to solve all issues. But the cracks soon started to appear. Xamarin Forms is mobile only; the Windows incarnation is different. There is no Mac UI solution (you have to code it natively in Mac OS Storyboard). I was also worried how Xamarin Forms, if I was to use it, was going to cope, in future, with Apple's new SwiftUI and Google's new Fuchsia.
This plethora of techs for the UI-layer made me reach for the safer waters of using Web-techs for the UI. Lovely! Consistency everywhere (well, mostly). But that consistency evaporates when platform issues are addressed. There are so many web frameworks!
But, I made a simple decision. It's just me...I am clever, but there is no army of coders here. And I have big plans for a business app. How could just 1 developer go-on to deploy a decent app to Windows, iPhone, iPad & Mac OS? I remembered earlier days when I've used Microsoft's ASP.NET to scaffold - generate - loads of Code for a web-app that I needed for several charities that I worked with. What 'generators' exist that do a lot of the platform-specific rubbish, allow the necessary customisation of such platform integration and provide a decent UI?
I've nailed my colours to the Quasar Framework mast. Oh dear, that means Electron desktop apps, doesn't it? Well, I've had enough of loads of Developers saying that "the menus won't look native" or "it uses too much RAM" and so on. I've been using non-native UI-wrapped apps for ages - the date picker in Outlook on iOS is way better than the native date-picker and I'd been using it for years without getting hot under the collar about it. Developers do get so hung-up on things that busy Users hardly notice; don't you think? As to the RAM usage issue; that's a bit true. But Users only really notice when an app uses so much RAM that the machine starts to page-out. Electron contributes towards that horizon but does not cause it. My Users will be business-users after all. Somewhat decent machines.
Looking forward to all that lovely Vue.js around my TypeScript and all those really, really, b e a u t i f u l UI controls of Quasar Framework. Still not sure that 1 dev can deliver all that... but I'm up for trying...
I am looking for a new framework to learn and achieve more efficient development. I come mainly from Laravel, which greatly simplifies development, but is somewhat slow for the volumes of data that I usually handle (although very stable) and it falls far behind in terms of simultaneous connections.
I'm looking for something that responds well to high concurrency, adapts well to server resources (cores) without the need to be concerned about consciously multi-threading or similar things, has a good ORM and friendly integration with PostgreSQL, offers request validation, and, of course, is scalable.
The main use would be for API development and behind-the-scenes processing of large volumes of data (50M on average, although this goes hand in hand with the database and server capacity).
The last framework I would include but couldn't is ASP.NET MVC.
Pros of Laravel
- Clean architecture (539)
- Growing community (384)
- Composer friendly (363)
- Open source (334)
- The only framework to consider for php (314)
- Mvc (216)
- Quickly develop (207)
- Dependency injection (165)
- Application architecture (154)
- Embraces good community packages (142)
- Write less, do more (70)
- Orm (eloquent) (65)
- Restful routing (64)
- Database migrations & seeds (54)
- Artisan scaffolding and migrations (52)
- Awesome (39)
- Great documentation (38)
- Awesome, Powerful, Fast and Rapid (29)
- Build Apps faster, easier and better (27)
- Promotes elegant coding (26)
- Modern PHP (25)
- Eloquent ORM (25)
- JSON friendly (24)
- Easy to learn, scalability (23)
- Blade Template (22)
- Beautiful (22)
- Most easy for me (22)
- Test-Driven (21)
- Based on SOLID (15)
- Security (15)
- Clean Documentation (13)
- Easy to attach Middleware (13)
- Cool (13)
- Convention over Configuration (12)
- Simple (12)
- Easy Request Validation (11)
- Simpler (10)
- Easy to use (10)
- Fast (10)
- Get going quickly straight out of the box. BYOKDM (9)
- It's just wow (9)
- Friendly API (8)
- Laravel + Cassandra = Killer Framework (8)
- Simplistic, easy and faster (8)
- Less dependencies (7)
- Super easy and powerful (7)
- Great customer support (6)
- It's beautiful to code in (6)
- The only "cons" is wrong! No static method just Facades (5)
- Fast and Clarify framework (5)
- Active Record (5)
- Composer (5)
- Minimum system requirements (5)
- Laravel Mix (5)
- Eloquent (5)
- Php7 (5)
- Speed (5)
- Easy (5)
- Laragon (4)
- Laravel Forge and Envoy (4)
- Ease of use (4)
- Cashier with Braintree and Stripe (4)
- Laravel Cashier (4)
- Easy views handling and great ORM (4)
- Laravel Spark (3)
- Laravel Passport (3)
- Laravel Nova (3)
- Intuitive usage (3)
- Laravel Horizon and Telescope (3)
- Rapid development (2)
- Scout (2)
- Laravel Vite (2)
- Deployment (2)
- Succinct syntax (1)
- Lovely (1)
Cons of Laravel
- PHP (48)
- Too many dependency (31)
- Slower than the other two (22)
- A lot of static method calls for convenience (17)
- Too many include (15)
- Heavy (12)
- Bloated (8)
- Laravel (7)
- Confusing (6)
- Too underrated (5)
- Not fast with MongoDB (3)
- Difficult to learn (1)
- Not using SOLID principles (1)
related Laravel posts
I need to build a web application plus Android and iOS apps for an enterprise, like an e-commerce portal. It will make intensive use of MySQL to display live information for thousands (40-50k) of products in an interactive table (searchable, filterable), plus live delivery tracking. It has to be secure, as it will handle information on customers, sales, and inventory. Here is the technology stack: Backend: Laravel 7. Frontend: Vue.js, React or AngularJS?
Need help deciding technology stack. Thanks.
Back at the start of 2017, we decided to create a web-based tool for the SEO OnPage analysis of our clients' websites. We had over 2,000 websites to analyze, so we had to perform thousands of requests to get every single page from those websites, process the information and save the large amounts of data somewhere.
Very soon we realized that the scripting language and database we had initially chosen (PHP, Laravel and MySQL) were not going to be able to cope efficiently with such a task.
By that time, we were doing some experiments for other projects with a language we had recently gotten to know, Go, so we decided to give it a try and code the crawler using it. It was fantastic: we could process much more data with way less CPU power and in less time. By using the concurrency abilities that the language has to offer, we could also do more HTTP requests in less time.
Unfortunately, I have no comparison numbers to show about the performance differences between Go and PHP, since the difference was so clear from the beginning that we didn't feel the need to do further comparison tests or document it. We just switched fully to Go.
There was still a problem: despite the large amount of data we were generating, MySQL was performing very well, but as we added more and more features to the software, and with those features more and more different types of data to save, it was a nightmare for the database architects to structure everything correctly in the database. So it was clear what we had to do next: switch to a NoSQL database. We switched to MongoDB, and it was also fantastic: we were spending almost zero time thinking about how to structure the database, and the performance also seemed to be better, but again, I have no comparison numbers to show due to the lack of time.
We also decided to switch the website from PHP and Laravel to JavaScript and Node.js and ExpressJS since working with the JSON Data that we were saving now in the Database would be easier.
As of now, we not only use the tool internally but have also opened it for everyone to use for free: https://tool-seo.com
Android SDK
- Android development (286)
- Necessary for android (155)
- Android studio (127)
- Mobile framework (86)
- Backed by google (82)
- Platform-tools (27)
- Eclipse + adt plugin (21)
- Powerful, simple, one stop environment (5)
- Free (3)
- Painful (3)
related Android SDK posts
We are using React Native in #SmartHome to share the business logic between the Android and iOS teams and approach users with a unique brand experience. The drawback is that we require lots of native Android SDK and Objective-C modules, so a good part of the invested time is there. The gain for an app that relies less on native communication, sensors and OS tools should be even higher.
It also helps us set up different testing stages: we use Travis CI for the JavaScript (business logic), Bitrise to run build tests and @Detox for #end2end automated user tests.
We use a microservices structure on top of Zeit's @now that reads from Firebase. We use JWT auth to authenticate requests among services and from users, following GitHub's philosophy of using the same infrastructure as its API consumers. Firebase is used mainly as a key-value store between services and as a backup database for users. We also use its authentication mechanisms.
You can be super locked-in if you also rely on its analytics, but we use Amplitude for that, which offers us great insights. We use Intercom for communications with end users and Mailjet for marketing.
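The JWT-based authentication between services mentioned in this post can be illustrated with a small HS256 sketch that uses only the Python standard library. The function names `sign_jwt` and `verify_jwt`, the shared secret, and the claim values are assumptions made for illustration, not this team's code; a production service would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> bytes:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def _b64url_decode(data: bytes) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(data + b"=" * (-len(data) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()


def verify_jwt(token: str, secret: bytes) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    parts = token.encode().split(b".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    expected = _b64url(
        hmac.new(secret, header_b64 + b"." + payload_b64, hashlib.sha256).digest()
    )
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, sig_b64):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


# Hypothetical service-to-service call: the caller signs, the callee verifies.
secret = b"shared-secret-for-illustration"
token = sign_jwt({"sub": "orders-service", "exp": int(time.time()) + 60}, secret)
claims = verify_jwt(token, secret)
```

Using the same token check for internal services and for end users is what lets both kinds of caller go through one authentication path.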
I've recently switched to using Expo for initializing and developing my React Native apps. Compared to React Native CLI, it's so much easier to get set up and going. Setting up and maintaining Android Studio, Android SDK, and virtual devices used to be such a headache. Thanks to Expo, I can now test my apps directly on my Android phone, just by installing the Expo app. I still use Xcode Simulator for iOS testing, since I don't have an iPhone, but that's easy anyway. The big win for me with Expo is ease of Android testing.
The Expo SDK also provides convenient features like Facebook login, MapView, push notifications, and many others. https://docs.expo.io/versions/v31.0.0/sdk/