Presto

Presto

Application and Data / Data Stores / Big Data Tools
Needs advice
on
PostgreSQLPostgreSQL
and
PrestoPresto

Hello experts,

I am trying to get a comprehensive list of the differences in syntax between Presto and PostgreSQL.

Is there an official documentation where I can find it? Is one a subset of another (as in all commands of presto are in Postgres or vice versa)?

READ MORE
3 upvotes·2.8K views
Replies (1)
Senior Software Engineer ·
Recommends
on
Presto

This is a bit difficult to answer since Postgres is a database and Presto is a query engine, so Presto can be used to query a Postgres database (as well as many other data sources). However if you are strictly looking to execute an identical SQL statement via both Postgres and Presto, I suggest evaluating the SQL statement in terms of ANSI compliance. Presto is fully ANSI-compliant, so any ANSI SQL statement will execute using Presto. Postgres is mostly ANSI-compliant, so an ANSI SQL statement that is within the conformance specified by Postgres will be executable by both Presto and Postgres. But a Postgres query using non-ANSI SQL may not be executable by Presto, and a Presto query that uses one of the ANSI standards not supported by Postgres may not be executable by Postgres. I also recommend reading about Presto's query execution model and in particular the difference between statements (i.e. raw SQL text) and queries. Because of the way Presto converts statements to queries, even though you can generally copy a SQL statement from Postgres to Presto, optimizing that query for Presto may require significant changes in syntax. If you want to maximize compatibility, using Presto as your query engine will allow you to query your data whether you choose to store it in Postgres or elsewhere.

READ MORE
4 upvotes·11 views
Solution Architect at Oracle Financial Software Services·
Needs advice
on
DenodoDenodo
and
PrestoPresto
in

We are evaluating Presto against the Denodo to build the virtualization layer on top of the Cloudera Data warehouse. We have customer and transaction data in the Cloudera data warehouse, and we want to build the virtualization layer on top of the multiple datasets and Cloudera DW.

READ MORE
3 upvotes·22.3K views
Replies (2)
Founder at Ahana·
Recommends
on
Presto

Hi Manish

Good question. If you are looking for an engine for on prem data lakes like HDFS or Cloud data lakes like S3 and GCS, Presto would be the right choice. That is the key reason Facebook created Presto - it replaced Hive. In addition, it is built for open formats like Apache Parquet and Apache ORC. In fact Presto is becoming the de facto engine for data lakes. Like Facebook, many users are migrating from Hive and other Hadoop era engines to Presto. Presto is more than a virtualization layer, it is built like a database engine, has an optimizer and can push down predicates to the object / files underneath limiting reads. In addition, Presto is open source under the Linux Foundation and you can participate in its evolution. This short article may help a bit : https://ahana.io/answers/how-do-i-query-a-data-lake-with-presto/

Compared with this Denodo was built for traditional relational databases not for data lakes. If you are federating across sql server, oracle etc Denodo would be a good choice. Good tutorial here: https://community.denodo.com/tutorials/browse/bi/2virtualization1

To come clean, I may be biased towards Presto as a leader of the Presto Foundation and founder of a Presto company, but I have been building database and distributed systems for over 15 years and try to help users make the right decisions about their data problems they are trying to solve. Hope this helps.

cheers Dipti

READ MORE
7 upvotes·2 comments·202 views
Manish Bhoge
Manish Bhoge
·
July 7th 2021 at 6:20PM

Thank you @Dipti

·
Reply
Dipti Borkar
Dipti Borkar
·
July 8th 2021 at 12:26AM

You got it :) Manish

·
Reply
CEO at Tech Battalion·
Recommends
on
Denodo
Presto

I recommend Presto (more importantly Trino https://trino.io/ ). The creators of Presto moved from Facebook to Trino (forked the code) and Trino now looks like the thriving tool. See Starburst for Enterprise and compatibility with Cloudera: https://docs.starburst.io/358-e/connector/starburst-hive-cdp.html

As for Denodo, this is an enterprise silver bullet. Trino is the community silver bullet. Nothing I've ever seen performs as well as Trino for aggregated data querying over databases.

READ MORE
4 upvotes·173 views
Technical Lead at Incred Financial Solutions·
Needs advice
on
Amazon S3Amazon S3MetabaseMetabase
and
PrestoPresto

Hi,

We are currently storing the data in Amazon S3 using Apache Parquet format. We are using Presto to query the data from S3 and catalog it using AWS Glue catalog. We have Metabase sitting on top of Presto, where our reports are present. Currently, Presto is becoming too costly for us, and we are looking for alternatives for it but want to use the remaining setup (S3, Metabase) as much as possible. Please suggest alternative approaches.

READ MORE
6 upvotes·100.4K views
Replies (1)
Co-founder at Transloadit·

Hey there, the trick to keeping costs under control is to partition. This means you split up your source files by date, and also query within dates, so that Athena only scans the few files necessary for those dates. I hope that makes sense (and I also hope I understood your question right). This article explains better https://aws.amazon.com/blogs/big-data/analyze-your-amazon-cloudfront-access-logs-at-scale/.

READ MORE
Analyze your Amazon CloudFront access logs at scale | AWS Big Data Blog (aws.amazon.com)
4 upvotes·4.7K views
Tech Lead, Big Data Platform at Pinterest·

To provide employees with the critical need of interactive querying, we’ve worked with Presto, an open-source distributed SQL query engine, over the years. Operating Presto at Pinterest’s scale has involved resolving quite a few challenges like, supporting deeply nested and huge thrift schemas, slow/ bad worker detection and remediation, auto-scaling cluster, graceful cluster shutdown and impersonation support for ldap authenticator.

Our infrastructure is built on top of Amazon EC2 and we leverage Amazon S3 for storing our data. This separates compute and storage layers, and allows multiple compute clusters to share the S3 data.

We have hundreds of petabytes of data and tens of thousands of Apache Hive tables. Our Presto clusters are comprised of a fleet of 450 r4.8xl EC2 instances. Presto clusters together have over 100 TBs of memory and 14K vcpu cores. Within Pinterest, we have close to more than 1,000 monthly active users (out of total 1,600+ Pinterest employees) using Presto, who run about 400K queries on these clusters per month.

Each query submitted to Presto cluster is logged to a Kafka topic via Singer. Singer is a logging agent built at Pinterest and we talked about it in a previous post. Each query is logged when it is submitted and when it finishes. When a Presto cluster crashes, we will have query submitted events without corresponding query finished events. These events enable us to capture the effect of cluster crashes over time.

Each Presto cluster at Pinterest has workers on a mix of dedicated AWS EC2 instances and Kubernetes pods. Kubernetes platform provides us with the capability to add and remove workers from a Presto cluster very quickly. The best-case latency on bringing up a new worker on Kubernetes is less than a minute. However, when the Kubernetes cluster itself is out of resources and needs to scale up, it can take up to ten minutes. Some other advantages of deploying on Kubernetes platform is that our Presto deployment becomes agnostic of cloud vendor, instance types, OS, etc.

#BigData #AWS #DataScience #DataEngineering

READ MORE
Presto at Pinterest - Pinterest Engineering Blog - Medium (medium.com)
38 upvotes·1 comment·2.8M views
Kaibo Hao
Kaibo Hao
·
January 28th 2020 at 12:46AM

ECS on AWS will reduce your cost on EC2 and Kubernetes. Athena may be another tool for reducing your cost by replacing the Presto. It takes advantage of the S3 as the storage and provided the serverless management for your infrastructure.

·
Reply