Apache Parquet

A free and open-source column-oriented data storage format

What is Apache Parquet?

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.
It is listed in the Big Data Tools category of a tech stack.
It is an open source tool with 953 GitHub stars and 833 GitHub forks on its GitHub repository.
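
To make the term "columnar" concrete, here is a minimal conceptual sketch in plain Python. It is not Parquet's on-disk layout or any Parquet API; the sample records and the helper name to_columns are hypothetical. It stores the same records row by row and column by column, and shows that a query over one field only needs to touch that field's column in the columnar form.

```python
# Conceptual sketch only: row-oriented vs column-oriented layout.
# The records and the helper name are hypothetical, not a Parquet API.

rows = [
    {"user_id": 1, "country": "DE", "clicks": 12},
    {"user_id": 2, "country": "AR", "clicks": 7},
    {"user_id": 3, "country": "DE", "clicks": 30},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["clicks"] for record in rows)

# Column layout: the same sum reads only the "clicks" column.
total_from_columns = sum(columns["clicks"])
```

Both totals are equal; the difference is how much unrelated data has to be read to compute them, which is the usual motivation for columnar formats in analytics workloads.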

Who uses Apache Parquet?

Companies
9 companies reportedly use Apache Parquet in their tech stacks, including Plista GmbH, Grandata, and Yotpo.

Developers
8 developers on StackShare have stated that they use Apache Parquet.

Apache Parquet Integrations

Java, Hadoop, Apache Hive, Apache Impala, and Apache Thrift are some of the popular tools that integrate with Apache Parquet. In total, 6 tools are listed as integrating with Apache Parquet.

Apache Parquet's Features

  • Columnar storage format
  • Type-specific encoding
  • Pig integration
  • Cascading integration
  • Crunch integration
  • Apache Arrow integration
  • Scrooge integration
  • Adaptive dictionary encoding
  • Predicate pushdown
  • Column stats
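
Three of the features above (dictionary encoding, column stats, and predicate pushdown) can be illustrated with a small conceptual sketch in plain Python. This is an assumption-laden illustration, not Parquet's actual encoding or reader logic: the names dictionary_encode, Chunk, and scan_greater_than are hypothetical, the dictionary encoder is a simple non-adaptive one, and the "chunks" stand in loosely for blocks of a column that carry min/max statistics.

```python
# Conceptual sketch only: plain-Python illustration of three listed features.
# dictionary_encode, Chunk, and scan_greater_than are hypothetical names.


def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


class Chunk:
    """A block of one column's values with min/max statistics."""

    def __init__(self, values):
        self.values = values
        self.min = min(values)
        self.max = max(values)


def scan_greater_than(chunks, threshold):
    """Return values > threshold, skipping chunks whose max rules them out."""
    matches = []
    chunks_read = 0
    for chunk in chunks:
        if chunk.max <= threshold:
            continue  # statistics prove no value in this chunk can match
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


dictionary, codes = dictionary_encode(["DE", "AR", "DE", "DE", "AR"])
chunks = [Chunk([1, 4, 9]), Chunk([12, 15, 20]), Chunk([3, 5, 8])]
matches, chunks_read = scan_greater_than(chunks, 10)
```

In this sketch the filter is evaluated against the statistics first, so only one of the three chunks is actually scanned. That is the general idea behind pushing a predicate down to the storage layer.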

Apache Parquet Alternatives & Comparisons

What are some alternatives to Apache Parquet?
Avro
It is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
Apache Kudu
A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
MySQL
The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
PostgreSQL
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.
MongoDB
MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

Apache Parquet's Followers
6 developers follow Apache Parquet to keep up with related blogs and decisions.
Dmitri Fomin
Pradeep Gupta
Daniel Sobrado
Justin Dorfman
Mehdi TAZI
I Like My Privacy