Apache Parquet vs SQLite: What are the differences?

Apache Parquet and SQLite are both widely used for storing and processing data, but they are different kinds of tools: Parquet is a file format read by external query engines, while SQLite is an embedded database engine. The key differences below determine which use cases each one suits.

  1. Data Structure: One major difference between Apache Parquet and SQLite is how they lay out data. Parquet is a columnar storage file format aimed at large-scale analytical workloads: values from the same column are stored together, so a query that needs only a few columns can skip the rest. SQLite, on the other hand, is a relational database engine that stores each table row by row, which suits workloads that read or update whole records.

  2. Data Storage: Another difference between Parquet and SQLite is the way they store data on disk. A Parquet file is a binary file divided into row groups, which are in turn split into column chunks and pages; each column chunk can be encoded and compressed independently, and the schema may contain nested fields. This makes Parquet efficient for storing and scanning large datasets, often spread across many files. SQLite, on the other hand, keeps an entire database (tables, indexes, triggers, and views) in a single self-contained, portable file.

  3. Query Performance: Parquet and SQLite also differ in terms of query performance. Parquet's columnar layout lets query engines apply column pruning and predicate pushdown, using per-column statistics to skip data that cannot match, which speeds up analytical scans. Parquet's compression further reduces the amount of data that needs to be read from disk. SQLite, on the other hand, relies on B-tree indexes and a query planner, which make point lookups and small transactional queries fast; full scans and aggregations over very large tables are where a columnar format usually has the advantage.

  4. Concurrency and Scalability: When it comes to handling concurrent access and scalability, Parquet and SQLite have different capabilities. Parquet is designed for read-heavy use: files are typically written once and then read many times, and engines such as Apache Spark can read many files or row groups in parallel across multiple nodes. SQLite, on the other hand, allows many concurrent readers but only one writer at a time per database file, so it excels in single-application scenarios and is not recommended for write-heavy, high-concurrency applications or large-scale distributed systems.

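  5. Data Types and SQL Support: The two take different approaches to data types. Parquet has a small set of physical types plus logical types such as dates, timestamps, decimals, and strings, and it supports nested structures like lists and maps. SQLite uses dynamic typing with five storage classes (NULL, INTEGER, REAL, TEXT, and BLOB); it has no dedicated date/time type, storing such values as TEXT, REAL, or INTEGER and handling them through built-in date and time functions, while spatial support comes from extensions such as the R*Tree module. SQLite also provides comprehensive SQL support for operations like joins, subqueries, and aggregations. Parquet, on the other hand, is only a storage format: SQL queries over Parquet data are run by an external engine.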

  6. Deployment and Integration: Parquet and SQLite also differ in terms of deployment and integration. Parquet is commonly used in big data processing frameworks like Apache Spark and Apache Hadoop, where it seamlessly integrates with other tools and libraries in the ecosystem. SQLite, on the other hand, is typically used as an embedded database within applications and does not require any separate deployment or installation.
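
The row-versus-column distinction in points 1 and 3 can be illustrated with a small sketch. This is not Parquet's actual on-disk format: plain Python lists stand in for the two layouts, and the names (rows, to_columns, mean_row_layout, mean_column_layout) are hypothetical, chosen only for this illustration.

```python
# Illustrative only: plain Python structures stand in for on-disk layouts.
# Real Parquet files add row groups, encodings, and compression on top.

rows = [
    {"id": 1, "city": "Oslo", "temp_c": 4.0},
    {"id": 2, "city": "Lima", "temp_c": 19.5},
    {"id": 3, "city": "Oslo", "temp_c": 6.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def mean_row_layout(records, field):
    # Row layout: every whole record is visited just to read one field.
    return sum(record[field] for record in records) / len(records)


def mean_column_layout(cols, field):
    # Column layout: only the requested column is touched (column pruning).
    values = cols[field]
    return sum(values) / len(values)


row_mean = mean_row_layout(rows, "temp_c")
col_mean = mean_column_layout(columns, "temp_c")
```

Both functions return the same average; the difference is how much data each has to walk through. With three rows that is irrelevant, but on a wide table with millions of rows, reading one column instead of every record is the main reason columnar formats do well on analytical scans.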

In summary, Apache Parquet and SQLite differ in terms of their data structure, storage format, query performance, concurrency, data types, and deployment options. These differences make them more suitable for specific use cases, with Parquet being ideal for large-scale analytical workloads and SQLite being well-suited for single-user scenarios and embedded database applications.

Advice on Apache Parquet and SQLite
Needs advice on Firebase, MySQL, and SQLite

Hi everyone! I am a high school student, starting a massive project. I'm building a system for a boarding school to be better connected to their students and be more efficient with information. In the meantime, I am developing a website and an Android app. What's the best datastore I can use? I need to be able to access student data on the app from the main database and send push notifications, as well as feed updates. What's the best approach? What's the best tool I can use to deploy the website and the database? One for testing and prototyping, and an official one... Thanks in advance!!!!

Replies (3)
Ahmed AlAskalany
Android Developer at Kitab Sawti · 5 upvotes · 303.9K views
Recommends Firebase

Firebase has Android, iOS, and Web SDKs, and a console where you can develop, manage, and monitor all the data and analytics from one place. Firebase Realtime Database is good for online presence and instant feed updates, while Cloud Firestore is good for user profiles and other structured records. Firebase has a UI SDK which makes it easy to interface with the resources in the project, and with tons of tutorials and starter projects it should be easy to quickly have a decent prototype to iterate upon. Since you said massive, use their pricing calculator to figure out whether your expected scale will be covered by the free quota or, if you go for pay-as-you-go, whether the price is reasonable for your project.

Good luck with the project!

Paul Whittemore
Developer and Owner at Appurist Software · 4 upvotes · 304K views
Recommends Firebase

It sounds like a server-client relationship (central database) and while SQLite is probably the simplest, note that its performance is probably the worst of the top 20 or so choices you have. It is different from Firebase and MySQL (and most other databases) in that it is embedded in the product, although it could be embedded in your server itself.

MySQL would require a separate MySQL db server, which means either two servers (one for MySQL, and one to provide your specific services to your client app) or both running on a single server machine. There are many alternatives in the same category as MySQL, and a choice of relational databases or document (NoSQL) databases. But architecturally, they are in the same category as MySQL, a separate db server that your application server would get its data from.

Firebase is different yet again, in that it is a service that is already hosted by a company, providing many integrated features such as authentication and storage of user account info. However it does take care of many of the concerns with running a server, such as performance, scalability and management. There are some negatives that you should be aware of though: any investment of time and coding with Firebase is pretty much non-portable, in that you are stuck with Firebase going forward. If you needed to switch to a different service, not only would it be a different API, but it would be a different architecture and much of your coding would need to be discarded. Second, it's owned and run by Google now, so you have a large corporation backing it, but that also means they could decide to discontinue it without any real effect on the Google bottom line. Also some folks would have concerns with storing data on Google servers. That said, I think if you are aware of these in advance, and especially if you are a high school student, that Firebase is a fairly easy winner here. The server is already set up for you, the documentation is very complete and rich, with lots of examples, and Google is not going away. The main concern would be if it really is massive, there could be a rising cost to the service. I suspect though that it is not massive, even if everyone in a school used it. The number of concurrent connections would not be huge (probably not even into the hundreds, even if there are thousands of users).

I'd go with Firebase even though you will need to learn their API, because you'll need to learn something one way or another. SQLite is a bit of a toy database, and MySQL is a real one but you (or someone) would need to manage that server on top of needing to develop the server and client app. With Firebase, much of the server already exists, including a professionally hosted database. There are tons of high-level features provided and initial cost is somewhere between very low and zero.

Part of this depends on what language you want to write this in. With JavaScript, you could build a cross-platform client app (I'd use Vue.js + Vuetify for the UI, and provide it as a web app, optionally wrapped with Electron for a desktop app and Apache Cordova for mobile). The server could be JavaScript with an Express-based REST API on Node.js, talking to Firebase for services.

If you were a Java developer though, all this goes out the window and I'd recommend a simple Java server with Javalin for REST API, and embedded ObjectDB for database storage (combined into one server). ObjectDB is very very fast and can be separated out into a scalable server if this became truly massive. But you would probably never need to go that far.

All of this is a lot of work. I hope this isn't for something like an assignment. It is in the order of 6 months of work if you know what you're doing, all year if you're learning as you go.

Michael Maraist
Chief Architect at Pixia Corp · 2 upvotes · 303.4K views
Recommends RocksDB

I don't think you can go wrong with MySQL or PostgreSQL. Python + Postgres is a VERY well supported stack and can do almost anything. There are great visualization and administrative tools for both. There are some data-mismatch problems, however. Node.js/Python with MongoDB is a bit more modern and makes it trivial to "serialize" data with sprinklings of indexes. If you're using Go, then RocksDB is a great high-performance data-modeling base (it's not relational, however). It's more like a building block for a key-value store. But it's ACID, so you CAN build relational systems on top. I've used LevelDB for other projects (Java/C); it has a similar architecture and works great on Android (Chrome uses it for its metadata storage). Rocks/Level can achieve multi-million writes on cheap hardware thanks to their trade-offs.

I'm very familiar with SQLite.. Personally my least favorite, but it's the most portable database format, and it does support ACID.. I have many gripes, but biggest issue is parallel access (you really need a single process/thread to own the data-model, then use IPC to communicate with your process/thread).. (same could be said for LevelDB, but that's so efficient, it's almost never an issue).
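
The single-owner pattern described above can be sketched as follows. This is a minimal in-process variant that uses threads and a queue rather than true IPC between processes; the file location, table name, and the helper names writer_loop and producer are hypothetical.

```python
import os
import queue
import sqlite3
import tempfile
import threading

# Hypothetical database location inside a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
write_queue = queue.Queue()
STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(path, jobs):
    """The only thread that ever opens the database for writing."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (source TEXT, n INTEGER)")
    while True:
        job = jobs.get()
        if job is STOP:
            break
        conn.execute("INSERT INTO events VALUES (?, ?)", job)
    conn.commit()
    conn.close()


def producer(name, jobs, count):
    # Producers never touch SQLite; they only enqueue work for the owner.
    for i in range(count):
        jobs.put((name, i))


writer = threading.Thread(target=writer_loop, args=(db_path, write_queue))
writer.start()

producers = [
    threading.Thread(target=producer, args=(f"p{k}", write_queue, 50))
    for k in range(4)
]
for thread in producers:
    thread.start()
for thread in producers:
    thread.join()

write_queue.put(STOP)  # all work is queued; ask the writer to stop
writer.join()

# Read the result back on a separate connection after the writer is done.
conn = sqlite3.connect(db_path)
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
```

Because only one thread writes, the producers never contend for SQLite's write lock. The same shape carries over to separate processes if the queue is replaced by a socket or pipe.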

If you're using Java, then JavaDB/DerbyDB/HSQLDB are EXCELLENT systems: highly multi-threaded, with good stand-alone tools (embedded or TCP-connected). Perfect for unit tests. They can use anything from simple, dumb portable formats (e.g. a text file containing only inserts) to classic journaled binary B-tree formats to pure in-memory storage. Java has a lot of overhead, so this is only really viable if you're already using Java in your project.

For high performance "memsql" is mysql API to a hybrid in-memory index + on-disk column-database (feels like classic SQL to you though). Falls into the mysql-swiss-army-knife tool-kit.

Similarly, for in-memory storage there is "redis". Absolutely a joy to work with. It too is a specialty Swiss Army knife. Steer clear of redis for primary data that you can't lose: while redis does support persisting data, it isn't very efficient and will become the bottleneck. redis is great for micro-queues, topics, stat-aggregators, and message repositories (or password-management systems, where writes are rare so persistence is viable). Plus I love that redis uses a pure-text protocol, so I can netcat or telnet directly into it and do stuff.

I've loved cloud data stores. Amazon "DynamoDB" or Google BigTable are awesome!!! They're cheap compared to the normal hosting fees of an AWS EC2 instance. You can play all day: put a terabyte up, then blow it away, and pay for what you play with. It's a very, very different data model though. They give you a very small set of tricks that let you do complex data modeling, and you have to be clever and have enough foresight not to back yourself into a corner (or have customers abuse expensive queries).

Then there's Cassandra/Hadoop (HBase). These are petabyte-scale databases (technically so are Dynamo/BigTable). They're incredibly efficient at what they do. And they have a lot of plugins to do almost anything you need. I personally love these the best (and RocksDB/LevelDB are like their infant offspring). You can run these on your laptop (unlike the Amazon/Google engines above). But their discipline is very different from all the others above.

Pros of Apache Parquet
    No pros listed.

Pros of SQLite
    • Lightweight (163)
    • Portable (135)
    • Simple (122)
    • SQL (81)
    • Preinstalled on iOS and Android (29)
    • Free (2)
    • Tcl integration (2)
    • Portable: a database on my USB, 'love it' (1)


    Cons of Apache Parquet
      No cons listed.

    Cons of SQLite
      • Not for multi-process or multithreaded apps (2)
      • Needs different binaries for each platform (1)


      What is Apache Parquet?

      It is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

      What is SQLite?

      SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.
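
As a rough illustration of that description, the sketch below uses Python's built-in sqlite3 module to create one database file holding tables, an index, a trigger, and a view. The file name, schema, and sample rows are hypothetical, picked only for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file inside a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "school.db")

# No server process: the library opens and writes the file directly.
conn = sqlite3.connect(db_path)
conn.executescript(
    """
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        grade INTEGER
    );
    CREATE TABLE audit_log (student_id INTEGER, note TEXT);
    CREATE INDEX idx_students_grade ON students(grade);
    CREATE TRIGGER log_insert AFTER INSERT ON students
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'created');
    END;
    CREATE VIEW seniors AS SELECT name FROM students WHERE grade = 12;
    """
)
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Ada", 12), ("Linus", 11)],
)
conn.commit()

# The view filters rows; the trigger has logged one entry per insert.
senior_names = [row[0] for row in conn.execute("SELECT name FROM seniors")]
audit_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]

# sqlite_master lists every schema object stored in the file.
object_types = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT type FROM sqlite_master")
)
conn.close()

file_exists = os.path.isfile(db_path)
```

Copying that one file to another machine copies the whole database, schema and data together, which is what makes SQLite convenient as an embedded or application file format.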

      What are some alternatives to Apache Parquet and SQLite?
      Avro
      It is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
      Apache Kudu
      A new addition to the open source Apache Hadoop ecosystem, Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
      JSON
      JavaScript Object Notation is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language.
      Cassandra
      Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
      HBase
      Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.