Need advice about which tool to choose?Ask the StackShare community!
For a property and casualty insurance company, we currently use MarkLogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus Snowflake versus a hadoop or all three of these platforms redundant with one another?
for property and casualty insurance company we current Use marklogic and Hadoop for our raw data lake. Trying to figure out how snowflake fits in the picture. Does anybody have some good suggestions/best practices for when to use and what data to store in Mark logic versus snowflake versus a hadoop or all three of these platforms redundant with one another?
As i see it, you can use Snowflake as your data warehouse and marklogic as a data lake. You can add all your raw data to ML and curate it to a company data model to then supply this to Snowflake. You could try to implement the dw functionality on marklogic but it will just cost you alot of time. If you are using Aws version of Snowflake you can use ML spark connector to access the data. As an extra you can use the ML also as an Operational report system if you join it with a Reporting tool lie PowerBi. With extra apis you can also provide data to other systems with ML as source.
Hey, we want to build a referral campaign mechanism that will probably contain millions of records within the next few years. We want fast read access based on IDs or some indexes, and isolation is crucial as some listeners will try to update the same document at the same time. What's your suggestion between Couchbase and MongoDB? Thanks!
I am biased (work for Scylla) but it sounds like a KV/wide column would be better in this use case. Document/schema free/lite DBs data stores are easier to get up and running on but are not as scalable (generally) as NoSQL flavors that require a more rigid data model like ScyllaDB. If your data volumes are going to be 10s of TB and transactions per sec 10s of 1000s (or more), look at Scylla. We have something called lightweight transactions (LWT) that can get you consistency.
I have found MongoDB highly consistent and highly available. It suits your needs. We usually trade off partion tolerance fot this. Having said that, I am little biased in recommendation as I haven't had much experience with couchbase on production.
I'm planning to build a freelance marketplace website, using tools like Next.js, Firebase Authentication, Node.js, but I need to know which type of database is suitable with performance and powerful features. I'm trying to figure out what the best stack is for this project. If anyone has advice please, I’d love to hear more details. Thanks.
Postgres and MySQL are very similar, but Mongo has differences in terms of storage type and the CAP theorem. For your requirement, I prefer Postgres (or MySQL) over MongoDB. Mongo gives you no schema which is not always good. on the other hand, it is more common in NodeJS community, so you may find more articles about Node-Mongo stuff. I suggest to stay with RDBMS if possible.
This is a little about experience. Postgresql is fine. You can use either the related table structure or the json table structure.
We have a ready-made engine for the online exchange and marketplace. To customize it, you only need to know sql. Connecting any database is not a problem. https://falconspace.site/list/solutions
For learning purposes, I am trying to design a dashboard that displays the total revenue from all connected webshops/marketplaces, displaying incoming orders, total orders, etc.
So I will need to get the data (using Node backend) from the Shopify and marketplace APIs, storing this in the database, and get the data from the back end.
My question is:
What kind of database should I use? Is MongoDB fine for storing this kind of data? Or should I go with a SQL database?
Postgres is a solid database with a promising background. In the relational side of database design, I see Postgres as an absolute; Now the arguments and conflicts come in when talking about NoSQL data types. The truth is jsonb in Postgres is efficient and gives a good performance and storage. In a comparison with MongoDB with the same resources (such as RAM and CPU) with better tools and community, I think you should go for Postgres and use jsonb for some of the data. All in all, don't use a NoSQL database just cause you have the data type matching this tech, have both SQL and NoSQL at the same time.
I have found MongoDB easier to work with. Postgres and SQL in general, in my experience, is harder to work with. While Postgres does provide data consistency, MongoDB provides flexibility. I've found the MongoDB ecosystem to be really great with a good community. I've worked with MongoDB in production and it's been great. I really like the aggregation system and using query operators such as $in, $pull, $push.
While my opinion may be unpopular, I have found MongoDB really great for relational data, using aggregations from a code perspective. In general, data types are also more flexible with MongoDB.
I will use PostgreSQL because you have more powerfull feature for data agregation and views (the raw data from shopify and others could be stored as is) and then use views to produce diff. kind of reports unless you wanna create those aggregations/views in nodejs code. HTH
I want to store the data retrieved from multiple APIs and perform some analytics on it. The data stored in DB will never/hardly change. First, I thought it would be better to retrieve the data and create table columns for them, but some data might have different columns than others. So I thought about storing the JSON response from API directly to the table and use it. So which database will be the better choice, PostgreSQL or MongoDB.
Hey Krunal, your requirement sounds pretty clear and specific to what you want to do with that data. My recommendation to you, would be to use MongoDB. Since schema-less IO is faster in MongoDB, your general speed of reading / writing from and to the database would be quick. Additionally, the aggregate framework is very powerful with large data so that is also something that you can use in computing your analytics.
I suggest you to go with MongoDB
, because it is schema-less, i.e., it permits you to easily manipulate the schema of a table. If you want to add a column, it can be done without much effort. Moreover, MongoDB
can deal with more types of data, since the latest is stored as key-value pair. I do not what kind of analysis you are going to do, but NoSQL
is not the best choice if you are going to use complex queries. In addition, if you are working with huge amount of data and you are interested in optimising the performance, I suggest you PostgreSQL
.
Since you are speaking about API and JSON, I guess that you may using Node JS
for fetching API. I suggest you to try Mongoose
, which facilitate the use of MongoDB
with Node JS
.
Looks like the use case is to store JSON data. mongoDB and Postgres differ in so many aspects like scaling and consistency. Postgres has excellent JSON support now with the power of SQL. MongoDB is good in handling schema less data. However in this case it seems these differences don’t matter that much. I’d recommend you go with what you are most comfortable with.
I don't have an unquestionable opinion regarding your use case. I only trend to pick the MongoDB since it is schemaless avoiding null columns that you not always know when it is used (it depends on the source of the data). The only drawback that I could consider is the query's complexity in MongoDB, sometimes it is a bit tricky, when compared to the traditional SQL queries.
This is largely a matter of opinion. I see that someone else responded and recommended MongoDB but since you are doing data analytics, I highly recommend you go with SQL. You're going to have a really hard time normalizing the data when you can't manipulate relationships and bulk edit with a nice update query.
I'm much more experienced with MySQL than any other database and I am having a hard time getting on board with noSQL entirely because it's really hard to query complex data with relationships using noSQL. I'm using Firestore with one of my apps and MongoDB with another app but they both use MySQL for the heavy lifting and then a document database for things like permissions, caching, etc.
It sounds like the type of problem you need to reverse engineer. I'm sure you can imagine what the data sets would look like if you use MongoDB or Postgres. I suspect that putting in a little bit more work up front will pay high dividends and productivity once the data is normalized.
Again - it's largely a matter of preference but I prefer SQL almost every time.
I need urgent advice from you all! I am making a web-based food ordering platform which includes 3 different ordering methods (Dine-in using QR code scanning + Take away + Home Delivery) and a table reservation system. We are using React for the front-end, and I need your advice if I should use NestJS or ExpressJS for the backend. And regarding the database, which database should I use, MongoDB or PostgreSQL? Which combination will be better? PS. We want to follow the microservice architecture as scalability, reliability, and usability are the most important Non Functional requirements. Expert advice is needed, please. A load of thanks in advance. Kind Regards, Miqdad
I can't speak for the NestJS vs ExpressJS discussion, but I can given a viewpoint on databases.
The main thing to consider around database choice, is what "shape" the data will be in, and the kind of read/write patterns you expect of that data. The blog example shows up so much for DBMS like MongoDB, because it showcases what NoSQL / document storage is very scalable and performant in: mostly isolated documents with a few views / ways to order them and filter them. In your case, I can imagine a number of "relations" already, which suggest a more traditional SQL solution would work well: You have restaurants, they have maybe a few menus (regular, gluten-free etc), with menu items in, which have different prices over time (25% discount on christmas food just after christmas, 50% off pizzas on wednesdays). Then there's a whole different set of "relations" for people ordering, like showing them past orders, which need to refer to the restaurant etc, and credit card transaction information for refunds etc. That to me suggests PostgreSQL, which will scale quite well if you database design is okay.
PostgreSQL also offers you some extensions, which are just amazing for your use-case. https://postgis.net/ for example will let you query for restaurants based on location, without the big cost that comes from constantly using something like Google Maps API to work out which restaurants are near to someone ordering. Partitioning and window functions will be great for your own use internally too, like answering questions of "What types of takeways perform the best for us, Italian, Mexican?" or in combination with PostGIS, answering questions like "What kind of takeways do we need to market to, to improve our selection?".
While these things can all be implemented in MongoDB, you tend to lose some of the convenience of ACID or have to deal with things like eventual consistency, which requires more thinking on the part of your engineers. PostgreSQL offers decent (if more complex) scalablity and redundancy solutions, and is honestly very well proven and plenty of documentation exists on optimising queries.
Hello, i build microservice systems using Angular And Spring (Java) so i can't help with with ur back end choice, BUT, i definitely advice you to use a Nosql database, thus MongoDB of course or even Cassandra if your looking for extreme scalability with zero point of failure. Anyway, Nosql if much more faster then Sql (in your case Postresql DB). All you wanna do with sql can also be done by nosql (not the opposite of course).I also advice you to use docker containers + kubernetes to orchestrate them, if you need scalability and replication, that way your app can support auto scalability (in case ur users number goes high). Best of luck
About PostgreSQL vs MongoDB: short answer. Both are great. Choose what you like the most. Only if you expect millions of users, I‘ll incline with MongoDB.
I need to add a DBMS to my stack, but I don't know which. I'm tempted to learn SQLite since it would be useful to me with its focus on local access without concurrency. However, doing so feels like I would be defeating the purpose of trying to expand my skill set since it seems like most enterprise applications have the opposite requirements.
To be able to apply what I learn to more projects, what should I try to learn? MySQL? PostgreSQL? Something else? Is there a comfortable middle ground between high applicability and ease of use?
You can easily start with SQlite. Really easy to startup since it doesn't require you to install any additional software since is self-contained. It has interfaces in almost any language and also GUIs. Start learning SQL basics and simpler data models and structures. There are many tutorials, also available in the official website. From there you will easily migrate to another database. MySQL could be next, sonce it's easier to learn at first and has more resources available. PostgreSQL is less widespread, more challenging and has the fewer resorces, but once you have some experience with MySQL is really easy to learn as well. All these technologies are really widespread and used accross the industry so you won't make a wrong decision with any of these.
A question you might want to think about is "What kind of experience do I want to gain, by using a DBMS?". If your aim is to have experience with SQL and any related libraries and frameworks for your language of choice (python, I think?), then it kind of doesn't matter too much which you pick so much. As others have said, SQLite would offer you the ability to very easily get started, and would give you a reasonably standard (if a little basic) SQL dialect to work with.
If your aim is actually to have a bit of "operational" experience, in terms of things like what command line tools might be available as standard for the DBMS, understanding how the DBMS handles multiple databases, when to use multiple schemas vs multiple databases, some basic privilege management etc. Then I would recommend PostgreSQL. SQLite's simplicity actually avoids most of these experiences, which is not helpful to you if that is what you hope to learn. MySQL has a few "quirks" to how it manages things like multiple databases, which may lead you to making less good decisions if you tried to take your experience over to different DBMS, especially in bigger enterprise roles. PostgreSQL is kind of a happy middle ground here, with the ability to start PostgreSQL servers via docker or docker-compose making the actual day-to-day management pretty easy, while still giving you experience of the kinds of considerations I have listed above.
At Vital Beats we make use of PostgreSQL, largely because it offers us a happy balance between good management and backup of data, and good standard command line tools, which is essential for us where we are deploying our solutions within Kubernetes / docker, and so more graphical tools are not always appropriate for us. PostgreSQL is also pretty universally supported in terms of language libraries and frameworks, without having to make compromises on how we want to store and layout our data.
MySQL's very popular, easy to install, is also available as a managed service across most popular cloud offerings. The support/default tooling (such as MySQL Query Workbench) certainly is a little more baked than what you'll find for Postgres.
I have a lot of data that's currently sitting in a MariaDB database, a lot of tables that weigh 200gb with indexes. Most of the large tables have a date column which is always filtered, but there are usually 4-6 additional columns that are filtered and used for statistics. I'm trying to figure out the best tool for storing and analyzing large amounts of data. Preferably self-hosted or a cheap solution. The current problem I'm running into is speed. Even with pretty good indexes, if I'm trying to load a large dataset, it's pretty slow.
Druid Could be an amazing solution for your use case, My understanding, and the assumption is you are looking to export your data from MariaDB for Analytical workload. It can be used for time series database as well as a data warehouse and can be scaled horizontally once your data increases. It's pretty easy to set up on any environment (Cloud, Kubernetes, or Self-hosted nix system). Some important features which make it a perfect solution for your use case. 1. It can do streaming ingestion (Kafka, Kinesis) as well as batch ingestion (Files from Local & Cloud Storage or Databases like MySQL, Postgres). In your case MariaDB (which has the same drivers to MySQL) 2. Columnar Database, So you can query just the fields which are required, and that runs your query faster automatically. 3. Druid intelligently partitions data based on time and time-based queries are significantly faster than traditional databases. 4. Scale up or down by just adding or removing servers, and Druid automatically rebalances. Fault-tolerant architecture routes around server failures 5. Gives ana amazing centralized UI to manage data sources, query, tasks.
Hello, I am developing a new project with an internal chat between users. Also, there are complex relationships between the other project entities but I wolud like to build something scalable and fast and right now I am designing the data model. What kind of database would you recommend me to manage all application data? relational like MySQL, no relational like MongoDB or a mixed one? Thank you
In MongoDB, a write operation is atomic on the level of a single document, so it's harder to deal with consistency without transactions.
MongoDB supports horizontal scaling through Sharding , distributing data across several machines and facilitating high throughput operations with large sets of data. ... Sharding allows you to add additional instances to increase capacity when required
If you are trying with "complex relationships", give a chance to learn ArangoDB and Graph databases. Its database structures allow doing this with faster and simpler queries. The database is not as strict as others and allows arbitrary data. The data model is really like a neural network and you will never need foreign keys tables anymore. In Udemy there is a free course about it to get started.
The most important question is where are you planning to host? On-premise, or in the cloud.
Particularly if you are planning to host in either AWS or Azure, then your first point of call should be the PaaS (Platform as a Service) databases supplied by these vendors, as you will find yourself requiring a lot less effort to support them, much easier Disaster Recovery options, and also, depending on how PAYG the database is that you use, potentially also much cheaper costs than having a dedicated database server.
Your question regards 'Relational or not' is obviously key, and you need to consider both your required data structure, as well as the ACID requirements of your application model, as well as the non-functional requirements in terms of scalability, resilience, whether you want security authorisation at the highest application tier, or right down to 'row' level in the database, etc. - however please don't fall into the trap of considering 'NoSQL' as being single category. MongoDB, with its document-store type solution is a very different model to key-value-pair stores (like AWS DynamoDB), or column stores (like AWS RedShift) or for more complex data relationships, Entity Graph Stores (like AWS Neptune), to stores designed for tokenisation and text search (ElasticSearch) etc.
Also critical in all this is how many items you believe you need to index by. RDBMS/SQL stores are great for having as many indexes as you want, other than the slow-down in write speed, whereas databases like Amazon DynamoDB provide blisteringly fast read/write performance, but are very limited on key indexing capabilities.
It feels like you have most experience with SQL/RDBMS technologies, so for the simplest learning curve, and if your application fits it, then I'd personally start by looking at AWS Aurora https://aws.amazon.com/rds/aurora/ .
FIrstly, it may help if you explain what you mean by "complex relationships between project entities". Secondly, you can build a fast and scalable solution using either. With that said however, the data sounds relational so I would recommend MySQL.
I think, Its depend of your project type and your skills. MySQL is good and simple for maintenance but MongoDB need more skills and knowledge. If you work on little project, use MySQL. For your project type, MySQL is enough after you can migrate with PostgreSQL
MongoDB's document-oriented paradigm is nicely suited to the results of our ML model. We felt that this compatibility offered some time savings on figuring out and implementing an extensive data formatting and processing system. MongoDB's flexible schemas schemas (due to it being non-relational) were also attractive as a source of additional agility for our development process. The MongoDB ecosystem also has great GUI tools to simplify testing.
At Pushnami we were looking at several alternative databases that would support following architectural requirements: - very quick prototyping for an unknown domain - ability to support large amounts of data - native ability to replicate and fail over - full stack approach for Node.js development After careful consideration MongoDB came on top, and 3 years later we are still very happy with that decision. Currently we keep almost 2TB of data in our cluster, and start thinking about sharding.
After using couchbase for over 4 years, we migrated to MongoDB and that was the best decision ever! I'm very disappointed with Couchbase's technical performance. Even though we received enterprise support and were a listed Couchbase Partner, the experience was horrible. With every contact, the sales team was trying to get me on a $7k+ license for access to features all other open source NoSQL databases get for free.
Here's why you should not use Couchbase
Full-text search Queries The full-text search often returns a different number of results if you run the same query multiple types
N1QL queries Configuring the indexes correctly is next to impossible. It's poorly documented and nobody seems to know what to do, even the Couchbase support engineers have no clue what they are doing.
Community support I posted several problems on the forum and I never once received a useful answer
Enterprise support It's very expensive. $7k+. The team constantly tried to get me to buy even though the community edition wasn't working great
Autonomous Operator It's actually just a poorly configured Kubernetes role that no matter what I did, I couldn't get it to work. The support team was useless. Same lack of documentation. If you do get it to work, you need 6 servers at least to meet their minimum requirements.
Couchbase cloud Typical for Couchbase, the user experience is awful and I could never get it to work.
Minimum requirements
The minimum requirements in production are 6 servers. On AWS the calculated monthly cost would be ~$600
. We achieved better performance using a $16
MongoDB instance on the Mongo Atlas Cloud
writing queries is a nightmare While N1QL is similar to SQL and it's easier to write because of the familiarity, that isn't entirely true. The "smart index" that Couchbase advertises is not smart at all. Creating an index with 5 fields, and only using 4 of them won't result in Couchbase using the same index, so you have to create a new one.
Couchbase UI
The UI that comes with every database deployment is full of bugs, barely functional and the developer experience is poor. When I asked Couchbase about it, they basically said they don't care because real developers use SQL directly from code
Consumes too much RAM
Couchbase is shipped with a smaller Memcached instance to handle the in-memory cache. Memcached ends up using 8 GB of RAM for 5000 documents
! I'm not kidding! We had less than 5000 docs on a Couchbase instance and less than 20 indexes and RAM consumption was always over 8 GB
Memory allocations are useless I asked the Couchbase team a question: If a bucket has 1 GB allocated, what happens when I have more than 1GB stored? Does it overflow? Does it cache somewhere? Do I get an error? I always received the same answer: If you buy the Couchbase enterprise then we can guide you.
We actually use both Mongo and SQL databases in production. Mongo excels in both speed and developer friendliness when it comes to geospatial data and queries on the geospatial data, but we also like ACID compliance hence most of our other data (except on-site logs) are stored in a SQL Database (MariaDB for now)
MySQL has a lot of strengths working for it. It's simple and easy to set up and use. It's JSON engine is also really good these days. Mongo is also simple to setup and use, and it's speed as a document-object storage engine is first class.
Where Postgres has both beat is in it's combining of all of the features that make both MySQL and Mongo great, while adding on enterprise grade level scalability and replication. It's Postgres' stability and robustness, while still fulfilling the roles of it's contemporaries extremely well that edge Postgre for me.
When I was new with web development, I was using PHP for backend and MySQL for database. But after improving my JS skills, I chosen Node.js. Because of too many reasons including npm, express, community, fast coding and etc. MongoDB is so good for using with Node.js. If your JS skills are enough good, I recommend to migrate to Node.js and MongoDB.
My data was inherently hierarchical, but there was not enough content in each level of the hierarchy to justify a relational DB (SQL) with a one-to-many approach. It was also far easier to share data between the frontend (Angular), backend (Node.js) and DB (MongoDB) as they all pass around JSON natively. This allowed me to skip the translation layer from relational to hierarchical. You do need to think about correct indexes in MongoDB, and make sure the objects have finite size. For instance, an object in your DB shouldn't have a property which is an array that grows over time, without limit. In addition, I did use MySQL for other types of data, such as a catalog of products which (a) has a lot of data, (b) flat and not hierarchical, (c) needed very fast queries.
We used Mongo for the first iterations of our app, but the relational nature of our data was an awkward fit for a database that is not relational. We sorely lacked relational database integrity features that needed to be done on the application side (poorly) and it was a huge relief when we managed to port our application over to Postgres, which performs great and never gives us trouble, while having very user friendly extensions like JSON and PubSub that made the transition easy.
PostgreSQL is enterprise level database with transactions, full-text indexes, vector indexes, JSON, BLOB, geo-spatial data and a lot more. Highly scalable, configurable and easily maintainable. all that on an open source RDBMS database and you are still looking for GPL licensed MySQL with limited features? Look again.
We wanted a JSON datastore that could save the state of our bioinformatics visualizations without destructive normalization. As a leading NoSQL data storage technology, MongoDB has been a perfect fit for our needs. Plus it's open source, and has an enterprise SLA scale-out path, with support of hosted solutions like Atlas. Mongo has been an absolute champ. So much so that SQL and Oracle have begun shipping JSON column types as a new feature for their databases. And when Fast Healthcare Interoperability Resources (FHIR) announced support for JSON, we basically had our FHIR datalake technology.
Pros of Hadoop
- Great ecosystem39
- One stack to rule them all11
- Great load balancer4
- Amazon aws1
- Java syntax1
Pros of MongoDB
- Document-oriented storage828
- No sql593
- Ease of use553
- Fast464
- High performance410
- Free255
- Open source218
- Flexible180
- Replication & high availability145
- Easy to maintain112
- Querying42
- Easy scalability39
- Auto-sharding38
- High availability37
- Map/reduce31
- Document database27
- Easy setup25
- Full index support25
- Reliable16
- Fast in-place updates15
- Agile programming, flexible, fast14
- No database migrations12
- Easy integration with Node.Js8
- Enterprise8
- Enterprise Support6
- Great NoSQL DB5
- Support for many languages through different drivers4
- Schemaless3
- Aggregation Framework3
- Drivers support is good3
- Fast2
- Managed service2
- Easy to Scale2
- Awesome2
- Consistent2
- Good GUI1
- Acid Compliant1
Pros of MySQL
- Sql800
- Free679
- Easy562
- Widely used528
- Open source490
- High availability180
- Cross-platform support160
- Great community104
- Secure79
- Full-text indexing and searching75
- Fast, open, available26
- Reliable16
- SSL support16
- Robust15
- Enterprise Version9
- Easy to set up on all platforms7
- NoSQL access to JSON data type3
- Relational database1
- Easy, light, scalable1
- Sequel Pro (best SQL GUI)1
- Replica Support1
Sign up to add or upvote prosMake informed product decisions
Cons of Hadoop
Cons of MongoDB
- Very slowly for connected models that require joins6
- Not acid compliant3
- Proprietary query language2
Cons of MySQL
- Owned by a company with their own agenda16
- Can't roll back schema changes3