Azure Cosmos DB


We are building a cloud-based analytical app in which most of the data for the UI flows from SQL Server to a Delta Lake, and then from Delta Lake to Azure Cosmos DB as JSON, using Databricks, so that an API can send it to the front end. Sometimes, while transforming table rows into JSON, we get larger documents that exceed Cosmos DB's 2 MB item size limit. What is the best solution for replacing Cosmos DB?

4 upvotes·35.8K views
Replies (2)
CTO at BT Créditos·

You could probably use Cosmos DB to store metadata and store your big documents in a Storage Account Blob Container. Then, you store links to those documents in Cosmos DB. It's a cheap way of solving this without leaving Azure.
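A minimal sketch of this metadata-plus-blob pattern, with plain dicts standing in for the Blob Container and the Cosmos DB container (all names here are illustrative, not the Azure SDK):

```python
import json

MAX_DOC_BYTES = 2 * 1024 * 1024  # Cosmos DB's 2 MB item size limit

# Stand-ins for a Blob Container and a Cosmos DB container.
blob_store = {}
cosmos_store = {}

def save_document(doc_id, doc):
    """Store small documents directly; offload large payloads to blob
    storage and keep only metadata plus a link in the document store."""
    payload = json.dumps(doc).encode("utf-8")
    if len(payload) <= MAX_DOC_BYTES:
        cosmos_store[doc_id] = doc
    else:
        blob_name = f"{doc_id}.json"
        blob_store[blob_name] = payload
        cosmos_store[doc_id] = {
            "id": doc_id,
            "blobLink": blob_name,       # pointer to the full document
            "sizeBytes": len(payload),   # metadata useful to the API layer
        }

def load_document(doc_id):
    """Resolve the blob link transparently so the API always returns full JSON."""
    doc = cosmos_store[doc_id]
    if "blobLink" in doc:
        return json.loads(blob_store[doc["blobLink"]])
    return doc
```

The API layer always goes through `load_document`, so the UI never has to know whether a given document was offloaded or not.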

4 upvotes·1 comment·6.3K views
Arjun R
·
June 6th 2022 at 9:39AM

Thanks for the input, Ivan Reche. If we store big documents in a blob container, how will the Python APIs query them and send them to the UI? And if any updates happen in the UI, the API has to write those changes back to the big documents as a copy.

·
Reply
CTO at Estimator360 Inc·

Do you know what the max size of one of your documents might be? MongoDB (which you can also use on Azure) allows for larger documents (16 MB). That said, I ran into this issue when I was first using Cosmos, and I wound up rethinking the way I was storing documents. I don't know if this is an option for your scenario, but what I ended up doing was breaking my documents up into smaller subdocuments. A rule of thumb I have come to follow: if any property is an array (or can be an array of length N), make that array simply a list of IDs that point to other documents.
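A rough sketch of that splitting idea, with a plain dict standing in for the document collection and UUIDs as the generated child-document IDs (names are illustrative):

```python
import uuid

def split_document(doc, collection):
    """Replace each array property with a list of IDs pointing at child
    documents, storing the children separately in `collection`."""
    parent = {}
    for key, value in doc.items():
        if isinstance(value, list):
            ids = []
            for item in value:
                child_id = str(uuid.uuid4())
                collection[child_id] = item   # child stored as its own document
                ids.append(child_id)
            parent[key] = ids                 # references instead of embedded docs
        else:
            parent[key] = value
    return parent

def resolve(parent, key, collection):
    """Re-join one array property by following its ID references."""
    return [collection[i] for i in parent[key]]
```

The parent document stays small no matter how long the arrays grow, at the cost of an extra lookup when you need the children.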

2 upvotes·1 comment·5K views
Dan Trigwell
·
August 16th 2023 at 7:59AM

Aerospike might be one to check out. It can store 8 MB objects and provides much better performance and cost-effectiveness compared with Cosmos and Mongo.

·
Reply
Needs advice
on
Azure Cosmos DB, Neo4j
and
OrientDB

We have an in-house-built experiment management system. We produce samples as input to the next step, which could then produce one sample (1-to-1) or many samples (1-to-many). There are many steps like this. So far, we track genealogy (limited tracking) in a MySQL database, where it is becoming hard to trace back to the original material or sample (I can give more details if required). So we are considering a graph database. I am requesting advice from the experts.

  1. Is a graph database the right choice, or can we manage with an RDBMS?
  2. If an RDBMS, which one, and which feature or approach could make this manageable or sustainable?
  3. If a graph database (Neo4j, OrientDB, Azure Cosmos DB, Amazon Neptune, ArangoDB), which one is good, and what are the best practices?

I am sorry that this might be a loaded question.

7 upvotes·216.5K views
Replies (1)
Recommends
on
ArangoDB

You have not given much detail about the data generated, the depth of such a graph, and the access patterns (queries). However, it is very easy to track all samples and materials if you traverse this graph using a graph database. Here you can use any of the databases mentioned. OrientDB and ArangoDB are also multi-model databases where you can still query the data in a relational way using joins - you retain full flexibility.

In SQL, you can use Common Table Expressions (CTEs) to write a recursive query that reads all parent nodes of a tree.
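As a sketch of that approach (using SQLite in Python purely for illustration; the table and column names are made up), a recursive CTE can walk a sample's parent chain back to the original material:

```python
import sqlite3

# In-memory sample genealogy: each row links a sample to its parent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id TEXT PRIMARY KEY, parent_id TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("raw", None), ("batch1", "raw"),
     ("aliquot1", "batch1"), ("aliquot2", "batch1")],
)

def ancestors(sample_id):
    """Walk up the genealogy with a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(id, parent_id) AS (
            SELECT id, parent_id FROM samples WHERE id = ?
            UNION ALL
            SELECT s.id, s.parent_id
            FROM samples s JOIN lineage l ON s.id = l.parent_id
        )
        SELECT id FROM lineage
        """,
        (sample_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

This works, but at depths of 10-20 and with millions of rows, each lookup is a repeated self-join, which is where a native graph traversal tends to pull ahead.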

I would recommend ArangoDB if your samples also have disparate or nested attributes so that the document model (JSON) fits, and you have many complex graph queries that should be performed as efficiently as possible. If not - stay with an RDBMS.

5 upvotes·2 comments·14.8K views
Michael Staub
·
August 6th 2020 at 4:53PM

Another reason I recommend ArangoDB is that the storage engine does not limit your data model. You cannot create a geo-index on a 'user.location' field in any of the Gremlin-compatible stores, for example, as the JSON documents can only have one level of properties.

·
Reply
Thiru Medampalli
·
August 7th 2020 at 9:00PM

Hey @ifcologne,

Thanks for your response. We would explore ArangoDB.

Here are some more details, if you are wondering:

An operation produces many samples (output) from other samples (input). We are tracking both operations and samples (two graphs, i.e., one for operations and another for samples). Typical depth is 10 to 20 for both, but some are even deeper (> 20). Operations could be 2-3 million records and samples 10 to 20 million records so far over the years. We are using the closure-table data model in the DBMS to represent the tree/graph data.

Access pattern:

The API and some power users directly access the data via specific SQL (stored procedures and/or special SQL scripts). We are open to restricting or enhancing the access patterns further.

We are finding it hard to go upstream/downstream and also to merge the two tree structures (operations and samples) as depth increases.

We are finding it hard to data mine based on sample or process attributes (some are nested).

It is hard to represent multiple parents of one child.
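For what it's worth, the multiple-parents case is straightforward once the genealogy is treated as a general graph (a DAG) rather than a tree. A rough Python sketch with made-up node names, using a plain edge list:

```python
from collections import defaultdict

# Edges as (parent, child) pairs: a child may have several parents,
# which is awkward in a tree/closure model but natural as a graph.
edges = [
    ("matA", "sample1"), ("matB", "sample1"),   # two parents, one child
    ("sample1", "sample2"), ("sample1", "sample3"),
]

parents = defaultdict(set)
children = defaultdict(set)
for p, c in edges:
    parents[c].add(p)
    children[p].add(c)

def upstream(node):
    """All ancestors of a node, following every parent edge."""
    seen, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

A graph database essentially stores exactly this edge structure and indexes it for traversal, which is why upstream/downstream queries and multi-parent genealogy stop being special cases there.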

·
Reply
Architect-IoT at A. P. Moller Maersk·
Needs advice
on
Azure Cosmos DB
and
MongoDB
I am currently using Azure Cosmos DB for our IoT platform and am planning to switch to another #NoSQL database due to cost and other related issues. I am also looking for a database with stronger capabilities for reporting solutions through Power BI or other reporting tools.

3 upvotes·27.6K views
Replies (1)
Developer at Listatree·
Recommends
on
MongoDB

MongoDB is a good database; we are using it at Talenteca.com and it works very well. You may also consider PostgreSQL, a fine database with solid performance.

1 upvote·230 views

Need thoughts on which service to use: Amazon DynamoDB or Azure Cosmos DB. I'm mostly interested in a performance comparison between these tools.

1 upvote·35.8K views