We have an in-house build experiment management system. We produce samples as input to the next step, which then could produce 1 sample(1-1) and many samples (1 - many). There are many steps like this. So far, we are tracking genealogy (limited tracking) in the MySQL database, which is becoming hard to trace back to the original material or sample(I can give more details if required). So, we are considering a Graph database. I am requesting advice from the experts.
- Is a graph database the right choice, or can we manage with RDBMS?
- If RDBMS, which RDMS, which feature, or which approach could make this manageable or sustainable
- If Graph database(Neo4j, OrientDB, Azure Cosmos DB, Amazon Neptune, ArangoDB), which one is good, and what are the best practices?
I am sorry that this might be a loaded question.
You have not given much detail about the data generated, the depth of such a graph, and the access patterns (queries). However, it is very easy to track all samples and materials if you traverse this graph using a graph database. Here you can use any of the databases mentioned.
ArangoDB are also multi-model databases where you can still query the data in a relational way using joins - you retain full flexibility.
In SQL, you can use Common Table Expressions (CTEs) and use them to write a recursive query that reads all parent nodes of a tree.
I would recommend
ArangoDB if your samples also have disparate or nested attributes so that the document model (JSON) fits, and you have many complex graph queries that should be performed as efficiently as possible. If not - stay with an RDBMS.