Amazon Neptune

Needs advice

Hey people! I am developing an application for which graph databases are perfect, but I am low on cash (0, actually), and I am wondering if there is any free tier available for Amazon Neptune or Neo4j, or any free substitute for the two. As far as I checked, I couldn't find any free service.

3 upvotes·33.2K views

Replies (4)

Anthony Chiboucas

Software Engineer & Support Operations Lead ·Dec 16, 2020

Recommends

Neo4j

You can install and run Neo4j Community Edition for free locally or on a Linux server. It's GPLv3 open source, so you can use it commercially. There are some constraints in the Community Edition, like no clustering and no more than 34 million nodes, but you're unlikely to need those features until you're big. If it turns out you do need Enterprise, they have a startup plan (I think it's free too) for companies with fewer than 50 employees: neo4j.com/startup-program.

What you're really choosing between is price, language, and support:

PRICE

Neptune: At least $1000 a year, though you'll likely be spending closer to $5000 a year when you're live.
Neo4j Community: Free for commercial use.
Neo4j Enterprise: (free?) for small companies. $35,000? one-time license fee for big companies.

SUPPORT

Neo4j: Large, active community, both on their forums and on Stack Overflow. It's easy to find and direct-message people building and working on the engine.
Amazon: ...well, it's Amazon.

LANGUAGE

To compare the languages, here is a sample that gets a node with the label "person" whose name property is "Bob", finds its "children" nodes, and returns the email property of the children as a list.

Neptune Gremlin

g.V().hasLabel("person").has("name", "Bob")
  .in("children").hasLabel("student").values("email")

Neptune SPARQL

I don't have a clue; it's so esoteric, with no user-friendly documentation or guides.

Neo4j Cypher

MATCH (:person {name: "Bob"})<-[:children]-(x:student)
RETURN x.email
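
For readers new to both languages, here is a minimal plain-Python sketch of what the two queries above compute. It does not talk to Neptune or Neo4j; the in-memory `nodes` and `edges` structures, the helper `child_emails`, and all names and email addresses are made up for illustration.

```python
# Toy in-memory graph. Labels, names, and emails are invented for illustration.
nodes = {
    1: {"label": "person", "name": "Bob"},
    2: {"label": "student", "name": "Ann", "email": "ann@example.com"},
    3: {"label": "student", "name": "Cy", "email": "cy@example.com"},
    4: {"label": "person", "name": "Dee", "email": "dee@example.com"},
}
# Directed edges as (source, edge_label, target); each one points INTO Bob,
# matching in("children") in Gremlin and <-[:children]- in Cypher.
edges = [(2, "children", 1), (3, "children", 1), (4, "children", 1)]


def child_emails(parent_name):
    """Emails of 'student' nodes that have a 'children' edge into the named person."""
    parents = {
        node_id
        for node_id, props in nodes.items()
        if props["label"] == "person" and props.get("name") == parent_name
    }
    return [
        nodes[src]["email"]
        for src, edge_label, dst in edges
        if edge_label == "children"
        and dst in parents
        and nodes[src]["label"] == "student"
        and "email" in nodes[src]
    ]


emails = child_emails("Bob")
```

Dee is connected to Bob as well but carries the label "person", so the "student" filter drops her, just as hasLabel("student") and (x:student) would.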

5 upvotes·2 comments·1.4K views

Anthony Chiboucas

November 25th 2021 at 6:14PM

Typo in there. Community edition is limited to 34 BILLION nodes.

Jim Hill

April 7th 2021 at 4:29AM

You could also run the Neo4j Docker image on ECS Fargate pretty cheaply if you don't need persistence during dev cycles.

Cindee Madison

Dec 8, 2020

Recommends

Neo4j

Neo4j has a startup tier for their Enterprise License. This was extremely useful for developing production-ready implementations in early prototype stages. The community also has great support! https://neo4j.com/startup-program/

4 upvotes·1 comment·4.3K views

ansh lehri

December 9th 2020 at 8:17AM

Thank you Cindee for this information.

Needs advice

We have an in-house-built experiment management system. Each step produces samples as input to the next step, which can then produce one sample (1-1) or many samples (1-many). There are many steps like this. So far, we are tracking genealogy (limited tracking) in a MySQL database, and it is becoming hard to trace back to the original material or sample (I can give more details if required). So, we are considering a graph database. I am requesting advice from the experts.

Is a graph database the right choice, or can we manage with RDBMS?
If RDBMS, which RDBMS, which feature, or which approach could make this manageable or sustainable?
If a graph database (Neo4j, OrientDB, Azure Cosmos DB, Amazon Neptune, ArangoDB), which one is good, and what are the best practices?

I am sorry that this might be a loaded question.

7 upvotes·213.4K views

Replies (1)

ifcologne

Aug 5, 2020

Recommends

ArangoDB

You have not given much detail about the data generated, the depth of such a graph, and the access patterns (queries). However, it is very easy to track all samples and materials if you traverse this graph using a graph database. Here you can use any of the databases mentioned. OrientDB and ArangoDB are also multi-model databases where you can still query the data in a relational way using joins - you retain full flexibility.

In SQL, you can use recursive Common Table Expressions (CTEs) to write a query that reads all parent nodes of a tree.
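
As a rough sketch of the recursive CTE approach, the following uses Python's built-in sqlite3 module so it runs without a database server; the same `WITH RECURSIVE` form is also available in MySQL 8.0 and later. The `sample` table, its columns, and the sample names are assumptions made up for this example, not the asker's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical single-parent sample table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO sample VALUES
        (1, 'raw material', NULL),
        (2, 'extract', 1),
        (3, 'dilution', 2),
        (4, 'aliquot', 3);
""")

# Start at one sample, then repeatedly join each row to its parent.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(id, name, parent_id, depth) AS (
    SELECT id, name, parent_id, 0 FROM sample WHERE id = ?
    UNION ALL
    SELECT s.id, s.name, s.parent_id, a.depth + 1
    FROM sample AS s
    JOIN ancestors AS a ON s.id = a.parent_id
)
SELECT name, depth FROM ancestors WHERE depth > 0 ORDER BY depth;
"""


def ancestors_of(sample_id):
    """Return (name, depth) for every ancestor, nearest parent first."""
    return conn.execute(ANCESTORS_SQL, (sample_id,)).fetchall()


lineage = ancestors_of(4)
```

The recursion stops by itself once a row with a NULL parent_id is reached, because the join then finds no further rows.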

I would recommend ArangoDB if your samples also have disparate or nested attributes so that the document model (JSON) fits, and you have many complex graph queries that should be performed as efficiently as possible. If not - stay with an RDBMS.

5 upvotes·2 comments·14.5K views

Michael Staub

August 6th 2020 at 4:53PM

Another reason I recommend ArangoDB is the fact that the storage engine does not limit your data model. For example, you cannot create a geo-index on a 'user.location' field in any of the Gremlin-compatible stores, as their JSON documents can only have one level of properties.

Thiru Medampalli

August 7th 2020 at 9:00PM

Hey @ifcologne,

Thanks for your response. We will explore ArangoDB.

Here are some more details, in case you are wondering:

An operation produces many samples (output) from other samples (input). We are tracking both operations and samples (two graphs, i.e. one for operations and another for samples). Typical depth is 10 to 20 for both operations and samples, but some are even deeper (> 20). So far, over the years, operations amount to 2-3 million records and samples to 10-20 million records. We are using the closure data model in the DBMS to represent the tree/graph data.

Access pattern:

The API and some power users access the data directly via specific SQL (stored procedures and/or special SQL scripts). We are open to restricting or enhancing the access patterns further.

We are finding it hard to go upstream/downstream and also to merge the two tree structures (operations and samples) as depth increases.

We are finding it hard to data-mine based on sample or process attributes (some are nested).

It is hard to represent multiple parents for one child.
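
To illustrate the multiple-parents and upstream/downstream points, here is a minimal plain-Python sketch that stores derivations as edges rather than as a single parent pointer. The sample names and the helper `reachable` are hypothetical; a graph database or an edge table in an RDBMS would hold the same (input, output) pairs.

```python
from collections import defaultdict, deque

# Hypothetical derivation edges as (input_sample, output_sample).
# "pool" has two parents, which a single parent_id column cannot express.
edges = [
    ("raw-A", "extract-A"),
    ("raw-B", "extract-B"),
    ("extract-A", "pool"),
    ("extract-B", "pool"),
    ("pool", "aliquot-1"),
    ("pool", "aliquot-2"),
]

# Index the edges in both directions so either traversal is a dictionary lookup.
parents = defaultdict(set)
children = defaultdict(set)
for src, dst in edges:
    parents[dst].add(src)
    children[src].add(dst)


def reachable(start, neighbours):
    """Breadth-first walk; returns every node reachable from start (start excluded)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


upstream = reachable("aliquot-1", parents)    # everything aliquot-1 was made from
downstream = reachable("raw-A", children)     # everything derived from raw-A
```

Because the walk tracks visited nodes, a sample reached through several parents is reported only once, regardless of depth.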