Alternatives to Akutan

Apache Beam, Apache Spark, Apache Flink, Arc, and Neo4j are the most popular alternatives and competitors to Akutan.

What is Akutan and what are its top alternatives?

Akutan is a distributed knowledge graph store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world.
Akutan is a tool in the Graph Databases category of a tech stack.
Akutan is an open source tool with 1.6K GitHub stars and 90 GitHub forks.
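
To make "data that is highly interconnected by many types of relationships" concrete, here is a minimal sketch in plain Python that stores facts as subject-predicate-object triples. It is not Akutan's API or storage format; the `Triple` type, the `match` helper, and the sample facts are all illustrative assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: subject --predicate--> obj (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# Illustrative encyclopedic facts; each predicate is a relationship type.
FACTS = [
    Triple("Paris", "capital_of", "France"),
    Triple("Paris", "is_a", "City"),
    Triple("France", "located_in", "Europe"),
    Triple("France", "is_a", "Country"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return the facts matching the given fields; None acts as a wildcard."""
    return [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (predicate is None or f.predicate == predicate)
        and (obj is None or f.obj == obj)
    ]
```

Calling `match(FACTS, subject="Paris")` returns every fact about Paris regardless of relationship type, which is the kind of lookup a knowledge graph store is built around.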

Top Alternatives to Akutan

  • Apache Beam

    It implements batch and streaming data processing jobs that run on any execution engine. It executes pipelines on multiple execution environments. ...

  • Apache Spark

    Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning. ...

  • Apache Flink

    Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics, in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala. ...

  • Arc

    Arc is designed for exploratory programming: the kind where you decide what to write by writing it. A good medium for exploratory programming is one that makes programs brief and malleable, so that's what we've aimed for. This is a medium for sketching software. ...

  • Neo4j

    Neo4j stores data in nodes connected by directed, typed relationships with properties on both, also known as a Property Graph. It is a high performance graph store with all the features expected of a mature and robust database, like a friendly query language and ACID transactions. ...

  • Dgraph

    Dgraph's goal is to provide Google production-level scale and throughput, with latency low enough to serve real-time user queries over terabytes of structured data. Dgraph supports GraphQL-like query syntax, and responds in JSON and Protocol Buffers over gRPC and HTTP. ...

  • Titan

    Titan is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. Titan is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. ...

  • JanusGraph

    It is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster. It is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time. ...
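
The property graph model mentioned in the Neo4j entry above (nodes joined by directed, typed relationships, with properties on both) can be sketched in plain Python. This is not Neo4j's API or Cypher; the `Node`, `Relationship`, and `neighbors` names and the sample data are hypothetical and only illustrate the data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int  # id of the source node (direction is start -> end)
    end: int    # id of the target node
    type: str   # relationship type, e.g. "KNOWS"
    properties: dict = field(default_factory=dict)


# Illustrative data: properties live on nodes and on relationships.
nodes = {
    1: Node(1, {"Person"}, {"name": "Alice"}),
    2: Node(2, {"Person"}, {"name": "Bob"}),
    3: Node(3, {"Company"}, {"name": "Acme"}),
}
relationships = [
    Relationship(1, 2, "KNOWS", {"since": 2019}),
    Relationship(1, 3, "WORKS_AT", {"role": "engineer"}),
]


def neighbors(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [
        nodes[r.end]
        for r in relationships
        if r.start == node_id and r.type == rel_type
    ]
```

Because relationships are directed, `neighbors(1, "KNOWS")` finds Bob, while `neighbors(2, "KNOWS")` finds nothing.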

Akutan alternatives & related posts

Apache Beam

138
262
14
A unified programming model
PROS OF APACHE BEAM
  • 5
    Open-source
  • 5
    Cross-platform
  • 2
    Portable
  • 2
    Unified batch and stream processing
CONS OF APACHE BEAM
    No cons listed yet.

    related Apache Beam posts

    I have to build a data processing application with an Apache Beam stack and Apache Flink runner on an Amazon EMR cluster. I saw some instability with the process and EMR clusters that keep going down. Here, the Apache Beam application gets inputs from Kafka and sends the accumulative data streams to another Kafka topic. Any advice on how to make the process more stable?

    Apache Spark

    2.3K
    2.6K
    132
    Fast and general engine for large-scale data processing
    PROS OF APACHE SPARK
    • 58
      Open-source
    • 48
      Fast and Flexible
    • 7
      One platform for every big data problem
    • 6
      Easy to install and to use
    • 6
      Great for distributed SQL-like applications
    • 3
      Works well for most data science use cases
    • 2
      Machine learning library, streaming in real time
    • 2
      In memory Computation
    • 0
      Interactive Query
    CONS OF APACHE SPARK
    • 3
      Speed

    related Apache Spark posts

    Eric Colson
    Chief Algorithms Officer at Stitch Fix · 21 upvotes · 1.9M views

    The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3 based data warehouse. Apache Spark on YARN is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling YARN clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.

    Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

    At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into our systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan gives our data scientists the ability to quickly productionize the models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn) by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

    #DataScience #DataStack #Data

    Conor Myhrvold
    Tech Brand Mgr, Office of CTO at Uber · 7 upvotes · 943.3K views

    Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

    Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse it to any sink by leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

    https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

    (Direct GitHub repo: https://github.com/uber/marmaray)

    Apache Flink

    380
    575
    35
    Fast and reliable large-scale data processing engine
    PROS OF APACHE FLINK
    • 15
      Unified batch and stream processing
    • 8
      Out-of-the-box connectors to Kinesis, S3, and HDFS
    • 8
      Easy-to-use streaming APIs
    • 3
      Open Source
    • 1
      Low latency
    CONS OF APACHE FLINK
      No cons listed yet.

      related Apache Flink posts

      Surabhi Bhawsar
      Technical Architect at Pepcus · 7 upvotes · 511.8K views
      Shared insights on Kafka and Apache Flink

      I need to build an alert and notification framework using a scheduled program. We will analyze the events from a database table, filter the events that fall within a one-day timespan, and send these event messages over email. Currently, we are using Kafka Pub/Sub for messaging. The customer wants us to move to Apache Flink, and I am trying to understand how Apache Flink could be a better fit for us.

      Arc

      38
      34
      0
      A dialect of the Lisp programming language developed by Paul Graham and Robert Morris
      PROS OF ARC
        No pros listed yet.
      CONS OF ARC
        No cons listed yet.

          related Arc posts

          Neo4j

          904
          1K
          329
          The world’s leading Graph Database
          PROS OF NEO4J
          • 66
            Cypher – graph query language
          • 57
            Great graphdb
          • 31
            Open source
          • 29
            REST API
          • 27
            High-Performance Native API
          • 23
            ACID
          • 20
            Easy setup
          • 14
            Great support
          • 10
            Clustering
          • 8
            Great Web Admin UI
          • 8
            Hot Backups
          • 7
            Powerful, flexible data model
          • 6
            Mature
          • 5
            Embeddable
          • 4
            Easy to Use and Model
          • 3
            Best Graphdb
          • 3
            Highly-available
          • 2
            Used by Crunchbase
          • 2
            It's awesome, I wanted to try it
          • 2
            Great onboarding process
          • 2
            Great query language and built in data browser
          CONS OF NEO4J
          • 4
            Can't store a vertex as JSON
          • 3
            Comparably slow

          related Neo4j posts

          We have an in-house-built experiment management system. Each step produces samples as input to the next step, which can then produce one sample (1-1) or many samples (1-many). There are many steps like this. So far, we have been tracking genealogy (with limited tracking) in a MySQL database, and it is becoming hard to trace back to the original material or sample (I can give more details if required). So we are considering a graph database. I am requesting advice from the experts.

          1. Is a graph database the right choice, or can we manage with an RDBMS?
          2. If an RDBMS, which RDBMS, which feature, or which approach could make this manageable or sustainable?
          3. If a graph database (Neo4j, OrientDB, Azure Cosmos DB, Amazon Neptune, ArangoDB), which one is good, and what are the best practices?

          I am sorry that this might be a loaded question.
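
As a rough illustration of the tracing problem described in this post, sample lineage can be modeled as a directed graph in which each sample points to the sample it was derived from. The sketch below is plain Python with made-up sample IDs and a hypothetical `trace_origins` helper; it is not tied to MySQL or to any of the graph databases named above.

```python
from collections import deque

# child sample -> parent samples it was produced from (illustrative data).
PARENTS = {
    "S2": ["S1"],   # 1-1 step: S1 produced S2
    "S3a": ["S2"],  # 1-many step: S2 produced S3a and S3b
    "S3b": ["S2"],
    "S4": ["S3a"],  # another 1-1 step
}


def trace_origins(sample, parents=PARENTS):
    """Walk parent links breadth-first and return the samples with no parent."""
    origins = set()
    seen = {sample}
    queue = deque([sample])
    while queue:
        current = queue.popleft()
        ups = parents.get(current, [])
        if not ups:
            # No recorded parent: this is original material.
            origins.add(current)
        for parent in ups:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return origins
```

In a relational database the same walk is usually written as a recursive query, while a graph database expresses it as a variable-length path traversal; which is easier to maintain depends on how deep and how branched the lineage gets.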

          I'm evaluating the use of RedisGraph vs Microsoft SQL Server 2019 graph features to build a social graph. One of the key criteria is high availability and cross-data-center replication of data. While Neo4j is a much more mature solution in general, I'm not considering it due to the cost and the introduction of a new stack into the ecosystem. Also, due to the nature of the data and org policies, using a cloud-based solution won't be a viable choice.

          We currently use Redis as a cache and SQL Server 2019 as our RDBMS.

          I'm leaning towards SQL Server 2019 graph, as we already use SQL Server extensively as a relational database and have all the HA and cross-data-center replication setup readily available. I still need to evaluate whether it fulfills our needs as a graph DB, though. I also learned that SQL Server 2019 is still a new player in the market and attempts to fit graph-like queries on top of a relational model (with node and edge tables). RedisGraph seems very promising. However, I'm not totally sure about its HA, graph data backup, and cross-data-center support.

          Dgraph

          90
          151
          7
          Fast, Distributed Graph DB
          PROS OF DGRAPH
          • 3
            GraphQL as a query language is nice if you like Apollo
          • 1
            High Performance
          • 1
            Open Source
          • 1
            Low learning curve
          • 1
            Easy set up
          CONS OF DGRAPH
            No cons listed yet.

            related Dgraph posts

            Titan

            33
            47
            0
            Distributed Graph Database
            PROS OF TITAN
              No pros listed yet.
            CONS OF TITAN
              No cons listed yet.

                related Titan posts

                JanusGraph

                32
                68
                0
                Open-source, distributed graph database
                PROS OF JANUSGRAPH
                  No pros listed yet.
                CONS OF JANUSGRAPH
                  No cons listed yet.

                    related JanusGraph posts