Need advice about which tool to choose?Ask the StackShare community!

Avro

267
169
+ 1
0
JSON

1.7K
1.5K
+ 1
9
Add tool

Avro vs JSON: What are the differences?

Developers describe Avro as "A data serialization framework *". It is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. On the other hand, *JSON** is detailed as "A lightweight data-interchange format". JavaScript Object Notation is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language.

Avro can be classified as a tool in the "Serialization Frameworks" category, while JSON is grouped under "Languages".

Redsift, OTTLabs, and Mon Style are some of the popular companies that use JSON, whereas Avro is used by Liferay, LendUp, and BetterCloud. JSON has a broader approval, being mentioned in 32 company stacks & 161 developers stacks; compared to Avro, which is listed in 8 company stacks and 8 developer stacks.

Advice on Avro and JSON
Needs advice
on
JSONJSON
and
PythonPython

Hi. Currently, I have a requirement where I have to create a new JSON file based on the input CSV file, validate the generated JSON file, and upload the JSON file into the application (which runs in AWS) using API. Kindly suggest the best language that can meet the above requirement. I feel Python will be better, but I am not sure with the justification of why python. Can you provide your views on this?

See more
Replies (3)
Recommends
on
PythonPython

Python is very flexible and definitely up the job (although, in reality, any language will be able to cope with this task!). Python has some good libraries built in, and also some third party libraries that will help here. 1. Convert CSV -> JSON 2. Validate against a schema 3. Deploy to AWS

  1. The builtins include json and csv libraries, and, depending on the complexity of the csv file, it is fairly simple to convert:
import csv
import json

with open("your_input.csv", "r") as f:
    csv_as_dict = list(csv.DictReader(f))[0]

with open("your_output.json", "w") as f:
    json.dump(csv_as_dict, f)
  1. The validation part is handled nicely by this library: https://pypi.org/project/jsonschema/ It allows you to create a schema and check whether what you have created works for what you want to do. It is based on the json schema standard, allowing annotation and validation of any json

  2. It as an AWS library to automate the upload - or in fact do pretty much anything with AWS - from within your codebase: https://aws.amazon.com/sdk-for-python/ This will handle authentication to AWS and uploading / deploying the file to wherever it needs to go.

A lot depends on the last two pieces, but the converting itself is really pretty neat.

See more
Max Musing
Founder & CEO at BaseDash · | 1 upvotes · 240.4K views
Recommends
on
Node.jsNode.js
at

This should be pretty doable in any language. Go with whatever you're most familiar with.

That being said, there's a case to be made for using Node.js since it's trivial to convert an object to JSON and vice versa.

See more
Recommends
on
GolangGolang

I would use Go. Since CSV files are flat (no hierarchy), you could use the encoding/csv package to read each row, and write out the values as JSON. See https://medium.com/@ankurraina/reading-a-simple-csv-in-go-36d7a269cecd. You just have to figure out in advance what the key is for each row.

See more
Get Advice from developers at your company using StackShare Enterprise. Sign up for StackShare Enterprise.
Learn More
Pros of Avro
Pros of JSON
    Be the first to leave a pro
    • 5
      Simple
    • 4
      Widely supported

    Sign up to add or upvote prosMake informed product decisions

    What is Avro?

    It is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.

    What is JSON?

    JavaScript Object Notation is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language.

    Need advice about which tool to choose?Ask the StackShare community!

    What companies use Avro?
    What companies use JSON?
    See which teams inside your own company are using Avro or JSON.
    Sign up for StackShare EnterpriseLearn More

    Sign up to get full access to all the companiesMake informed product decisions

    What tools integrate with Avro?
    What tools integrate with JSON?

    Sign up to get full access to all the tool integrationsMake informed product decisions

    Blog Posts

    Aug 28 2019 at 3:10AM

    Segment

    PythonJavaAmazon S3+16
    7
    2406
    What are some alternatives to Avro and JSON?
    Protobuf
    Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
    gRPC
    gRPC is a modern open source high performance RPC framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking...
    Apache Thrift
    The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
    MessagePack
    It is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON. But it's faster and smaller. Small integers are encoded into a single byte, and typical short strings require only one extra byte in addition to the strings themselves.
    Serde
    It is a framework for serializing and deserializing Rust data structures efficiently and generically. The ecosystem consists of data structures that know how to serialize and deserialize themselves along with data formats that know how to serialize and deserialize other things. It provides the layer by which these two groups interact with each other, allowing any supported data structure to be serialized and deserialized using any supported data format.
    See all alternatives