Avro vs JSON: What are the differences?
Avro and JSON are both data serialization formats used for storing and exchanging structured data, but they differ in schema definition, data size, data typing, compatibility, and schema evolution. Here are the key differences between Avro and JSON:
Schema Definition: Avro requires a schema to be defined before serializing the data. The schema is used to describe the structure of the data, including field names, types, and optional attributes. JSON, on the other hand, does not have a predefined schema. Each JSON document can have a different structure, and the schema is implied based on the data itself.
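As an illustration, here is a minimal sketch of what such a schema might look like, written as a Python dictionary and parsed with the third-party fastavro library (the "User" record and its fields are invented for this example):

import fastavro

# A hypothetical schema describing a "User" record: every field's
# name and type is declared up front, before any data is serialized.
schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
})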
Data Size: Avro typically produces more compact data compared to JSON. Avro uses a compact binary format and performs schema-based data encoding, which reduces the overall data size. JSON, on the other hand, uses a text-based format that includes field names and values as human-readable strings, resulting in larger data sizes.
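A rough way to see this difference (a sketch using the fastavro library; exact numbers depend on the data): encode the same record both ways and compare byte counts. The Avro body carries only the values, because the schema supplies the structure.

import io
import json

import fastavro

schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})
record = {"name": "Ada", "age": 36}

# Avro: binary encoding, no field names in the payload.
buf = io.BytesIO()
fastavro.schemaless_writer(buf, schema, record)
print(len(buf.getvalue()))        # Avro body: a few bytes

# JSON: human-readable text, field names repeated in every record.
print(len(json.dumps(record)))    # JSON text: several times larger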
Data Typing: Avro supports a rich set of primitive data types, such as integers, floats, strings, and booleans, as well as complex types like arrays, maps, and records. It also allows for defining custom named types through its schema definition, as in the example below. JSON, on the other hand, has a limited set of data types: strings, numbers, booleans, null, arrays, and objects. JSON does not have built-in support for custom data types.
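For example, an Avro schema can declare an enum, an array, and a map explicitly (a made-up "Order" record, sketched in the Python dictionary form of the schema), whereas a JSON document could only imply these shapes from its values:

# A hypothetical Avro schema using complex types that JSON cannot declare:
order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "status", "type": {"type": "enum", "name": "Status",
                                     "symbols": ["NEW", "SHIPPED", "DELIVERED"]}},
        {"name": "items", "type": {"type": "array", "items": "string"}},
        {"name": "prices", "type": {"type": "map", "values": "double"}},
    ],
}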
Compatibility: Avro provides built-in support for schema evolution, which allows for data compatibility across different versions of schemas. It supports forward and backward compatibility, meaning that new or old data can be read using a different version of the schema without loss of information. JSON, however, does not have built-in support for schema evolution. Changes in the structure of JSON data may require manual handling or explicit transformations to ensure compatibility.
Schema Evolution: Avro allows for schema evolution by adding, removing, or modifying fields in a schema without breaking compatibility. It uses a concept called "resolution rules" to handle schema evolution. JSON, on the other hand, does not have a standardized way of handling schema evolution. Changes in the structure of JSON data may require manual adjustments and coordination between producers and consumers of the data.
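A sketch of how this works in practice with the fastavro library: data written with an old schema is decoded against a newer reader schema, and the resolution rules fill in the added field from its default (the schemas here are invented for the example):

import io

import fastavro

# Old (writer) schema: only a "name" field.
writer_schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})

# New (reader) schema: adds "age" with a default, so old data stays readable.
reader_schema = fastavro.parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int", "default": -1},
    ],
})

buf = io.BytesIO()
fastavro.schemaless_writer(buf, writer_schema, {"name": "Ada"})
buf.seek(0)

print(fastavro.schemaless_reader(buf, writer_schema, reader_schema))
# {'name': 'Ada', 'age': -1}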
In summary, Avro and JSON differ in their schema definition, data size, data typing, compatibility, and schema evolution. Avro requires a predefined schema, produces compact binary data, supports a rich set of data types, provides built-in schema evolution capabilities, and allows for forward and backward compatibility. JSON does not have a predefined schema, uses a text-based format, has a limited set of data types, and lacks built-in support for schema evolution.
Hi. Currently, I have a requirement where I have to create a new JSON file based on an input CSV file, validate the generated JSON file, and upload the JSON file into the application (which runs in AWS) using an API. Kindly suggest the best language to meet the above requirement. I feel Python will be a good fit, but I am not sure how to justify choosing Python. Can you share your views on this?
Python is very flexible and definitely up to the job (although, in reality, any language will be able to cope with this task!). Python has some good libraries built in, and also some third-party libraries that will help here:
1. Convert CSV -> JSON
2. Validate against a schema
3. Deploy to AWS
- The built-ins include the json and csv libraries, and, depending on the complexity of the CSV file, it is fairly simple to convert:
import csv
import json

# Read every row of the CSV into a list of dicts keyed by the header row.
with open("your_input.csv", "r", newline="") as f:
    rows = list(csv.DictReader(f))

# Write all the rows out as a JSON array.
with open("your_output.json", "w") as f:
    json.dump(rows, f, indent=2)
The validation part is handled nicely by this library: https://pypi.org/project/jsonschema/. It allows you to create a schema and check whether what you have created works for what you want to do. It is based on the JSON Schema standard, allowing annotation and validation of any JSON document.
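A minimal sketch of how that validation might look (the schema below is a made-up example; adjust it to whatever structure your JSON needs):

import json

from jsonschema import ValidationError, validate

# Hypothetical schema: the output must be a list of objects,
# each with at least a "name" string.
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}

with open("your_output.json") as f:
    data = json.load(f)

try:
    validate(instance=data, schema=schema)
    print("your_output.json is valid")
except ValidationError as err:
    print("Validation failed:", err.message)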
There is an AWS library, boto3, to automate the upload - or in fact do pretty much anything with AWS - from within your codebase: https://aws.amazon.com/sdk-for-python/. It will handle authentication to AWS and uploading/deploying the file to wherever it needs to go.
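For example, if the file ultimately lands in an S3 bucket, the upload is a couple of lines with boto3 (a sketch - the bucket and key names are placeholders, and it assumes your AWS credentials are already configured):

import boto3

# boto3 picks up credentials from the environment, ~/.aws, or an IAM role.
s3 = boto3.client("s3")
s3.upload_file("your_output.json", "your-bucket-name", "uploads/your_output.json")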
A lot depends on the last two pieces, but the converting itself is really pretty neat.
I would use Go. Since CSV files are flat (no hierarchy), you could use the encoding/csv package to read each row, and write out the values as JSON. See https://medium.com/@ankurraina/reading-a-simple-csv-in-go-36d7a269cecd. You just have to figure out in advance what the key is for each row.
This should be pretty doable in any language. Go with whatever you're most familiar with.
That being said, there's a case to be made for using Node.js since it's trivial to convert an object to JSON and vice versa.
Pros of JSON
- Simple (5 votes)
- Widely supported (4 votes)