Excerpt
Base64 and hex encoding allow binary data like images and files to be transmitted in JSON by converting the binary to text representations.
JSON (JavaScript Object Notation) is a popular data format due to its lightweight text representation. But JSON is a text format and cannot carry raw binary data such as images, audio, or PDFs. In this post, we’ll look at techniques for encoding binary data to transmit inside JSON.
What is JSON?
JSON is a text format for representing structured data that is designed to be easy for humans to read and write, and easy for machines to parse. It is built from two structures:
- Objects: collections of key/value pairs, where each key is a string
- Arrays: ordered lists of values
For example:
{
  "name": "John",
  "age": 30,
  "hobbies": ["coding", "hiking", "photography"]
}
JSON is commonly used for web APIs, configuration files, and data storage because of its simplicity compared to XML. But its value types are limited to strings, numbers, booleans, null, objects, and arrays. It has no native way to handle raw binary data.
Methods for Sending Binary Data in JSON
There are a few main approaches for encoding binary data to transmit inside JSON:
- Base64 data encoding
- Hexadecimal encoding
- Custom binary field conventions
We’ll look at the pros and cons of each method.
Base64 Encoding
Base64 is an encoding scheme that converts binary data into a text format using 64 printable ASCII characters. It is commonly used to send binary data over text-based channels.
Base64 is well supported across virtually all languages and platforms. Encoding the binary makes it easy to embed inside JSON strings.
The downside is that base64 encoding inflates the data size by about 33% compared to raw binary, because every 3 bytes of input become 4 output characters.
For example, a 2KB image may become 2.7KB when base64 encoded. For small files this overhead is usually acceptable.
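The round trip described above can be sketched in Python with the standard library. The sample bytes and the "filename" and "data" field names are assumptions made for illustration, not part of any particular API.

```python
import base64
import json

# Hypothetical stand-in for real file contents (e.g. bytes read from an image).
raw = bytes(range(256)) * 8  # 2048 bytes of binary data

# Encode: bytes -> base64 bytes -> ASCII str, which is safe inside a JSON string.
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"filename": "image.png", "data": encoded})

# Decode: parse the JSON, then reverse the base64 step to recover the bytes.
decoded = base64.b64decode(json.loads(payload)["data"])

print(len(raw), len(encoded))  # 2048 raw bytes vs 2732 base64 characters
```

The decoded bytes are identical to the input, and the length comparison shows the roughly one-third size increase.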
Hexadecimal Encoding
Another option is encoding the binary data as a hexadecimal text string. This converts each binary byte into a two character hex representation.
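Hex is easier to read and debug than base64, but it is less compact: because every byte becomes two characters, the encoded text is twice the size of the raw binary (a 100% increase, versus about 33% for base64). Hex conversion is widely available in standard libraries, so the main reason to prefer base64 is size.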
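A minimal Python sketch of hex encoding, using the built-in `bytes.hex()` and `bytes.fromhex()` methods. The sample data (the 8-byte PNG file signature) and the "data" field name are chosen only for illustration.

```python
import json

raw = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used here as sample data

# Each byte becomes exactly two hex characters.
hex_text = raw.hex()
payload = json.dumps({"data": hex_text})

# Reverse the process: parse the JSON, then convert the hex string back to bytes.
restored = bytes.fromhex(json.loads(payload)["data"])

print(hex_text)                  # 89504e470d0a1a0a
print(len(raw), len(hex_text))   # 8 bytes in, 16 characters out
```

Since each byte maps to two characters, the encoded string is always twice as long as the input.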
Custom Binary Fields
Rather than directly encoding the binary data, another approach is to define custom JSON fields that describe the binary data stored externally.
For example:
{
  "filename": "image.png",
  "size": 25536,
  "url": "https://example.com/images/image.png"
}
This avoids any encoding overhead, but it requires separate storage for the binaries and an extra request from the client to retrieve them.
When to Encode vs Store Externally
For smaller binary payloads under a few KB, directly encoding as base64 or hex in JSON can make sense. But for larger files like multimedia it is better to store them externally and reference via JSON.
Encoding pros:
- Simple - no external storage needed
- Self-contained payload
External storage pros:
- No encoding bloat
- Can leverage optimized binary storage
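One way to apply this rule of thumb is a small helper that inlines small payloads and emits a reference for large ones. This is only a sketch: the 4 KB cutoff, the `to_json_field` name, the "encoding" field, and the URL layout are all assumptions, and uploading the large file to storage is left to the caller.

```python
import base64

INLINE_LIMIT = 4 * 1024  # assumed cutoff in bytes; tune for your API


def to_json_field(filename: str, data: bytes, base_url: str) -> dict:
    """Return a JSON-ready dict: inline base64 for small data, a reference otherwise."""
    if len(data) <= INLINE_LIMIT:
        return {
            "filename": filename,
            "encoding": "base64",
            "data": base64.b64encode(data).decode("ascii"),
        }
    # Large payload: the caller is assumed to upload `data` to external storage
    # separately; only the metadata goes into the JSON.
    return {"filename": filename, "size": len(data), "url": f"{base_url}/{filename}"}


small = to_json_field("note.bin", b"abc", "https://example.com/files")
large = to_json_field("video.bin", bytes(5000), "https://example.com/files")
print(small["data"])  # YWJj
print(large["url"])   # https://example.com/files/video.bin
```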
Client and Server Handling
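On the client, binaries can be encoded to base64 or hex before being placed in the JSON.
On the server, the encoded data should be validated before it is used: enforce a size limit, reject strings that are not well-formed base64 or hex, and check that the decoded bytes are the type of file you expect.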
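As a rough sketch of server-side checks in Python, the function below caps the encoded length, uses strict base64 decoding, and verifies the PNG file signature. The 64 KB limit, the `decode_png_field` name, and the choice of PNG as the expected type are assumptions for illustration; a real service would pick limits and checks that fit its own inputs.

```python
import base64
import binascii

MAX_ENCODED_LEN = 64 * 1024          # assumed limit on the encoded string length
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png_field(encoded: str) -> bytes:
    """Decode a base64 string and reject anything that is not a plausible PNG."""
    if len(encoded) > MAX_ENCODED_LEN:
        raise ValueError("encoded payload too large")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        data = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("not valid base64") from exc
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    return data
```

A signature check only confirms the first few bytes, so it complements rather than replaces whatever parsing or scanning the application does afterwards.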
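For external custom fields, the receiver must retrieve the binary file using the metadata. If a server fetches a URL supplied by a client, it should restrict which schemes and hosts it will contact, since an unchecked URL can be used to reach internal services (server-side request forgery).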
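A minimal sketch of one such check, using Python's `urllib.parse`: accept only HTTPS URLs whose host is on an allowlist. The `ALLOWED_HOSTS` set and the `is_safe_reference` name are hypothetical, and this covers only URL filtering, not redirects or DNS-level concerns.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"example.com"}  # assumed allowlist of trusted storage hosts


def is_safe_reference(url: str) -> bool:
    """Accept only https URLs that point at an allowlisted host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


print(is_safe_reference("https://example.com/images/image.png"))  # True
print(is_safe_reference("http://example.com/images/image.png"))   # False
print(is_safe_reference("file:///etc/passwd"))                    # False
```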
Examples
Some examples where binary data can be transmitted in JSON:
- File upload API - Encode image binary in base64
- App configuration - Embed certificates and assets in base64
- Chat app - Send audio clips and images as hex or base64
Summary
While JSON itself only supports text, binary data can be encoded or referenced in JSON in various ways. For small binaries up to a few KB, direct base64 encoding provides a simple solution. For large files or optimized storage, custom JSON fields with external binary references are the better choice. As with any encoding, there are tradeoffs between convenience and efficiency to weigh for your particular use case.