How to Send Binary Data in JSON?

Base64 hex encoding JSON transmit binary
On this page

How to Send Binary Data in JSON?

Excerpt

Base64 and hex encoding allow binary data like images and files to be transmitted in JSON by converting the binary to text representations.


JSON (JavaScript Object Notation) is a popular data format due to its lightweight text representation. But JSON itself only supports text strings and cannot handle raw binary data like images, audio, pdfs, etc. In this post, we’ll look at techniques for encoding binary data to transmit inside JSON.

What is JSON?

JSON is a text format for representing structured data that is designed to be easy for humans to read and write, and easy for machines to parse. It is composed of:

  • Field names and values as string key/value pairs
  • Ordered lists of values in arrays

For example:

1{
2  "name": "John",
3  "age": 30,
4  "hobbies": ["coding", "hiking", "photography"]
5}

JSON is commonly used for web APIs, configuration files, and data storage because of its simplicity compared to XML. But JSON only supports text-based data types like strings, numbers, booleans, etc. It has no native way to handle raw binary data.

Methods for Sending Binary Data in JSON

There are a few main approaches for encoding binary data to transmit inside JSON:

  • Base64 data encoding
  • Hexadecimal encoding
  • Custom binary field conventions

We’ll look at the pros and cons of each method.

Base64 Encoding

Base64 is an encoding scheme that converts binary data into a text format using 64 printable ASCII characters. It is commonly used to send binary data over text-based channels.

Base64 is well supported across virtually all languages and platforms. Encoding the binary makes it easy to embed inside JSON strings.

The downside is base64 encoding inflates the data size by about 33% compared to raw binary.

For example, a 2KB image may become 2.7KB when base64 encoded. For small files this overhead is usually acceptable.

Hexadecimal Encoding

Another option is encoding the binary data as a hexadecimal text string. This converts each binary byte into a two character hex representation.

Hex encoding has less overhead than base64, typically only ~10-15% increase compared to raw binary data. However, hex encoding does not have as wide library support as base64.

Custom Binary Fields

Rather than directly encoding the binary data, another approach is to define custom JSON fields that describe the binary data stored externally.

For example:

1{
2  "filename": "image.png",
3  "size": 25536,
4  "url": "https://example.com/images/image.png"
5}

This avoids any encoding overhead, but requires the client to handle external storage and retrieval of the binaries.

When to Encode vs Store Externally

For smaller binary payloads under a few KB, directly encoding as base64 or hex in JSON can make sense. But for larger files like multimedia it is better to store them externally and reference via JSON.

Encoding pros:

  • Simple - no external storage needed
  • Self-contained payload

External storage pros:

  • No encoding bloat
  • Can leverage optimized binary storage

Client and Server Handling

On the client, binaries can be encoded to base64 or hex before placing in the JSON.

On the server, the encoded data should be validated and sanitized before decoding to prevent vulnerabilities.

For external custom fields, the server must retrieve the binary file using the metadata. Some security considerations apply when handling remote file references.

Examples

Some examples where binary data can be transmitted in JSON:

  • File upload API - Encode image binary in base64
  • App configuration - Embed certificates and assets in base64
  • Chat app - Send audio clips and images as hex or base64

Summary

While JSON itself only supports text, binary data can be encoded or referenced in JSON in various ways. For small binaries up to a few KB, direct base64 encoding provides a simple solution. For large files or optimized storage, custom JSON fields with external binary references would be better. Like any encoding, there are tradeoffs to balance convenience and efficiency for your particular use case.