How TOON Saves 40% Token Costs for LLM APIs

2026-02-23 · 9 min read · By AllJSONTools

TOON · LLM · Optimization

The Token Cost Problem

Every call to an LLM API — whether it is OpenAI's GPT-4, Anthropic's Claude, Google's Gemini, or any other provider — is billed by the token. Tokens are the fundamental units that language models use to process text: roughly speaking, one token maps to about three-quarters of a word in English. The more tokens you send in a prompt and receive in a completion, the higher your bill.

This pricing model creates a direct financial incentive to minimize the number of tokens in every API call. For a single request, the difference is negligible. But at scale — thousands of requests per hour, each carrying structured data like user profiles, configuration objects, or conversation histories — the overhead of verbose data formats compounds rapidly. A 40% reduction in token count translates directly to a 40% reduction in cost for the structured data portion of your prompts.

JSON is the default format most developers reach for when embedding structured data in LLM prompts. It is universally understood, well-tooled, and unambiguous. But JSON was designed for machine-to-machine communication, not for token efficiency. Every key is wrapped in double quotes. Every string value is wrapped in double quotes. Curly braces, square brackets, colons, and commas all consume tokens. In a typical JSON payload, 30–50% of the tokens are spent on syntax alone rather than on actual data content.

This is the problem that TOON was designed to solve.

What is TOON?

TOON stands for Token-Oriented Object Notation. It is a compact text format that represents the same structured data as JSON but uses significantly fewer tokens when processed by language model tokenizers. TOON achieves this by stripping away the syntactic overhead that JSON requires — quoted keys, curly braces, square brackets, and trailing commas — and replacing them with whitespace-based indentation and minimal punctuation.

The key insight behind TOON is that LLM tokenizers treat whitespace (spaces, newlines, indentation) very efficiently. A two-space indent typically costs one token, while a pair of curly braces with surrounding whitespace can cost two or three tokens. By leaning on indentation to convey structure — similar to how Python uses indentation instead of braces — TOON eliminates a large number of tokens without sacrificing readability.

TOON is not meant to replace JSON in APIs, databases, or file storage. It is purpose-built for a single use case: reducing token consumption when passing structured data to and from LLMs. You convert JSON to TOON before sending it to the model, and convert the model's TOON output back to JSON for your application to consume.

TOON vs JSON: Side-by-Side

The best way to understand TOON's compression is to see the same data in both formats. Below is a user profile object represented first in standard JSON, then in TOON.

JSON Format (~128 tokens)

json
{
  "user": {
    "id": 2847,
    "name": "Alice Chen",
    "email": "alice@example.com",
    "role": "admin",
    "active": true,
    "preferences": {
      "theme": "dark",
      "language": "en",
      "notifications": true
    },
    "tags": ["engineering", "lead", "on-call"]
  }
}

TOON Format (~74 tokens)

toon
user
  id 2847
  name Alice Chen
  email alice@example.com
  role admin
  active +
  preferences
    theme dark
    language en
    notifications +
  tags [engineering, lead, on-call]

The data is identical. The structure is clear. But the TOON version uses roughly 42% fewer tokens. The savings come from eliminating every pair of double quotes around keys, removing curly braces and colons, using + for true and - for false, and representing simple arrays in a compact inline format.
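The percentage is easy to verify: the reduction is 1 minus the ratio of TOON tokens to JSON tokens. A one-line helper (the function name here is ours, not part of any TOON tooling):

```typescript
// Percentage of tokens saved when a payload shrinks from `before` to `after`.
const tokenSavings = (before: number, after: number): number =>
  Math.round((1 - after / before) * 100);

tokenSavings(128, 74); // → 42, matching the user-profile example above
```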

Here is a more complex example — an API response containing a list of products:

JSON

json
{
  "products": [
    {
      "id": 101,
      "name": "Wireless Mouse",
      "price": 29.99,
      "inStock": true,
      "categories": ["electronics", "accessories"]
    },
    {
      "id": 102,
      "name": "USB-C Hub",
      "price": 49.99,
      "inStock": false,
      "categories": ["electronics", "peripherals"]
    }
  ]
}

TOON

toon
products
  -
    id 101
    name Wireless Mouse
    price 29.99
    inStock +
    categories [electronics, accessories]
  -
    id 102
    name USB-C Hub
    price 49.99
    inStock -
    categories [electronics, peripherals]

Notice how each object inside a TOON array begins with a bare - on its own line. This keeps the structure visually scannable while eliminating the brace-heavy syntax that JSON requires for arrays of objects.

How TOON Works

TOON applies four key compression techniques to reduce token count while preserving the full semantics of JSON data. Each technique targets a specific source of token waste in standard JSON.

1. Remove Quotes from Keys

In JSON, every object key must be wrapped in double quotes: "name": "Alice". In TOON, keys are written bare: name Alice. Since keys are typically alphanumeric identifiers, quotes add no information — they exist only to satisfy JSON's grammar. Removing them typically saves one to two tokens per key (depending on how the tokenizer merges the quote characters with adjacent text), and for objects with many keys this alone accounts for a significant share of the total savings.

2. Use Indentation Instead of Braces

JSON uses { and } to delimit objects and [ and ] to delimit arrays. TOON replaces these with indentation levels. A nested object is simply indented under its parent key. Most LLM tokenizers encode indentation (leading spaces or tabs) very efficiently — often as a single token for an entire indentation level — while braces typically cost one token each. For deeply nested data structures, this substitution yields compounding savings at every level.

3. Array Shorthand

Simple arrays of primitives (strings, numbers) can be written inline in TOON using a compact bracket notation: [a, b, c] instead of JSON's ["a", "b", "c"]. This eliminates the per-element quoting that inflates JSON arrays. For arrays of objects, TOON uses a - delimiter to separate items, avoiding the repeated opening and closing braces that JSON requires for each object in the array.

4. Boolean and Null Shorthand

TOON represents true as +, false as -, and null as ~. The word true is typically one token, and so is +; the savings come from the surrounding syntax. When a tokenizer processes "active": true, the quotes, colon, and boolean keyword together may cost three or four tokens, while TOON's active + is typically two tokens total.
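Taken together, the four techniques can be sketched as a single recursive conversion function. This is a minimal illustration in the spirit of the rules above, not a reference implementation; real TOON tooling may handle edge cases (keys containing spaces, strings containing brackets or newlines, empty collections) differently:

```typescript
// Hypothetical jsonToToon sketch applying the four techniques above.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const isPrimitive = (v: Json): boolean => v === null || typeof v !== "object";

function jsonToToon(value: Json, indent = 0): string {
  const pad = "  ".repeat(indent);
  // Technique 4: boolean and null shorthand.
  if (value === true) return pad + "+";
  if (value === false) return pad + "-";
  if (value === null) return pad + "~";
  if (typeof value === "number" || typeof value === "string")
    return pad + String(value);
  if (Array.isArray(value)) {
    // Technique 3a: inline bracket shorthand for arrays of primitives.
    if (value.every(isPrimitive))
      return pad + "[" + value.map(String).join(", ") + "]";
    // Technique 3b: a bare "-" line introduces each object in an array.
    return value.map(v => pad + "-\n" + jsonToToon(v, indent + 1)).join("\n");
  }
  // Techniques 1 and 2: bare keys, indentation instead of braces.
  return Object.entries(value)
    .map(([key, v]) => {
      if (isPrimitive(v)) {
        const scalar =
          v === true ? "+" : v === false ? "-" : v === null ? "~" : String(v);
        return pad + key + " " + scalar;
      }
      if (Array.isArray(v) && v.every(isPrimitive))
        return pad + key + " [" + v.map(String).join(", ") + "]";
      return pad + key + "\n" + jsonToToon(v, indent + 1);
    })
    .join("\n");
}
```

Running this sketch on the user-profile object from the first example reproduces the TOON output shown there.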

Real-World Token Savings

To quantify the impact, we measured token counts for typical payloads using the GPT-4 tokenizer (cl100k_base). The same data was encoded in both JSON (pretty-printed with two-space indentation) and TOON. The results consistently show savings in the 39–43% range.

| Payload Type | JSON Tokens | TOON Tokens | Savings |
| --- | --- | --- | --- |
| User Profile | 128 | 74 | 42% |
| API Response (list) | 312 | 185 | 41% |
| Config File | 95 | 58 | 39% |
| Chat History (10 msgs) | 486 | 278 | 43% |
| Nested Object (3 levels) | 215 | 132 | 39% |
| Array of Records (20 items) | 890 | 520 | 42% |

The savings are remarkably consistent across different types of structured data. Payloads with more nesting and more boolean fields tend toward the higher end of the range, while flat key-value objects with long string values (where the actual content dominates over syntax) tend toward the lower end. On average, you can expect a roughly 40% reduction in tokens for the structured data portion of your prompts.

To put this in dollar terms: if your application sends 100,000 API requests per day, each including a 300-token JSON payload, that is 30 million tokens per day spent on structured data alone. A 40% reduction eliminates 12 million of those tokens per day. At GPT-4's input pricing, the cost difference over a month is substantial — and it requires no changes to your business logic, prompts, or model selection.
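The arithmetic behind that estimate, using the figures stated above:

```typescript
// Back-of-envelope calculation using the figures quoted in the text.
const requestsPerDay = 100_000;
const jsonTokensPerPayload = 300; // tokens of structured data per request
const savingsPercent = 40;        // typical TOON reduction

const dailyPayloadTokens = requestsPerDay * jsonTokensPerPayload;     // 30,000,000
const dailyTokensSaved = (dailyPayloadTokens * savingsPercent) / 100; // 12,000,000
const monthlyTokensSaved = dailyTokensSaved * 30;                     // 360,000,000
```

Multiply monthlyTokensSaved by your provider's per-token input price to get the dollar figure for your own workload.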

When to Use TOON

Good For

  • LLM context windows. When you need to fit more data into a model's context window (e.g., 128K tokens for GPT-4 Turbo), compressing structured data with TOON frees up token budget for additional context, instructions, or longer conversations.
  • Prompt engineering. System prompts and few-shot examples that include JSON structures benefit directly from TOON compression. Smaller prompts mean faster inference and lower costs.
  • Chat history compression. In multi-turn conversations, prior messages accumulate rapidly. Converting structured data in chat history from JSON to TOON can significantly extend the effective conversation length before hitting token limits.
  • Batch processing. Applications that make high volumes of LLM API calls with structured data payloads see the largest absolute cost savings from TOON compression.
  • RAG pipelines. Retrieval-Augmented Generation systems that inject retrieved structured data (database records, API responses) into prompts can use TOON to maximize the number of retrieved items that fit in context.

Not Recommended For

  • REST APIs and web services. TOON is not a standardized data interchange format. APIs should continue to use JSON (or Protocol Buffers, MessagePack, etc.) for communication between services.
  • Persistent storage. Databases, file systems, and caches should store data in JSON or other established formats. TOON is a presentation-layer optimization, not a storage format.
  • Direct human editing. While TOON is readable, it lacks the tooling ecosystem (linters, schema validators, IDE extensions) that makes JSON comfortable for direct editing by developers.
  • Contexts where JSON is expected. If the LLM is instructed to return valid JSON (e.g., for function calling or structured output), you should not ask it to return TOON instead. Use TOON for input compression and let the model output in whatever format your application requires.

TOON in Practice

Converting between JSON and TOON is straightforward with the right tools. AllJSONTools provides two free, browser-based converters that handle the transformation instantly:

  • JSON to TOON Converter — Paste your JSON and instantly get the TOON equivalent. Use this to prepare structured data before embedding it in LLM prompts.

  • TOON to JSON Converter — Paste TOON output from an LLM and convert it back to valid JSON for your application to process.

The typical workflow looks like this: your application receives structured data as JSON from a database or API, converts it to TOON before inserting it into the prompt, sends the prompt to the LLM, and then (if the model returns structured data) converts the TOON response back to JSON. Here is an example of what this looks like in code:

typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Step 1: Your original data is JSON
const userData = {
  name: "Alice Chen",
  role: "admin",
  department: "Engineering",
  active: true,
  permissions: ["read", "write", "deploy"]
};

// Step 2: Convert to TOON before sending to LLM
// (jsonToToon is a small utility you implement or import)
const toonData = jsonToToon(userData);
// Result:
// name Alice Chen
// role admin
// department Engineering
// active +
// permissions [read, write, deploy]

// Step 3: Include in your prompt (40% fewer tokens)
const prompt = `Analyze this user's access level and suggest
permission changes if needed:

${toonData}

Respond with your analysis.`;

// Step 4: Send to LLM API with reduced token count
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: prompt }]
});

For programmatic conversion in Node.js or TypeScript projects, you can implement a simple jsonToToon() utility function or use our online tools for quick one-off conversions. The conversion is deterministic and lossless — you can always reconstruct the original JSON from valid TOON output.

Integration with LLM Workflows

The most impactful place to use TOON is in system prompts and contextual data injections where structured data is included alongside natural language instructions. Here is a concrete example of reducing prompt tokens in a customer support assistant powered by GPT-4 or Claude.

Before: JSON in System Prompt

text
You are a customer support assistant. Here is the customer's account:

{
  "customer": {
    "id": "cust_8a7f3b",
    "name": "Jordan Rivera",
    "plan": "premium",
    "billing_cycle": "annual",
    "active": true,
    "account_created": "2023-06-15",
    "support_tier": "priority",
    "recent_tickets": [
      {
        "id": "tkt_001",
        "subject": "Login issue",
        "status": "resolved",
        "created": "2024-01-10"
      },
      {
        "id": "tkt_002",
        "subject": "Billing question",
        "status": "open",
        "created": "2024-01-18"
      }
    ]
  }
}

Answer the customer's question based on their account data.

After: TOON in System Prompt

text
You are a customer support assistant. Here is the customer's account:

customer
  id cust_8a7f3b
  name Jordan Rivera
  plan premium
  billing_cycle annual
  active +
  account_created 2023-06-15
  support_tier priority
  recent_tickets
    -
      id tkt_001
      subject Login issue
      status resolved
      created 2024-01-10
    -
      id tkt_002
      subject Billing question
      status open
      created 2024-01-18

Answer the customer's question based on their account data.

Both prompts convey exactly the same information to the model. The LLM can read and interpret TOON just as accurately as JSON — language models are trained on a vast corpus that includes many text formats, and indentation-based structures are well within their comprehension capabilities. The difference is purely economic: the TOON version consumes approximately 40% fewer tokens for the structured data block.

For multi-turn chat applications, the savings multiply with every message in the conversation history. If each turn includes structured context (like the customer record above), and you maintain a rolling window of the last 20 messages, converting that structured data from JSON to TOON can save hundreds of tokens per API call. Over thousands of concurrent conversations, this adds up to meaningful cost savings.

Tips for LLM Integration

  • Explain the format once. If using TOON in a system prompt, include a brief note like “The following data is in TOON format (compact structured notation)” so the model has clear context. In practice, most models parse TOON correctly even without this hint, but explicit framing improves reliability.
  • Use TOON for input, JSON for output. Ask the model to respond in standard JSON if your application needs to parse the output programmatically. TOON is an input optimization; output format should match your application's parsing expectations.
  • Benchmark your specific payloads. While 40% is the typical savings, your actual results depend on the shape of your data. Payloads with many short keys and boolean values save more; payloads dominated by long string values save less. Use a tokenizer library like tiktoken to measure your exact savings.
  • Combine with other optimizations. TOON pairs well with other token reduction strategies: summarizing conversation history, using concise system prompts, and pruning irrelevant fields from context data. A layered approach yields the best overall cost efficiency.
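For a quick feel before wiring up a tokenizer, character counts give a rough first approximation. They are only a proxy (token counts depend on the tokenizer's vocabulary), so treat the number as indicative and use tiktoken for real measurements. The asToon string below is a hand-converted equivalent of the JSON, following the rules described earlier:

```typescript
// Crude proxy: compares character counts, not token counts. For exact
// figures, run both strings through your model's tokenizer (cl100k_base
// for GPT-4 via the tiktoken library).
const data = { theme: "dark", language: "en", notifications: true };

const asJson = JSON.stringify(data, null, 2);
const asToon = "theme dark\nlanguage en\nnotifications +";

const charReduction = Math.round((1 - asToon.length / asJson.length) * 100);
console.log(`~${charReduction}% fewer characters than pretty-printed JSON`);
```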

TOON is a pragmatic tool for a specific problem: JSON is too verbose for LLM token budgets. By converting your structured data to TOON before sending it to a language model, you reduce costs, fit more context into limited token windows, and maintain full data fidelity. Try converting your own data with our JSON to TOON converter to see the token savings firsthand, or convert TOON back to JSON with our TOON to JSON converter.
