AllJSONTools

Free JSON Developer Tools

How to Handle Large JSON Files Without Crashing

2026-02-26 · 15 min read · By AllJSONTools

Performance
Streaming
Node.js
Python
Having JSON issues?

Paste broken JSON and fix it instantly with AI — plain-English explanations included.

Fix JSON with AI

The Problem: JSON.parse() Was Not Built for Big Data

Most developers first encounter the problem the same way: a script that worked fine in development suddenly crashes in production with FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory. The culprit is almost always a single line of code: JSON.parse(fs.readFileSync('data.json', 'utf-8')).

When you call JSON.parse(), the entire JSON string must be loaded into memory at once. The parser then builds a complete JavaScript object tree — which can consume two to five times the size of the raw string. A 500MB JSON file can easily require 1.5–2.5GB of RAM just to parse. On a machine with 4GB of available heap, that is a guaranteed crash.
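You can see this multiplier directly. The sketch below uses Python (whose json.load has the same load-everything behavior) to compare the raw string size with a rough deep size of the parsed object tree; the exact ratio varies by runtime and data shape, but the parsed tree is always several times larger.

```python
import json
import sys

def deep_size(obj, seen=None):
    """Rough recursive memory footprint of a parsed structure, in bytes."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k, seen) + deep_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(deep_size(item, seen) for item in obj)
    return size

# 10,000 small records serialized as a single JSON string
raw = json.dumps([{"id": i, "name": f"user-{i}", "active": True} for i in range(10_000)])
parsed = json.loads(raw)

print(f"raw string:  {len(raw):,} bytes")
print(f"parsed tree: {deep_size(parsed):,} bytes")
```

On a typical CPython build the parsed tree comes out several times larger than the raw string, for the same reason a V8 object tree dwarfs the JSON text it came from.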

This guide covers every practical strategy for working with large JSON files — from streaming parsers and format conversion to command-line tools and browser-based approaches. Whether you are dealing with a 50MB API export or a 10GB log dump, there is a technique here that will keep your process alive and your memory usage under control.

Why JSON Files Get Large

JSON was designed for lightweight data interchange between a browser and a server. It was never intended to serve as a bulk data format. Yet developers routinely generate massive JSON files in the following scenarios:

  • Full API exports — Services like Stripe, Shopify, and Firebase let you export your entire dataset as a single JSON file. A store with 100,000 orders can produce a file hundreds of megabytes in size.

  • Application logging — Structured logging frameworks (Winston, Bunyan, Pino) write one JSON object per log line. A busy service can generate gigabytes of JSON logs per day.

  • Data pipelines and ETL — Intermediate stages in data pipelines often serialize results as JSON. When a pipeline processes millions of records, the intermediate files balloon quickly.

  • Database dumps — Tools like mongoexport (or Postgres queries that serialize rows with row_to_json / json_agg) produce JSON output proportional to the database size. A moderate-sized MongoDB collection can dump to a multi-gigabyte file.

  • GeoJSON datasets — Geographic data files with detailed polygon coordinates routinely exceed 1GB.

Know Your Memory Limits

Before choosing a strategy, understand the constraints of your runtime environment.

| Environment | Default Heap Limit | Max Practical JSON Size |
| --- | --- | --- |
| Node.js (64-bit) | ~1.7 GB (V8 default) | ~300–500 MB |
| Node.js with --max-old-space-size=8192 | 8 GB | ~1.5–2.5 GB |
| Chrome / Edge | ~4 GB (varies by OS) | ~500 MB–1 GB |
| Firefox | ~4 GB | ~500 MB–1 GB |
| V8 string size limit | ~512 MB (strings cannot exceed ~536 million characters) | — |

The V8 string size limit is particularly important. Even if you increase the heap with --max-old-space-size, you cannot fs.readFileSync() a file larger than about 512MB into a single string. This is a hard limit baked into the V8 engine. For files above this threshold, streaming is not optional — it is the only option.

Streaming JSON in Node.js

Streaming parsers read a JSON file chunk by chunk and emit events or objects as they are discovered. The entire file never needs to reside in memory at once. This is the most important technique for handling large JSON in Node.js.

Using stream-json

The stream-json library is the most mature streaming JSON parser for Node.js. It provides composable stream components that you pipe together to extract exactly the data you need.

javascript
const { parser } = require("stream-json");
const { streamArray } = require("stream-json/streamers/StreamArray");
const fs = require("fs");

// Process a large JSON array without loading it all into memory
// File contains: [{"id": 1, ...}, {"id": 2, ...}, ...]
const pipeline = fs.createReadStream("huge-dataset.json")
  .pipe(parser())
  .pipe(streamArray());

let count = 0;
let totalRevenue = 0;

pipeline.on("data", ({ key, value }) => {
  // 'value' is one fully-parsed object from the array
  count++;
  totalRevenue += value.revenue || 0;

  // You can filter, transform, or write to another stream
  if (value.status === "active") {
    // process active records...
  }
});

pipeline.on("end", () => {
  console.log(`Processed ${count} records`);
  console.log(`Total revenue: $${totalRevenue.toFixed(2)}`);
});

pipeline.on("error", (err) => {
  console.error("Stream error:", err.message);
});

Filtering with stream-json Picks

When you only need a subset of the data, use pick to extract specific paths without parsing the rest. This is dramatically faster for targeted extraction from large files.

javascript
const { parser } = require("stream-json");
const { pick } = require("stream-json/filters/Pick");
const { streamValues } = require("stream-json/streamers/StreamValues");
const fs = require("fs");

// Extract only the "users" array from a large nested JSON file
// File: { "metadata": {...}, "users": [...], "logs": [...] }
fs.createReadStream("large-export.json")
  .pipe(parser())
  .pipe(pick({ filter: "users" }))
  .pipe(streamValues())
  .on("data", ({ value }) => {
    console.log("User:", value.name, value.email);
  })
  .on("end", () => {
    console.log("Done extracting users");
  });

Using readline for NDJSON

If your file is in newline-delimited JSON (NDJSON) format — one JSON object per line — you can use Node’s built-in readline module without any third-party dependencies.

javascript
const fs = require("fs");
const readline = require("readline");

async function processNDJSON(filePath) {
  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  let lineNumber = 0;
  const errors = [];

  for await (const line of rl) {
    lineNumber++;
    if (!line.trim()) continue; // skip empty lines

    try {
      const record = JSON.parse(line);
      // Process each record individually
      await processRecord(record);
    } catch (err) {
      errors.push({ line: lineNumber, error: err.message });
    }
  }

  console.log(`Processed ${lineNumber} lines, ${errors.length} errors`);
  return errors;
}

processNDJSON("server-logs.ndjson");

This approach uses almost no memory regardless of file size, because only one line is in memory at any time. It is ideal for log files, event streams, and any dataset where each line is an independent record.

JSON Lines (NDJSON): A Better Format for Large Datasets

Standard JSON wraps an array of objects in square brackets with commas between items: [obj1, obj2, obj3, ...]. This means a parser must track nested brackets, handle commas, and cannot start processing until it has enough context to identify complete objects. For multi-gigabyte files, this is a fundamental design problem.
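To make that bookkeeping concrete, here is a toy Python scanner that does the minimum work needed to pull top-level elements out of a JSON array: tracking bracket depth and string state. This is exactly the work that NDJSON makes unnecessary. It is an illustration, not a production parser (a real streaming library also handles chunk boundaries, full escape sequences, scalar elements, and malformed input).

```python
import json

def iter_array_elements(text):
    """Toy scanner: yield the top-level elements of a JSON array.

    Tracks bracket depth and string state. Handles arrays of
    objects/arrays; scalar elements are out of scope for this sketch.
    """
    depth = 0
    in_string = False
    escaped = False
    start = None
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth == 2 and start is None:
                start = i
        elif ch in "]}":
            depth -= 1
            if depth == 1 and start is not None:
                yield json.loads(text[start : i + 1])
                start = None

data = '[{"id": 1, "tag": "a]b"}, {"id": 2}]'
records = list(iter_array_elements(data))
```

Note how even this toy version must know whether a bracket is inside a string literal. With NDJSON, the equivalent logic is a single split on newlines.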

JSON Lines (also called Newline-Delimited JSON, or NDJSON) solves this by placing one complete JSON object per line, with no wrapping array and no commas between records. Each line is a valid, self-contained JSON document.

json
{"id": 1, "name": "Alice", "email": "alice@example.com", "status": "active"}
{"id": 2, "name": "Bob", "email": "bob@example.com", "status": "inactive"}
{"id": 3, "name": "Carol", "email": "carol@example.com", "status": "active"}
{"id": 4, "name": "Dave", "email": "dave@example.com", "status": "active"}

This format has several advantages for large datasets:

  • Trivial streaming — Read one line, parse it, process it, discard it. No need for a streaming parser library.

  • Append-friendly — Adding new records means appending lines. With standard JSON arrays, you would need to remove the closing bracket, add a comma, insert the new object, and re-add the bracket.

  • Unix tool compatible — You can use head, tail, wc -l, grep, and split directly on NDJSON files.

  • Fault-tolerant — A corrupt line only affects that one record. With a standard JSON array, a single syntax error makes the entire file unparseable.

Here is a Node.js script that converts a standard JSON array file to NDJSON format, streaming both reads and writes to handle files of any size:

javascript
const { parser } = require("stream-json");
const { streamArray } = require("stream-json/streamers/StreamArray");
const fs = require("fs");

const input = fs.createReadStream("large-array.json");
const output = fs.createWriteStream("large-array.ndjson");

input
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    output.write(JSON.stringify(value) + "\n");
  })
  .on("end", () => {
    output.end();
    console.log("Conversion complete");
  });

Handling Large JSON in Python

Python’s built-in json.load() has the same problem as JavaScript’s JSON.parse(): it reads the entire file into memory and builds a complete Python dictionary. For large files, the ijson library provides SAX-style streaming that keeps memory usage constant.

Streaming with ijson

python
import ijson

# Stream through a large JSON array without loading it all
# File contains: [{"id": 1, ...}, {"id": 2, ...}, ...]
def process_large_json(filepath):
    count = 0
    active_users = []

    with open(filepath, "rb") as f:
        # "item" means each element of the top-level array
        for record in ijson.items(f, "item"):
            count += 1
            if record.get("status") == "active":
                active_users.append(record["id"])

            # Print progress every 100,000 records
            if count % 100_000 == 0:
                print(f"Processed {count:,} records...")

    print(f"Total: {count:,} records, {len(active_users):,} active")
    return active_users

# For nested paths, use dot notation:
# ijson.items(f, "data.users.item") -> streams each user object

Line-by-Line NDJSON in Python

For NDJSON files, Python’s built-in file iteration is already a streaming parser. No third-party library is needed.

python
import json

def process_ndjson(filepath):
    results = []

    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                # Process each record
                if record.get("level") == "error":
                    results.append({
                        "timestamp": record["timestamp"],
                        "message": record["message"],
                    })
            except json.JSONDecodeError as e:
                print(f"Line {line_num}: {e}")

    return results

# For very large output, write results as you go:
def transform_ndjson(input_path, output_path):
    with open(input_path, "r") as fin, open(output_path, "w") as fout:
        for line in fin:
            record = json.loads(line.strip())
            # Transform and write immediately
            transformed = {
                "id": record["id"],
                "summary": record["name"][:50],
            }
            fout.write(json.dumps(transformed) + "\n")

Using pandas for Analysis

If your goal is data analysis rather than transformation, pandas can read NDJSON files in chunks, keeping memory usage manageable even for very large datasets.

python
import pandas as pd

# Read NDJSON in chunks of 10,000 records at a time
chunks = pd.read_json("large-data.ndjson", lines=True, chunksize=10_000)

total_revenue = 0
for chunk in chunks:
    # Each chunk is a DataFrame with up to 10,000 rows
    total_revenue += chunk["revenue"].sum()

print(f"Total revenue: ${total_revenue:,.2f}")

Browser Strategies for Large JSON

Handling large JSON in the browser is trickier than in Node.js because you have less memory, no filesystem streams, and blocking the main thread freezes the entire UI. Here are three strategies that work.

Web Workers for Off-Thread Parsing

Move the heavy JSON.parse() call to a Web Worker so it runs on a separate thread. The UI stays responsive while the worker parses the data, then posts the result back.

javascript
// json-worker.js — runs in a separate thread
self.onmessage = function (event) {
  try {
    const data = JSON.parse(event.data);
    // Optionally reduce the data before sending back
    const summary = {
      totalRecords: data.length,
      firstRecord: data[0],
      lastRecord: data[data.length - 1],
    };
    self.postMessage({ success: true, data: summary });
  } catch (err) {
    self.postMessage({ success: false, error: err.message });
  }
};

// main.js — UI thread stays responsive
function parseWithWorker(jsonString) {
  return new Promise((resolve, reject) => {
    const worker = new Worker("json-worker.js");

    worker.onmessage = (event) => {
      worker.terminate();
      if (event.data.success) {
        resolve(event.data.data);
      } else {
        reject(new Error(event.data.error));
      }
    };

    worker.postMessage(jsonString);
  });
}

Chunked Reading with FileReader and Blob.slice()

When a user uploads a large JSON file through a file input, you do not have to read it all at once. Use Blob.slice() to read it in chunks. This works especially well for NDJSON files where each line is an independent record.

javascript
async function readNDJSONChunked(file, onRecord) {
  const CHUNK_SIZE = 1024 * 1024; // 1 MB chunks
  let offset = 0;
  let leftover = "";

  while (offset < file.size) {
    const chunk = file.slice(offset, offset + CHUNK_SIZE);
    const text = await chunk.text();

    const combined = leftover + text;
    const lines = combined.split("\n");

    // The last element might be incomplete — save it for next chunk
    leftover = lines.pop() || "";

    for (const line of lines) {
      if (line.trim()) {
        try {
          const record = JSON.parse(line);
          onRecord(record);
        } catch (e) {
          console.warn("Skipping invalid line:", e.message);
        }
      }
    }

    offset += CHUNK_SIZE;
  }

  // Process any remaining data
  if (leftover.trim()) {
    try {
      onRecord(JSON.parse(leftover));
    } catch (e) {
      console.warn("Skipping final incomplete line");
    }
  }
}

// Usage with a file input
document.getElementById("fileInput").addEventListener("change", (e) => {
  const file = e.target.files[0];
  let count = 0;

  readNDJSONChunked(file, (record) => {
    count++;
    // Process each record
  }).then(() => {
    console.log(`Processed ${count} records`);
  });
});

For standard JSON arrays (not NDJSON), the browser approach is more limited. Your best options are to convert the file to NDJSON first, to use a Web Worker for the full parse, or to use the JSON to CSV converter to transform it into a more manageable format for analysis.

jq: The Swiss Army Knife for Large JSON

jq is a command-line JSON processor that can filter, transform, and extract data from JSON files. In its default mode it loads the whole document into memory (though its compact internal representation is much smaller than a JavaScript object tree), and its --stream mode can process files far larger than available RAM.

bash
# Count records in a large JSON array
jq 'length' huge-dataset.json

# Extract just the email field from each record
jq '.[].email' huge-dataset.json > emails.txt

# Filter records by condition
jq '[.[] | select(.status == "active")]' huge-dataset.json > active.json

# Get the first 100 records (for sampling)
jq '.[:100]' huge-dataset.json > sample.json

# Extract nested data
jq '.[] | {id: .id, city: .address.city}' huge-dataset.json

# Stream mode for truly massive files (processes one value at a time)
jq --stream 'select(.[0][-1] == "email") | .[1]' huge-dataset.json

# Convert a JSON array to NDJSON (one object per line)
jq -c '.[]' huge-array.json > output.ndjson

# Filter NDJSON (jq processes one JSON document per line natively; -c keeps output compact)
jq -c 'select(.level == "error")' logs.ndjson > errors.ndjson

# Aggregate values across records
jq '[.[].revenue] | add' sales-data.json

The --stream flag is particularly powerful for huge files. Instead of loading the entire document, jq emits path-value pairs as it encounters them. This means you can extract specific fields from a 10GB file using only a few megabytes of RAM.
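If the path-value model is unfamiliar, this small Python sketch mimics the shape of jq --stream events. It is an analogy for understanding the output, not a reimplementation of jq:

```python
def stream_events(obj, path=()):
    """Emit (path, leaf-value) pairs, similar in spirit to jq --stream."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from stream_events(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from stream_events(value, path + (index,))
    else:
        yield (list(path), obj)

doc = {"metadata": {"count": 2}, "users": [{"email": "a@x.com"}, {"email": "b@x.com"}]}

# Analogous to: jq --stream 'select(.[0][-1] == "email") | .[1]'
emails = [value for path, value in stream_events(doc) if path and path[-1] == "email"]
```

Because each event carries only a path and a single leaf value, a consumer never needs the whole document in memory at once, which is why the real --stream mode scales to files of any size.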

Before running jq queries on a full dataset, test your filter on a small sample. You can use the JSON Tree Viewer to understand the structure of a sample record, or the JSONPath Query tool to prototype your data extraction logic interactively. For a comprehensive reference on query syntax, see the JSONPath Cheatsheet.

Pagination: Avoid the Problem Entirely

The best strategy for large JSON is to never create it in the first place. If you are fetching data from an API, use pagination to retrieve manageable chunks instead of downloading everything in a single request.

javascript
// Paginated API fetching with cursor-based pagination
async function fetchAllRecords(baseUrl) {
  const allRecords = [];
  let cursor = null;
  let page = 0;

  do {
    page++;
    const url = cursor
      ? `${baseUrl}?cursor=${cursor}&limit=100`
      : `${baseUrl}?limit=100`;

    const response = await fetch(url);
    const data = await response.json();

    allRecords.push(...data.results);
    cursor = data.nextCursor;

    console.log(`Page ${page}: fetched ${data.results.length} records`);
  } while (cursor);

  return allRecords;
}

// Even better: process each page immediately instead of accumulating
async function streamAllRecords(baseUrl, processPage) {
  let cursor = null;

  do {
    const url = cursor
      ? `${baseUrl}?cursor=${cursor}&limit=100`
      : `${baseUrl}?limit=100`;

    const response = await fetch(url);
    const data = await response.json();

    // Process this page immediately, then discard it
    await processPage(data.results);
    cursor = data.nextCursor;
  } while (cursor);
}

// Usage: write each page to an NDJSON file as it arrives
const fs = require("fs");
const output = fs.createWriteStream("all-records.ndjson");

streamAllRecords("https://api.example.com/records", (records) => {
  for (const record of records) {
    output.write(JSON.stringify(record) + "\n");
  }
}).then(() => output.end());

Notice how the second approach writes each page to an NDJSON file immediately. This means memory usage stays constant regardless of how many total records exist. You can then process the resulting NDJSON file with any of the streaming techniques described earlier. For more strategies on working with APIs effectively, see our guide on parsing JSON in JavaScript.

Converting Large JSON to Other Formats

Sometimes the right answer is not to process the JSON as JSON. Converting to a different format can make the data dramatically easier to work with, especially for analysis tasks.

JSON to CSV

CSV is ideal when your JSON contains a flat array of objects with consistent keys. The result can be opened in Excel, Google Sheets, or loaded into a database. Use the JSON to CSV converter for smaller files, or stream the conversion for large ones:

javascript
const { parser } = require("stream-json");
const { streamArray } = require("stream-json/streamers/StreamArray");
const fs = require("fs");

const input = fs.createReadStream("large-dataset.json");
const output = fs.createWriteStream("large-dataset.csv");

let headerWritten = false;

input
  .pipe(parser())
  .pipe(streamArray())
  .on("data", ({ value }) => {
    if (!headerWritten) {
      // Write CSV header from the first object's keys
      output.write(Object.keys(value).join(",") + "\n");
      headerWritten = true;
    }

    // Write each row, escaping commas, quotes, and embedded newlines
    const row = Object.values(value).map((v) => {
      const str = String(v ?? "");
      return str.includes(",") || str.includes('"') || str.includes("\n")
        ? `"${str.replace(/"/g, '""')}"`
        : str;
    });
    output.write(row.join(",") + "\n");
  })
  .on("end", () => {
    output.end();
    console.log("CSV conversion complete");
  });
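If you prefer Python, the standard-library csv module handles quoting and escaping for you. A minimal sketch for NDJSON input, assuming every record has the same flat keys as the first one:

```python
import csv
import json

def ndjson_to_csv(in_path, out_path):
    """Stream NDJSON records into a CSV file, one row at a time."""
    with open(in_path) as fin, open(out_path, "w", newline="") as fout:
        writer = None
        for line in fin:
            if not line.strip():
                continue
            record = json.loads(line)
            if writer is None:
                # Header comes from the first record's keys
                writer = csv.DictWriter(fout, fieldnames=list(record))
                writer.writeheader()
            writer.writerow(record)
```

Because only one record is in memory at a time, this handles files of any size; DictWriter will raise if a later record introduces keys missing from the header, which is a useful consistency check.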

JSON Array to NDJSON

Converting from a standard JSON array to NDJSON is one of the most valuable transformations for large files. Once in NDJSON format, every subsequent processing step becomes simpler and more memory-efficient.

bash
# Using jq (simplest approach)
jq -c '.[]' large-array.json > output.ndjson

# Using Python for files too large for jq's default in-memory mode
python3 -c "
import ijson, json, sys
with open('large-array.json', 'rb') as f:
    for record in ijson.items(f, 'item'):
        print(json.dumps(record))
" > output.ndjson

Once you have your data in a manageable format, use the JSON Formatter to pretty-print and inspect individual records, or the JSON Diff tool to compare sections of the transformed output against the original.

Performance Benchmarks

To illustrate the practical differences between approaches, here are approximate benchmarks for processing a 1GB JSON file containing an array of 2 million records on a machine with 8GB RAM.

| Approach | Peak Memory | Time | Result |
| --- | --- | --- | --- |
| JSON.parse(fs.readFileSync()) | ~3.5 GB | N/A | Crash (heap out of memory) |
| JSON.parse() with --max-old-space-size=8192 | ~3.5 GB | ~45s | Works but risky |
| stream-json (Node.js) | ~50 MB | ~90s | Stable, constant memory |
| ijson (Python) | ~30 MB | ~120s | Stable, constant memory |
| NDJSON + readline | ~15 MB | ~60s | Fastest streaming option |
| jq (CLI) | ~80 MB | ~70s | No code required |
| jq --stream (CLI) | ~10 MB | ~150s | Lowest memory, slower |

The key takeaway: streaming approaches use 50–200x less memory than loading the entire file. The trade-off is typically a 1.5–3x increase in processing time. For most use cases, this is an excellent trade: using 50MB instead of 3.5GB of RAM is far more valuable than saving 30 seconds.

NDJSON with readline is the fastest streaming approach because it avoids the overhead of a streaming JSON tokenizer. Each line is a complete JSON document that can be parsed independently with the native JSON.parse(), which is highly optimized in V8.

Decision Guide: Which Approach Should You Use?

Choosing the right strategy depends on your file size, format, runtime, and what you need to do with the data. Use this decision guide to pick the right approach.

By File Size

  • Under 10 MB — JSON.parse() is fine. Format and inspect with the JSON Formatter.

  • 10–100 MB — Consider streaming if memory is tight. On a modern machine with 16GB RAM, a direct parse still works but will block for several seconds.

  • 100–500 MB — Streaming is strongly recommended. Use stream-json or ijson.

  • Over 500 MB — Streaming is mandatory. Convert to NDJSON first if possible. Use jq --stream for one-off extraction.
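These thresholds can be encoded as a quick helper. The numbers are rules of thumb from this guide, not hard limits:

```python
def pick_strategy(size_bytes):
    """Map a file size to the recommended parsing approach (rule of thumb)."""
    mb = size_bytes / (1024 * 1024)
    if mb < 10:
        return "direct parse"
    if mb < 100:
        return "direct parse, or stream if memory is tight"
    if mb < 500:
        return "stream (stream-json / ijson)"
    return "stream only; convert to NDJSON first if possible"

print(pick_strategy(750 * 1024 * 1024))
```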

By Task

  • Extract a few fields — Use jq or the JSONPath Query tool on a sample, then script the extraction with streaming.

  • Aggregate or analyze — Convert to CSV with the JSON to CSV converter, or use pandas with NDJSON chunks.

  • Transform and re-export — Stream through with stream-json or ijson, writing transformed records to a new file.

  • Understand the structure — Extract a sample with jq '.[:5]' and explore it in the JSON Tree Viewer.

  • Compare sections — Extract slices with jq and use the JSON Diff tool to compare them side by side.

By Runtime Environment

| Environment | Best Approach | Library / Tool |
| --- | --- | --- |
| Node.js | Streaming with pipe chains | stream-json, readline |
| Python | Iterative streaming | ijson, pandas (chunked) |
| Browser | Web Worker + chunked reading | FileReader, Blob.slice() |
| Command line | jq with the --stream flag | jq, grep, awk |
| Any (format conversion) | Convert to NDJSON or CSV first | jq -c '.[]', stream-based scripts |

Summary

Large JSON files are an increasingly common reality in modern development. The key insight is that JSON.parse() and json.load() were designed for small, self-contained payloads — not for bulk data processing. Once a file exceeds roughly 100MB, you need a different approach.

The three most impactful strategies are: (1) stream the file instead of loading it all at once, using stream-json in Node.js or ijson in Python; (2) convert the data to NDJSON format so every subsequent processing step becomes trivial; and (3) use jq on the command line for quick extraction and transformation without writing any code.

For working with smaller chunks extracted from large files, AllJSONTools provides a complete set of free tools: format and validate individual records, visualize their structure, convert to CSV for spreadsheet analysis, compare sections side by side, and query specific data with JSONPath expressions. All tools run in your browser with no data sent to any server.
