Performance
This guide covers performance considerations and optimisation strategies for PDF processing.
Backend Performance
Native vs WASM
In Node.js, the native backend is faster than WASM across common operations:
| Operation | WASM | Native | Improvement |
|---|---|---|---|
| Document load | 2.5ms | 1.8ms | ~1.4x |
| Page render (1x) | 15ms | 12ms | ~1.25x |
| Text extraction | 0.8ms | 0.5ms | ~1.6x |
| Character operations | 0.3ms | 0.15ms | ~2x |
```ts
// Use the native backend for better performance
const pdfium = await PDFium.init({ useNative: true });
```

Choosing a Backend
| Scenario | Recommendation |
|---|---|
| Browser app | WASM only |
| Node.js, high throughput | Native |
| Node.js, need forms/creation | WASM |
| Cross-platform scripts | WASM (portable) |
See Native vs WASM Backends for details.
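The recommendations in the table can be encoded as a small lookup so that the backend choice lives in one place. This is only a sketch: the `Scenario` names and the `chooseBackend` helper are hypothetical and not part of the library. Only the `useNative` option comes from this guide.

```ts
// Hypothetical scenario names mirroring the rows of the recommendation table.
type Scenario =
  | 'browser'
  | 'node-high-throughput'
  | 'node-forms-or-creation'
  | 'cross-platform-script';

interface BackendChoice {
  useNative: boolean;
  reason: string;
}

// Illustrative lookup table, not a library API.
const BACKEND_BY_SCENARIO: Record<Scenario, BackendChoice> = {
  'browser': { useNative: false, reason: 'WASM only' },
  'node-high-throughput': { useNative: true, reason: 'Native is faster' },
  'node-forms-or-creation': { useNative: false, reason: 'WASM recommended for forms/creation' },
  'cross-platform-script': { useNative: false, reason: 'WASM is portable' },
};

function chooseBackend(scenario: Scenario): BackendChoice {
  return BACKEND_BY_SCENARIO[scenario];
}

// Usage (assumes PDFium is imported as elsewhere in this guide):
// const { useNative } = chooseBackend('node-high-throughput');
// const pdfium = await PDFium.init({ useNative });
```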
Rendering Performance
Scale Factor Impact
Pixel count grows with the square of the scale factor, and render time and memory grow with it:
| Scale | Pixels (US Letter) | Relative Time |
|---|---|---|
| 0.5 | 306 × 396 | 0.25x |
| 1 | 612 × 792 | 1x |
| 2 | 1224 × 1584 | 4x |
| 3 | 1836 × 2376 | 9x |
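The figures in the table follow from simple arithmetic, which the sketch below reproduces. The `estimateRender` helper is hypothetical, and the default of 4 bytes per pixel is an assumption (an RGBA-style bitmap). The library's actual buffer format may differ.

```ts
// US Letter page size in PDF points (72 points per inch).
const US_LETTER = { width: 612, height: 792 };

interface RenderEstimate {
  width: number;        // output width in pixels
  height: number;       // output height in pixels
  pixels: number;       // total pixel count
  bytes: number;        // estimated bitmap size
  relativeCost: number; // cost relative to scale 1
}

// Hypothetical helper: estimates output size for a given scale.
// bytesPerPixel = 4 is an assumption, not a documented library value.
function estimateRender(
  page: { width: number; height: number },
  scale: number,
  bytesPerPixel = 4
): RenderEstimate {
  const width = Math.round(page.width * scale);
  const height = Math.round(page.height * scale);
  const pixels = width * height;
  return {
    width,
    height,
    pixels,
    bytes: pixels * bytesPerPixel,
    relativeCost: scale * scale, // pixel count scales quadratically
  };
}

const atDoubleScale = estimateRender(US_LETTER, 2);
console.log(atDoubleScale.width, atDoubleScale.height, atDoubleScale.relativeCost);
```

For US Letter at scale 2 this gives 1224 × 1584 pixels and, under the 4-bytes-per-pixel assumption, about 7.4 MiB per bitmap.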
```ts
// Use an appropriate scale for the use case
const thumbnail = page.render({ scale: 0.3 }); // Quick preview
const display = page.render({ scale: 1.5 }); // Screen display
const print = page.render({ scale: 3 }); // High quality
```

Use Specific Dimensions
When you need exact dimensions, use width/height instead of scale:
```ts
// Instead of calculating a scale...
const scale = targetWidth / page.width;
const scaled = page.render({ scale });

// ...pass the target dimensions directly
const exact = page.render({ width: 800, height: 1000 });
```

Document Processing
Process Pages Sequentially
Don’t hold every page in memory at once; process and release pages one at a time:
```ts
// BAD: All pages in memory
const pages = [];
for (let i = 0; i < document.pageCount; i++) {
  pages.push(document.getPage(i));
}

// GOOD: Process one at a time
for (const page of document.pages()) {
  using p = page;
  await processPage(p);
}
```

Reuse PDFium Instance
```ts
// BAD: New instance per document
for (const file of files) {
  using pdfium = await PDFium.init();
  using doc = await pdfium.openDocument(file);
  // ...
}

// GOOD: Reuse instance
using pdfium = await PDFium.init();
for (const file of files) {
  using doc = await pdfium.openDocument(file);
  // ...
}
```

Batch Processing
Limit Concurrency
```ts
async function processBatch(
  pdfium: PDFium,
  files: Uint8Array[],
  concurrency = 4
) {
  // Results are stored by input index so the output order matches `files`
  const results: string[][] = new Array(files.length);
  let next = 0;

  async function worker(): Promise<void> {
    // Loop rather than recurse, so each document is disposed
    // at the end of its own iteration
    while (next < files.length) {
      const index = next++;

      using document = await pdfium.openDocument(files[index]);
      const texts: string[] = [];

      for (const page of document.pages()) {
        using p = page;
        texts.push(p.getText());
      }

      results[index] = texts;
    }
  }

  // Start concurrent workers
  await Promise.all(
    Array.from({ length: concurrency }, () => worker())
  );

  return results;
}
```

Progress Reporting
```ts
interface Progress {
  current: number;
  total: number;
  percentage: number;
}

async function processWithProgress(
  documents: Uint8Array[],
  onProgress: (progress: Progress) => void
) {
  using pdfium = await PDFium.init();
  const total = documents.length;

  for (let i = 0; i < total; i++) {
    using document = await pdfium.openDocument(documents[i]);
    // Process...

    onProgress({
      current: i + 1,
      total,
      percentage: Math.round(((i + 1) / total) * 100),
    });
  }
}
```

Text Extraction
Cache Text Results
```ts
class PDFTextCache {
  private cache = new Map<string, string>();

  getText(page: PDFiumPage, cacheKey: string): string {
    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey)!;
    }

    const text = page.getText();
    this.cache.set(cacheKey, text);
    return text;
  }

  clear() {
    this.cache.clear();
  }
}
```

Limit Text Extraction
Use maxTextCharCount for very large documents:
```ts
const pdfium = await PDFium.init({
  limits: {
    maxTextCharCount: 100_000, // Stop after 100K chars
  },
});
```

Global Limits for Batch Processing
When processing many documents in a pipeline, set limits once with configure() instead of passing them to every PDFium.init() call:
```ts
import { configure } from '@scaryterry/pdfium';

configure({
  limits: {
    maxTextCharCount: 100_000,
    maxDocumentSize: 50 * 1024 * 1024,
  },
});

// All subsequent PDFium.init() calls inherit these limits
using pdfium = await PDFium.init();
```
See the Security Guide for more details on global configuration.
Search Optimisation
Early Exit
```ts
// Find first occurrence only
function findFirst(document: PDFiumDocument, query: string) {
  for (const page of document.pages()) {
    using p = page;

    for (const result of p.findText(query)) {
      return { pageIndex: p.index, result };
    }
  }
  return null;
}
```

Limit Results
```ts
function findLimited(
  document: PDFiumDocument,
  query: string,
  maxResults = 100
) {
  const results: { pageIndex: number; charIndex: number }[] = [];

  for (const page of document.pages()) {
    using p = page;

    for (const result of p.findText(query)) {
      results.push({
        pageIndex: p.index,
        charIndex: result.charIndex,
      });

      if (results.length >= maxResults) {
        return results;
      }
    }
  }

  return results;
}
```

Browser Performance
Use Web Workers
Move PDF processing off the main thread:
```ts
await using pdfium = await PDFium.init({
  useWorker: true,
  workerUrl,
  wasmBinary,
});
await using document = await pdfium.openDocument(pdfData);
const result = await document.renderPage(0, { scale: 2 });
```

Progressive Rendering
Render visible pages first:
```ts
async function renderVisiblePages(
  document: PDFiumDocument,
  visibleIndices: number[],
  allIndices: number[]
) {
  // Render visible pages first
  for (const i of visibleIndices) {
    using page = document.getPage(i);
    const result = page.render({ scale: 2 });
    displayPage(i, result);
  }

  // Then render remaining
  for (const i of allIndices) {
    if (!visibleIndices.includes(i)) {
      using page = document.getPage(i);
      const result = page.render({ scale: 2 });
      cachePage(i, result);
    }
  }
}
```

Lazy Loading
```ts
class LazyPageRenderer {
  private rendered = new Map<number, RenderResult>();

  constructor(
    private document: PDFiumDocument,
    private scale: number
  ) {}

  async getPage(index: number): Promise<RenderResult> {
    if (this.rendered.has(index)) {
      return this.rendered.get(index)!;
    }

    using page = this.document.getPage(index);
    const result = page.render({ scale: this.scale });
    this.rendered.set(index, result);
    return result;
  }

  evict(index: number) {
    this.rendered.delete(index);
  }

  evictOldest(keepCount: number) {
    const indices = [...this.rendered.keys()];
    while (this.rendered.size > keepCount) {
      const oldest = indices.shift()!;
      this.rendered.delete(oldest);
    }
  }
}
```

Memory vs Speed Trade-offs
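A bounded cache is a common middle ground between discarding every result and keeping all of them. The sketch below shows a least-recently-used cache built on `Map`, which iterates keys in insertion order. `LruCache` is a hypothetical helper, not part of the library. Its values could be render results or extracted text keyed by page index.

```ts
// Hypothetical helper, not a library API: a small LRU cache.
class LruCache<K, V> {
  private entries = new Map<K, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key)!;
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    // Delete first so an existing key moves to the newest position
    this.entries.delete(key);
    this.entries.set(key, value);
    // The first key in iteration order is the least recently used
    while (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A larger capacity spends more memory to avoid repeat work, and a capacity of 1 or 2 approaches the low-memory behaviour.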
Low Memory Mode
```ts
// Process and discard immediately
async function lowMemoryProcess(document: PDFiumDocument) {
  const results: string[] = [];

  for (const page of document.pages()) {
    using p = page;
    results.push(p.getText());
    // Page memory freed after each iteration
  }

  return results;
}
```

High Speed Mode (More Memory)
```ts
// Pre-load for faster access
async function highSpeedProcess(document: PDFiumDocument) {
  const pages: PDFiumPage[] = [];

  // Load all pages first
  for (let i = 0; i < document.pageCount; i++) {
    pages.push(document.getPage(i));
  }

  try {
    // Fast random access
    const results = await Promise.all(
      pages.map(async (page, i) => ({
        index: i,
        text: page.getText(),
      }))
    );
    return results;
  } finally {
    // Cleanup all
    for (const page of pages) {
      page.dispose();
    }
  }
}
```

Profiling
Measure Operations
```ts
function measureOperation<T>(name: string, fn: () => T): T {
  const start = performance.now();
  const result = fn();
  const end = performance.now();
  console.log(`${name}: ${(end - start).toFixed(2)}ms`);
  return result;
}

const text = measureOperation('getText', () => page.getText());
const result = measureOperation('render', () => page.render({ scale: 2 }));
```

Memory Tracking
```ts
function logMemory(label: string) {
  if (typeof process !== 'undefined') {
    const usage = process.memoryUsage();
    console.log(`${label}: ${(usage.heapUsed / 1024 / 1024).toFixed(2)}MB`);
  }
}

logMemory('Before load');
using document = await pdfium.openDocument(data);
logMemory('After load');
```

Summary
| Scenario | Recommendation |
|---|---|
| Thumbnails | Scale 0.3-0.5 |
| Screen display | Scale 1-2 |
| Print quality | Scale 3-4 |
| Many documents | Reuse PDFium instance |
| Large documents | Process pages sequentially |
| Browser UI | Use Web Workers |
| Memory constrained | Lower limits, sequential processing |
See Also
- Native vs WASM Backends: Backend comparison
- Memory Management: Memory details
- Worker Mode: Browser workers
- Architecture: System overview