The Ultimate Guide to Suparse's Document Extraction API

Michal Raczy
September 11, 2025 · 4 min read
Tags: api, developers, data extraction, automation

In this guide, we'll show you what the Suparse document extraction API gives your product team: a reliable way to turn PDFs and document images into structured data without building your own OCR, parsing, validation, and export pipeline. For implementation details, SDK usage, and endpoint reference, go to Suparse Docs.

Understanding the Suparse REST API for Data Extraction

Suparse processing is asynchronous because financial documents can take time to classify, split, extract, validate, and export. The API is designed around a secure direct-to-storage upload flow, so large files do not need to pass through your application servers.

At a high level, your application sends a document to Suparse, Suparse classifies and extracts the content, and your workflow receives clean structured data for downstream automation. The Python SDK, JavaScript SDK, CLI, and REST API all follow the same lifecycle. The full upload, polling, and export details are documented in Suparse Docs.
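If you call the REST API directly, the asynchronous lifecycle usually comes down to "submit, then poll until the job reaches a terminal state." The sketch below shows only that polling loop. It is not Suparse's documented API: `wait_for_result`, the `status` key, and the `"completed"` and `"failed"` values are assumptions made for illustration, and `fetch_status` stands in for whatever status call Suparse Docs describes.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical status values; the real API's names may differ.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status() until it reports a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("document was still processing when the timeout elapsed")
        # Wait before asking again so the status endpoint is not hammered.
        sleep(interval_s)
```

Passing the status call in as a function keeps the loop independent of any particular HTTP client, and makes it easy to test with canned responses.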

Built for Financial Document Automation

Our financial document API is designed for invoices, receipts, bank statements, purchase orders, and custom document types. It is especially useful when teams need more than basic OCR text: normalized fields, line items, transaction rows, page ranges, validation rules, and exports that are ready for accounting or operational workflows.

Common use cases include:

  • Accounts payable invoice capture
  • Receipt and expense automation
  • Bank statement transaction extraction
  • Purchase order and quote processing
  • Multi-document PDF splitting
  • Custom document parsing for internal workflows

For teams that want to test the output before integrating, Suparse also offers web workflows such as the invoice OCR converter and bank statement PDF to Excel converter.

Templates Without Template Lock-In

Templates define what data should be extracted. You can let Suparse auto-detect the document type, use a specific template for repeatable workflows, or enable splitting for multi-page PDFs that contain several documents.

That means you do not need to maintain a separate endpoint or integration path for every document type. The same platform can handle an invoice today, a bank statement tomorrow, and a custom vendor form when your operations team needs it.

For unique layouts, create a custom template in the Suparse UI or learn more about custom document parsing.

Supported upload MIME types are application/pdf, image/jpeg, image/png, image/heic, and image/heif. The current maximum file size is 20 MB.
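Checking these limits on your side before uploading can save a round trip. The following is a minimal sketch, assuming you pick the MIME type from the file extension. The helper name `check_upload` and the extension map are our own illustration, and the byte limit uses a conservative reading of the 20 MB figure (20,000,000 bytes), which may not match how the API counts it.

```python
from pathlib import Path

# Conservative reading of the 20 MB limit; confirm the exact value in Suparse Docs.
MAX_BYTES = 20_000_000

# MIME types listed in this guide, keyed by common file extensions (our own mapping).
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".heic": "image/heic",
    ".heif": "image/heif",
}


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the MIME type to send, or raise ValueError if the file looks unsupported."""
    mime = EXTENSION_TO_MIME.get(Path(filename).suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported file type: {filename}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"{filename} is {size_bytes} bytes; the limit is {MAX_BYTES}")
    return mime
```

For a file on disk, you could call it as `check_upload(path.name, path.stat().st_size)`.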

Integration Options

Suparse supports four practical integration paths:

  • Python SDK: Best for backend services, data pipelines, local batch processing, notebooks, and automation scripts.
  • JavaScript and TypeScript SDK: Best for Next.js, React, Node.js services, edge runtimes, and browser-based upload workflows.
  • CLI: Best for use with Claude Code, Claude Cowork, Codex, and other LLM tools when you want repeatable document extraction that follows your desired schema.
  • REST API: Best when you want low-level control or are integrating from a language without an official SDK.

The SDKs handle upload, polling, result retrieval, retries, rate limits, and batch processing. The CLI is useful for quick tests, local folders, and operational scripts. REST remains available when your team wants direct protocol-level control.
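If you go the REST route, retries and rate limits become your responsibility. One common approach is exponential backoff with jitter, sketched below. This is a generic pattern, not a description of how the Suparse SDKs behave internally: `RateLimited` is a hypothetical exception that your own HTTP layer would raise (for example, on an HTTP 429 response), and `call_with_backoff` is an illustrative name.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker raised by your own HTTP layer when the API asks you to slow down."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimited with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the ceiling on each attempt, cap it, then pick a random delay below it.
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

The random jitter spreads out retries from parallel workers so they do not all hit the API again at the same moment.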

For code-level instructions, install commands, and API reference, use Suparse Docs.

What You Get Back

The output is structured JSON designed for downstream automation. Depending on the template and document type, that can include document-level fields, tables, line items, transactions, page ranges, template IDs, and credits used.
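To give a feel for working with that kind of output, here is a small sketch of a downstream consistency check: summing line items and comparing the result with the document total. The JSON shape and every field name in `sample_result` are invented for illustration and are not Suparse's schema; the real structure depends on your template and is documented in Suparse Docs.

```python
from decimal import Decimal
from typing import Any, Dict

# Hypothetical result; field names are illustrative, not Suparse's schema.
sample_result: Dict[str, Any] = {
    "fields": {"invoice_number": "INV-001", "total": "150.00"},
    "line_items": [
        {"description": "Widget", "quantity": "2", "unit_price": "50.00"},
        {"description": "Shipping", "quantity": "1", "unit_price": "50.00"},
    ],
}


def line_item_total(result: Dict[str, Any]) -> Decimal:
    """Sum quantity * unit_price over all line items, using Decimal to avoid float drift."""
    return sum(
        (Decimal(item["quantity"]) * Decimal(item["unit_price"]) for item in result["line_items"]),
        Decimal("0"),
    )


def totals_match(result: Dict[str, Any]) -> bool:
    """Return True when the line items add up to the document-level total."""
    return line_item_total(result) == Decimal(result["fields"]["total"])
```

A check like this is a cheap way to flag documents for human review before the data reaches your accounting system.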

When you need files rather than an API response, Suparse can export processed data as JSON, CSV, Excel, QuickBooks CSV, or Google Sheets. That makes the same extraction workflow useful for both product integrations and business teams that still need spreadsheet-based review.

Why Teams Use Suparse Instead of Building In-House

Document extraction looks simple until you have to support inconsistent PDFs, scans, rotated images, line items, multi-page documents, validation rules, retries, exports, and privacy requirements. Suparse packages those pieces into one platform so your team can focus on the workflow around the data.

That gives you:

  • A secure upload and processing lifecycle
  • Auto-detection for common document types
  • Custom templates for specialized layouts
  • Multi-document splitting for bundled PDFs
  • Validation-oriented extraction schemas
  • SDKs, CLI, and REST access
  • Cleanup options for privacy-sensitive workflows
  • Spreadsheet and accounting-friendly exports

Performance, Security, and Transparent Pricing

Integrating a third-party API should not mean opaque pricing, fragile code samples, confusing authentication, or weeks of implementation work. Suparse is built for teams that need reliable document automation without taking on the operational burden of maintaining an extraction platform.

You get a secure upload model, async processing for larger workloads, page-based usage, and SDKs for common integration environments. Start with 50 free pages, review the documentation, and use the pricing tiers to estimate costs before scaling high-volume workflows.

Ready to Build? Get Your Free API Key.

Go from signup to your first JSON response in under 5 minutes. Get 50 free pages, no credit card required.

Developer FAQ: Your Technical Questions Answered

What are the API rate limits?

How do I handle authentication?

What file formats do you support via the API?

How do I know when my document is done processing?

How is the API versioned?

Can I process multiple documents in a single API call (batch processing)?

How do you handle different languages and currencies?

Michal Raczy

Michal is the founder of Suparse.com. He has over 15 years of experience in delivering projects in data analysis, automation, and document processing. Michal solves complex automation and AI implementation challenges for both SMEs and large corporations, with a particular focus on document processing. Contact at michal@suparse.com.