Suparse

The Ultimate Guide to Suparse's Document Extraction API for Developers

By Michael Swift
September 11, 2025 · 5 min read
Tags: api, developers, data extraction, python, nodejs

In this guide we'll show you how to integrate a world-class document extraction API and go from signup to your first structured JSON response in under 5 minutes. No sales calls, no complex setup—just clean, reliable data.

Understanding the Suparse REST API for Data Extraction

Our API is designed to be asynchronous to handle complex documents without tying up your application. The flow is simple:

  1. POST a document to an upload endpoint to begin processing.
  2. The API immediately responds with 202 Accepted and a document_id.
  3. Poll the result endpoint with GET, using the document_id, until it returns the structured JSON data.

Core Endpoints & Pre-Built Processors

Our REST API for data extraction uses pre-trained models for common financial documents. You don't need to define templates. Just specify the document type in the upload URL:

  • POST /api/v1/documents/invoice: Our powerful invoice OCR API.
  • POST /api/v1/documents/receipt: For point-of-sale receipts.
  • POST /api/v1/documents/bank_statement: A robust bank statement parser API.

You can find all endpoints and parameters in our full API documentation.
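
As a rough illustration of the URL pattern above, the sketch below builds upload and result URLs in Python. The host name and the result path are the ones used in the examples later in this guide. The helper names `build_upload_url` and `build_result_url`, and the hard-coded set of document types, are assumptions made for illustration and are not part of any Suparse SDK.

```python
# Base URL as used in the examples later in this guide.
BASE_URL = "https://api.suparse.com/api/v1"

# Only the three processors listed above; the full API documentation may list more.
KNOWN_DOC_TYPES = {"invoice", "receipt", "bank_statement"}


def build_upload_url(doc_type: str, base_url: str = BASE_URL) -> str:
    """Return the POST URL for a document type (hypothetical helper)."""
    if doc_type not in KNOWN_DOC_TYPES:
        raise ValueError(f"Unknown document type: {doc_type!r}")
    return f"{base_url}/documents/{doc_type}"


def build_result_url(document_id: str, base_url: str = BASE_URL) -> str:
    """Return the GET URL used to poll for a result (hypothetical helper)."""
    return f"{base_url}/documents/{document_id}/result"


print(build_upload_url("bank_statement"))
```

Rejecting unknown document types locally is a design choice of this sketch: it catches typos before any network call is made.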

A Practical Python Example

Here’s how to upload a local PDF file and retrieve its data using Python and the requests library. This script shows how to parse a PDF bank statement with Python, a common use case. (If you're looking for a quick, no-code way to test the output, you can also convert your bank statement PDF to Excel directly in our web app.)

import requests
import time
import os
import json
 
API_KEY = os.environ.get("SUPARSE_API_KEY") # Recommended: use environment variables
BASE_URL = "https://api.suparse.com/api/v1"
HEADERS = {"X-API-Key": API_KEY}
FILE_PATH = "path/to/your/bank-statement.pdf"
DOC_TYPE = "bank_statement"
 
def upload_document(file_path, doc_type):
    """Uploads a document and returns its ID."""
    print(f"Uploading {file_path}...")
    url = f"{BASE_URL}/documents/{doc_type}"
 
    with open(file_path, "rb") as f:
        files = {"file": (os.path.basename(file_path), f)}
        response = requests.post(url, headers=HEADERS, files=files)
 
    if response.status_code == 202:
        data = response.json()
        print(f"Upload successful. Document ID: {data['document_id']}")
        return data['document_id']
    else:
        print(f"Error during upload: {response.status_code} {response.text}")
        return None
 
def poll_for_result(document_id):
    """Polls for the processing result of a document based on HTTP status codes."""
    url = f"{BASE_URL}/documents/{document_id}/result"
 
    # Best practice: start polling after an initial delay
    print("Waiting 5 seconds before first poll...")
    time.sleep(5)
 
    max_attempts = 20
    delay = 3 # Subsequent delay
 
    for attempt in range(max_attempts):
        print(f"Polling for result (Attempt {attempt + 1}/{max_attempts})...")
        response = requests.get(url, headers=HEADERS)
 
        if response.status_code == 200:
            print("Processing complete!")
            return response.json()
        elif response.status_code == 202:
            print(f"Status is 'processing', waiting for {delay} seconds...")
            time.sleep(delay)
        else:
            print(f"Polling failed - status {response.status_code}: {response.text}")
            return None
 
    print("Polling timed out after maximum attempts.")
    return None
 
if __name__ == "__main__":
    if not API_KEY:
        print("Error: SUPARSE_API_KEY environment variable not set.")
    else:
        doc_id = upload_document(FILE_PATH, DOC_TYPE)
        if doc_id:
            result = poll_for_result(doc_id)
            if result:
                print("\n--- Extracted Data ---")
                print(json.dumps(result, indent=2))

A Practical Node.js Example

Here is the same workflow using Node.js with axios and form-data.

const axios = require("axios");
const fs = require("fs");
const FormData = require("form-data");
const path = require("path");
 
const API_KEY = process.env.SUPARSE_API_KEY; // Set your key as an env var
const BASE_URL = "https://api.suparse.com/api/v1";
const FILE_PATH = "path/to/your/invoice.pdf";
const DOC_TYPE = "invoice";
 
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
 
const pollForResult = async (documentId) => {
  const url = `${BASE_URL}/documents/${documentId}/result`;
  const headers = { "X-API-Key": API_KEY };
 
  const maxAttempts = 20;
 
  // Start with a 5-second delay, then switch to 3-second intervals
  await sleep(5000);
 
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    console.log(
      `Polling for result (Attempt ${attempt + 1}/${maxAttempts})...`
    );
    try {
      // Axios resolves for any 2xx status by default, so both 200 and 202 land here.
      const response = await axios.get(url, { headers });
      if (response.status === 200) {
        console.log("Processing complete!");
        console.log(JSON.stringify(response.data, null, 2));
        return; // Exit loop on success
      }
      if (response.status === 202) {
        // Still processing: wait, then try again
        console.log("Status is 'processing', waiting for 3 seconds...");
        await sleep(3000);
        continue;
      }
      console.error(`Unexpected status ${response.status}:`, response.data);
      return; // Exit loop on an unexpected 2xx status
    } catch (error) {
      // Axios rejects for non-2xx statuses (404, 500, etc.) and for network errors
      const status = error.response ? error.response.status : "Unknown";
      const errorData = error.response ? error.response.data : error.message;
      console.error(`Polling failed with status ${status}:`, errorData);
      return; // Exit loop on failure
    }
  }
  console.error("Polling timed out after maximum attempts.");
};
 
const processDocument = async () => {
  if (!API_KEY) {
    console.error("Error: SUPARSE_API_KEY environment variable not set.");
    return;
  }
 
  const form = new FormData();
  form.append("file", fs.createReadStream(FILE_PATH));
 
  try {
    const url = `${BASE_URL}/documents/${DOC_TYPE}`;
    const response = await axios.post(url, form, {
      headers: {
        ...form.getHeaders(),
        "X-API-Key": API_KEY,
      },
    });
 
    const { document_id } = response.data;
    console.log(`Document uploaded successfully. ID: ${document_id}`);
    await pollForResult(document_id);
  } catch (error) {
    const status = error.response ? error.response.status : "Unknown";
    const errorData = error.response ? error.response.data : error.message;
    console.error(`Error uploading file with status ${status}:`, errorData);
  }
};
 
processDocument();

Making Sense of the JSON Response

Clean, predictable JSON is our promise. Each processor returns a nested object containing the extracted fields, already normalized and validated.

Here's a condensed look at what you can expect from a successful 200 OK response for an invoice:

{
  "document_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "extraction_data": {
    "docType": "Invoice",
    "general": {
      "InvoiceNumber": "INV-2023-001",
      "TotalGrossPrice": 120.5,
      "Currency": "USD"
    },
    "lineItems": [
      {
        "ProductName": "Product A",
        "ProductQuantity": 1,
        "ProductUnitPrice": 100.0
      }
    ]
  },
  "validation": {
    "passed": true,
    "details": {
      "calculation_logic": "sum(lineItems.ProductPrice) vs general.TotalNetPrice"
    }
  },
  "credits_used": 1,
  "passed_validation": true
}

Notice how the extraction_data object contains the structured fields, while the validation object reports whether the document passed a consistency check (here, comparing the sum of line-item prices with the net total). This saves you hours of post-processing and data cleaning.
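
To make the structure concrete, here is a minimal sketch of reading those fields with Python's standard library. It assumes the response has exactly the shape of the condensed invoice sample above. `summarize_invoice` is a hypothetical helper, and treating quantity times unit price as the line total is an assumption of this sketch. Other document types, or the full response, may expose different keys.

```python
import json

# Trimmed copy of the sample response above (only the keys this sketch reads).
SAMPLE_RESPONSE = json.loads("""
{
  "document_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "extraction_data": {
    "docType": "Invoice",
    "general": {
      "InvoiceNumber": "INV-2023-001",
      "TotalGrossPrice": 120.5,
      "Currency": "USD"
    },
    "lineItems": [
      {"ProductName": "Product A", "ProductQuantity": 1, "ProductUnitPrice": 100.0}
    ]
  },
  "validation": {"passed": true}
}
""")


def summarize_invoice(result: dict) -> dict:
    """Pull a few headline fields out of an invoice result (hypothetical helper)."""
    data = result["extraction_data"]
    general = data["general"]
    # Assumption: a line's total is quantity multiplied by unit price.
    line_item_total = sum(
        item["ProductQuantity"] * item["ProductUnitPrice"]
        for item in data["lineItems"]
    )
    return {
        "invoice_number": general["InvoiceNumber"],
        "gross_total": general["TotalGrossPrice"],
        "currency": general["Currency"],
        "line_item_total": line_item_total,
        "validation_passed": result["validation"]["passed"],
    }


print(summarize_invoice(SAMPLE_RESPONSE))
```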

Performance, Security, and Transparent Pricing

Integrating a third-party API shouldn't be painful. Opaque pricing, broken code examples, and confusing authentication waste valuable development cycles and delay your launch.

Integrating an API is a long-term partnership. Here’s our commitment to you:

  • Performance: Our low-latency OCR API is built on enterprise-grade cloud infrastructure designed for high-volume, parallel processing. You can automate your entire invoice workflow without worrying about bottlenecks.
  • Security: We take data protection seriously. All data is encrypted in transit and at rest, and our platform is designed to be GDPR and CCPA compliant.
  • Pricing: No "contact us for a quote" games. Our usage-based pricing is simple and predictable. Start for free with 20 pages, and scale as you grow. See our full pricing tiers.

You're not just buying an API; you're investing in a reliable, scalable, and secure data extraction backbone for your application.

Ready to Build? Get Your Free API Key.

Go from signup to your first JSON response in under 5 minutes. Get 20 free pages, no credit card required.


Developer FAQ: Your Technical Questions Answered

What are the API rate limits?

Our standard API plan includes generous rate limits suitable for most production applications. For high-volume or enterprise needs, we offer custom plans. Please see our pricing page or contact us for details.

How do I handle authentication?

Authentication is simple. Include your unique API key in the `X-API-Key` header of every request. You can generate and manage your keys from your account dashboard.
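
A minimal sketch of building that header in Python, assuming the key is stored in the `SUPARSE_API_KEY` environment variable as in the examples earlier in this guide. The function name `auth_headers` is hypothetical.

```python
import os


def auth_headers(env=None) -> dict:
    """Build the request headers carrying the API key (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    api_key = env.get("SUPARSE_API_KEY")
    if not api_key:
        raise RuntimeError("SUPARSE_API_KEY environment variable not set.")
    return {"X-API-Key": api_key}
```

The returned dictionary can be passed as `headers=` to `requests.get` or `requests.post`, which keeps the key out of your source code.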

What file formats do you support via the API?

The API accepts PDF, JPEG, and PNG files. We recommend PDFs for the highest accuracy and reliability.
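
If you want to reject unsupported files before uploading, a simple extension check is one option. This is only a sketch: the mapping from formats to file extensions (including both `.jpg` and `.jpeg`) is an assumption, `is_supported_file` is a hypothetical helper, and the API may well validate the file contents instead of the name.

```python
from pathlib import Path

# Assumed extensions for the three formats named above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def is_supported_file(path: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["statement.PDF", "receipt.jpeg", "notes.docx"]:
    print(name, is_supported_file(name))
```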

How do I know when my document is done processing?

Poll the `GET /api/v1/documents/{document_id}/result` endpoint. An HTTP `202 Accepted` response means the document is still processing. An HTTP `200 OK` response means processing is complete, and the response body will contain your extracted JSON data.
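
The status-code logic described above can be sketched as a small, transport-agnostic loop. Here `fetch` is a hypothetical callable that performs the GET request and returns a `(status_code, body)` tuple. The defaults of 20 attempts and a 3-second delay mirror the Python example earlier in this guide and are not documented limits.

```python
import time
from typing import Any, Callable, Tuple


def poll_until_done(
    fetch: Callable[[], Tuple[int, Any]],
    max_attempts: int = 20,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it reports 200, then return the response body."""
    for _ in range(max_attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body  # Processing complete
        if status_code != 202:
            # Anything other than 200 or 202 is treated as a failure.
            raise RuntimeError(f"Polling failed with status {status_code}")
        sleep(delay)  # 202: still processing, wait before the next attempt
    raise TimeoutError("Polling timed out after maximum attempts.")
```

In real use, `fetch` would wrap a `requests.get` call to the result endpoint. Passing `sleep` as a parameter makes the loop easy to test without actually waiting.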

How is the API versioned?

Our API uses URL versioning (e.g., `/api/v1/`). We guarantee backward compatibility for all minor updates. Any breaking changes will be released as a new version (`/v2/`), with a long deprecation window for the older version.

Can I process multiple documents in a single API call (batch processing)?

Not in a single call: each document is uploaded in its own request. The system is built for high throughput, though, so you can send hundreds of asynchronous requests in parallel to process large volumes of documents quickly.
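
One way to send uploads in parallel from Python is a thread pool, sketched below. `upload_many` is a hypothetical helper, and `upload_one` stands for any function that uploads a single file and returns its document ID, such as a wrapper around `upload_document` from the Python example earlier in this guide. The default of 4 workers is an arbitrary assumption, since this guide does not state the rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def upload_many(
    paths: Iterable[str],
    upload_one: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    """Upload several files concurrently and map each path to its document ID."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input paths.
        document_ids = list(pool.map(upload_one, paths))
    return dict(zip(paths, document_ids))
```

Each returned document ID can then be polled independently. If `upload_one` raises, the exception surfaces when the results are collected, so wrap the call in your own error handling if partial success matters.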

How do you handle different languages and currencies?

Our AI models automatically detect and process over 50 languages and all global currencies. The extracted data is normalized for consistency, such as standardizing date formats to YYYY-MM-DD.


Michael Swift

Michael has over 15 years of experience in AI, document processing, and data analytics for top financial institutions. He is on a mission to eliminate manual data entry. His work focuses on building intelligent, template-free solutions for invoice and bank statement data extraction that boost efficiency and accuracy. He has solved hard document processing and conversion problems for both SMEs and large corporations, and is now bringing these solutions to everyone, with the help of AI, as an affordable product: Suparse.