Suparse Python SDK

Official Python SDK and CLI for the Suparse Document Processing API.

Suparse is an AI-powered document processing API for extracting structured data from any document type, including invoices, receipts, bank statements, purchase orders and many more. This SDK wraps the REST API with a Python client and CLI.

Requirements

Python 3.10+
Dependencies: httpx, pydantic, pydantic-settings, tenacity

Installation

pip install suparse

Authentication

You'll need an API key to use the SDK or CLI. To obtain one:

1. Sign in at suparse.com
2. Go to the API Keys tab
3. Enter a key name and click Generate New Key
4. Copy the key value; it is shown only once

Set it as an environment variable:

export SUPARSE_API_KEY="your_api_key_here"

For the CLI, you can also store it in ~/.config/suparse/config.json:

{
  "apiKey": "your_api_key_here"
}

Or pass it directly to the SDK constructor (see below).

Quick Start

# CLI
suparse process invoice.pdf -o results.json
 
# Python — synchronous (with SUPARSE_API_KEY env var set)
python3 -c "
from suparse import SuparseClient
 
with SuparseClient() as client:
    result = client.extract('invoice.pdf')
    for r in result.succeeded:
        print(r.original_file, r.data)
"
 
# Python — asynchronous
python3 -c "
import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract('invoice.pdf')
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())
"

CLI Usage

Run suparse --help or suparse process --help for full usage information.

Set your API key and process a file directly from the terminal. The CLI uploads the file, polls until processing completes, and downloads the result in the selected output format.

Process a Document

export SUPARSE_API_KEY="your_api_key_here"
 
# Auto-detect template
suparse process path/to/invoice.pdf -o results.json
 
# Use specific template without auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
 
# Auto-split a multi-page PDF containing mixed document types (e.g. receipts + bank statements)
suparse process path/to/merged.pdf --with-split
 
# Use specific template with auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c --with-split
 
# Process and auto-delete documents from server after download
suparse process path/to/invoice.pdf --cleanup
 
# Export directly to XLSX
suparse process path/to/invoice.pdf --format xlsx
 
# Export CSV using original template columns
suparse process path/to/invoice.pdf --format csv --export-type original

Process a Folder

Process all supported files (.pdf, .jpg, .jpeg, .png, .heic, .heif) in a folder. Files are uploaded and polled individually; the results are then exported to JSON by default, or to the format selected with --format.

# Process all supported files in a folder
suparse process --folder path/to/receipts/
 
# Output to a specific file (default: {folder_name}_results.json)
suparse process --folder path/to/receipts/ -o all_results.json
 
# Process a folder with a specific template
suparse process --folder path/to/receipts/ --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
 
# Process a folder with auto-splitting enabled
suparse process --folder path/to/receipts/ --with-split
 
# Process a folder and auto-delete documents from server after download
suparse process --folder path/to/receipts/ --cleanup
 
# Export a folder to CSV, XLSX, or Google Sheets
suparse process --folder path/to/receipts/ --format csv
suparse process --folder path/to/receipts/ --format xlsx -o ./exports
suparse process --folder path/to/receipts/ --format google_sheets

Delete Documents

# Delete one or more documents by ID (prompts for confirmation)
suparse delete <document_id>
suparse delete <id1> <id2> <id3>
 
# Skip confirmation prompt
suparse delete <id1> <id2> -y

Deleting a parent document automatically deletes all its child documents (server-side cascade).

List Available Templates

Templates define how a document type (invoice, receipt, bank statement, etc.) is parsed. Before processing a document, check which templates are already assigned to your account:

# List templates assigned to your account (table format)
suparse templates
 
# List templates in JSON format
suparse templates --format json
 
# Include all system templates (not yet assigned to your account)
suparse templates --include-system

The recommended way to process documents is with auto-split enabled (--with-split), which handles both single-type and mixed document types automatically:

suparse process path/to/documents.pdf --with-split -o results.json

If you consistently process one document type, look up the template ID and pass it directly:

1. Run suparse templates to see templates assigned to your account.

2. Use the matching template ID:

suparse process invoice.pdf --template-id <id> -o results.json

3. If no template matches, run suparse templates --include-system to browse all system templates. Assign one to your account via the Suparse UI.

4. If no system template fits, create a custom template using the template creator in the Suparse UI.

Configuration

The CLI reads settings from environment variables or a .env file in the working directory:

Variable	Default	Description
`SUPARSE_API_URL`	`https://api.suparse.com/api/v1/`	API base URL
`SUPARSE_API_KEY`	—	Your API key (required)
`POLL_INTERVAL`	`5`	Seconds between polling attempts
`MAX_POLL_ATTEMPTS`	`300`	Max polling attempts before timeout
`LOG_LEVEL`	`INFO`	Logging level

Priority: CLI flags > environment variables > .env file > defaults.

For API keys, the CLI also checks ~/.config/suparse/config.json after SUPARSE_API_KEY and .env. The config file should contain a JSON object with an apiKey string.
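
The API key lookup order described above can be sketched in a few lines. This is only an illustration of the documented order, not the CLI's actual code: resolve_api_key is a hypothetical helper, and the .env step is left out for brevity.

```python
import json
import os
from pathlib import Path

DEFAULT_CONFIG_PATH = Path.home() / ".config" / "suparse" / "config.json"


def resolve_api_key(cli_value=None, env=None, config_path=DEFAULT_CONFIG_PATH):
    """Return the first API key found, or None (hypothetical helper)."""
    env = os.environ if env is None else env

    # 1. An explicit --api-key style value wins.
    if cli_value:
        return cli_value

    # 2. Then the SUPARSE_API_KEY environment variable.
    if env.get("SUPARSE_API_KEY"):
        return env["SUPARSE_API_KEY"]

    # 3. Finally the JSON config file holding an "apiKey" string.
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    key = config.get("apiKey") if isinstance(config, dict) else None
    return key if isinstance(key, str) and key else None


print(resolve_api_key(env={"SUPARSE_API_KEY": "from_env"}))
```

Passing env and config_path explicitly keeps the function easy to test without touching the real environment or home directory.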

Global Options

These options apply to all subcommands.

Option	Description
`--api-url`	API URL (default: from `SUPARSE_API_URL` env var)
`--api-key`	API Key (default: from `SUPARSE_API_KEY` env var)
`-v`, `--verbose`	Enable verbose (DEBUG) output

CLI Options

Command	Option	Description
`process`	`--folder`	Process all supported files in a folder (mutually exclusive with file path)
`process`	`-o`, `--output`	Output file path, or output directory for file exports when omitted
`process`	`--template-id`	Template ID to use (default: auto-detect)
`process`	`--with-split`	Auto-split multi-page PDFs containing mixed document types (e.g. receipts + bank statements) so each is processed with the correct template (default: off)
`process`	`--cleanup`	Delete documents from server after download so files are stored only during processing and cannot be accessed later (default: off)
`process`	`--format`	Export format: `json`, `csv`, `xlsx`, or `google_sheets` (default: `json`)
`process`	`--export-type`	Export mode for CSV, XLSX, and Google Sheets: `unified` or `original` (default: `unified`)
`templates`	`--format`	Output format: `table` or `json` (default: `table`)
`templates`	`--include-system`	Include system templates not yet assigned to your account (default: off)
`delete`	`-y`, `--yes`	Skip confirmation prompt
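
When the CLI is driven from a script, it can help to build the argument list in one place. The sketch below uses only the process flags from the table above; build_process_command is a hypothetical helper, not part of the SDK.

```python
import shlex


def build_process_command(path, *, folder=False, output=None, template_id=None,
                          with_split=False, cleanup=False, fmt=None,
                          export_type=None):
    """Build an argument list for `suparse process` (hypothetical helper)."""
    cmd = ["suparse", "process"]
    # --folder and a positional file path are mutually exclusive.
    cmd += ["--folder", str(path)] if folder else [str(path)]
    if output:
        cmd += ["-o", str(output)]
    if template_id:
        cmd += ["--template-id", template_id]
    if with_split:
        cmd.append("--with-split")
    if cleanup:
        cmd.append("--cleanup")
    if fmt:
        cmd += ["--format", fmt]
    if export_type:
        cmd += ["--export-type", export_type]
    return cmd


cmd = build_process_command("receipts/", folder=True, fmt="csv", cleanup=True)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with paths that contain spaces.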

Python SDK Usage

The SDK provides both synchronous and asynchronous clients. Both handle API rate limits, parallel processing, and connection pooling automatically.

Synchronous (Recommended for Scripts, Pandas, Jupyter Notebooks)

from suparse import SuparseClient
 
with SuparseClient() as client:
    result = client.extract(["invoice1.pdf", "invoice2.pdf"])
 
    for r in result.succeeded:
        print(r.original_file, r.data)

Asynchronous (Recommended for FastAPI, Aiohttp, Async Pipelines)

import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())

The client reads SUPARSE_API_KEY and SUPARSE_API_URL from environment variables by default, so you can simply use SuparseClient() or AsyncSuparseClient() if those are set.

Extract One or More Documents

extract() is the primary API. It accepts a single file, a list of files, or any iterable (like Path.glob). Each input can be a string path, a Path object, or an open file handle.

Synchronous

from pathlib import Path
# FailedResult is assumed to be exported from the top-level package.
from suparse import FailedResult, SuparseClient
 
with SuparseClient() as client:
    # Single file (string or Path)
    result = client.extract("invoice.pdf")
    for r in result.succeeded:
        print(r.original_file, r.data)
 
    # List of files from different locations
    result = client.extract([
        "./receipts/jan.pdf",
        Path("./receipts/feb.pdf"),
    ])
 
    # Glob generator
    result = client.extract(
        Path("./receipts/").glob("*.pdf")
    )
 
    # With progress callback
    def print_progress(r):
        if isinstance(r, FailedResult):
            print(f"  Failed: {r.file} - {r.error}")
        else:
            print(f"  Done: {r.original_file}")
 
    result = client.extract(
        files=Path("./receipts/").glob("*.pdf"),
        on_progress=print_progress,
    )
 
    # Access results
    for r in result.succeeded:
        print(r.original_file, r.data, r.document_ids)
 
    for f in result.failed:
        print(f.file, f.error)
 
    print(f"Total: {result.total}")

Asynchronous

import asyncio
from pathlib import Path
# FailedResult is assumed to be exported from the top-level package.
from suparse import AsyncSuparseClient, FailedResult
 
async def main():
    async with AsyncSuparseClient() as client:
        # Single file (string or Path)
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
 
        # List of files from different locations
        result = await client.extract([
            "./receipts/jan.pdf",
            Path("./receipts/feb.pdf"),
        ])
 
        # Glob generator
        result = await client.extract(
            Path("./receipts/").glob("*.pdf")
        )
 
        # With progress callback
        def print_progress(r):
            if isinstance(r, FailedResult):
                print(f"  Failed: {r.file} - {r.error}")
            else:
                print(f"  Done: {r.original_file}")
 
        result = await client.extract(
            files=Path("./receipts/").glob("*.pdf"),
            on_progress=print_progress,
        )
 
        # Access results
        for r in result.succeeded:
            print(r.original_file, r.data, r.document_ids)
 
        for f in result.failed:
            print(f.file, f.error)
 
        print(f"Total: {result.total}")
 
asyncio.run(main())

Extract a Folder

extract_folder() is a convenience wrapper that discovers supported files and delegates to extract().

Synchronous

from suparse import SuparseClient
 
with SuparseClient() as client:
    result = client.extract_folder(
        "./receipts/",
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=True,
        cleanup=True,
    )
    for r in result.succeeded:
        print(r.original_file, r.data)

Asynchronous

import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract_folder(
            "./receipts/",
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=True,
            cleanup=True,
        )
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())

Note: r.data contains only the extracted document fields. To get all details — including credits_used, template_id, document_id, file_name, page_start, and page_end — use r.documents (a list of DocumentExport objects). To serialize everything, use r.model_dump(mode="json").

Error Handling

All exceptions inherit from SuparseError. Handle them at whatever granularity makes sense:

Synchronous

from suparse import SuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
 
with SuparseClient(api_key="your_api_key_here") as client:
    try:
        result = client.extract("invoice.pdf")
    except SuparseAuthError:
        print("Invalid API key")
    except SuparseNetworkError:
        print("Connection failed")
    except SuparsePollingTimeoutError:
        print("Processing timed out")
    except SuparseError as e:
        print(f"Unexpected error: {e}")

Asynchronous

import asyncio
from suparse import AsyncSuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
 
async def main():
    async with AsyncSuparseClient(api_key="your_api_key_here") as client:
        try:
            result = await client.extract("invoice.pdf")
        except SuparseAuthError:
            print("Invalid API key")
        except SuparseNetworkError:
            print("Connection failed")
        except SuparsePollingTimeoutError:
            print("Processing timed out")
        except SuparseError as e:
            print(f"Unexpected error: {e}")
 
asyncio.run(main())

Pass file objects or in-memory streams directly (no Path needed):

from suparse import SuparseClient

with SuparseClient() as client, open("invoice.pdf", "rb") as f:
    result = client.extract(f)

For batch operations, check result.failed to handle per-file errors without catching exceptions:

result = client.extract(["a.pdf", "b.pdf", "c.pdf"])
 
for r in result.succeeded:
    print(f"OK: {r.original_file} -> {r.document_ids}")
 
for f in result.failed:
    print(f"FAIL: {f.file} -> {f.error}")
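
One way to act on result.failed is to resubmit only the files that failed. The sketch below is a hedged illustration: extract_with_retries and make_flaky_extract are hypothetical names, the stand-in objects only mimic the succeeded/failed shape shown above, and it assumes that f.file can be passed back to extract() (true for paths, not necessarily for already-consumed file handles).

```python
from types import SimpleNamespace


def extract_with_retries(extract, files, max_rounds=3):
    """Call `extract` repeatedly, resubmitting only the files that failed."""
    succeeded, pending, failed = [], list(files), []
    for _ in range(max_rounds):
        if not pending:
            break
        result = extract(pending)
        succeeded.extend(result.succeeded)
        failed = list(result.failed)
        pending = [f.file for f in failed]
    return succeeded, failed


def make_flaky_extract():
    """Stand-in for client.extract: 'b.pdf' fails on its first attempt only."""
    attempts = {}

    def extract(files):
        ok, bad = [], []
        for f in files:
            attempts[f] = attempts.get(f, 0) + 1
            if f == "b.pdf" and attempts[f] == 1:
                bad.append(SimpleNamespace(file=f, error="temporary failure"))
            else:
                ok.append(SimpleNamespace(original_file=f, document_ids=[]))
        return SimpleNamespace(succeeded=ok, failed=bad)

    return extract


succeeded, failed = extract_with_retries(make_flaky_extract(), ["a.pdf", "b.pdf"])
print([r.original_file for r in succeeded], failed)
```

With a real client the call would look like extract_with_retries(client.extract, paths). Retrying only makes sense for transient failures; a file that fails for a permanent reason will simply fail again.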

Low-Level API

Use the low-level API when you need file-based output or direct control over the upload/poll/download cycle.

Synchronous

from pathlib import Path
from suparse import ExportFormat, SuparseClient
 
with SuparseClient() as client:
    # Upload a file and get back a task ID
    task_id = client.upload_file(
        Path("invoice.pdf"),
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=False,
        auto_approve=True,  # Set to False to require human review in the Suparse UI
    )
 
    # Poll until processing completes (returns status + document IDs)
    status, doc_ids = client.poll_task_status(task_id)
 
    # Export results by document IDs to a JSON file. Omitting the output path
    # uses the API filename or a timestamped fallback in the current directory.
    saved_path = client.download_results(
        ["doc-id-1", "doc-id-2"],
        Path("output.json"),
    )
 
    # Fetch non-JSON exports in memory
    csv_export = client.fetch_results(
        ["doc-id-1", "doc-id-2"],
        format=ExportFormat.CSV,
    )
    print(csv_export.filename, csv_export.content_type, csv_export.is_zip)
 
    # Delete documents by ID
    client.delete_documents([
        "550e8400-e29b-41d4-a716-446655440000",
    ])
 
    # List available templates
    templates = client.list_templates()
    for t in templates:
        print(f"{t.name} ({t.template_language})")

Asynchronous

import asyncio
from pathlib import Path
from suparse import AsyncSuparseClient, ExportFormat, ExportType
 
async def main():
    async with AsyncSuparseClient() as client:
        # Process a single document to a JSON file
        success = await client.process_document(
            file_path=Path("invoice.pdf"),
            output_path=Path("results.json"),
            cleanup=True,
        )
 
        # Process multiple files in parallel (returns raw tuples)
        succeeded, failed = await client.process_batch(
            [Path("a.pdf"), Path("b.pdf")],
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        )
        # succeeded: list of (Path, task_id, [document_ids])
        # failed: list of (Path, task_id or None, exception)
 
        # Upload a file and get back a task ID
        task_id = await client.upload_file(
            Path("invoice.pdf"),
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=False,
            auto_approve=True,  # Set to False to require human review in the Suparse UI
        )
 
        # Poll until processing completes (returns status + document IDs)
        status, doc_ids = await client.poll_task_status(task_id)
 
        # Export results by document IDs to a JSON file
        saved_path = await client.download_results(
            ["doc-id-1", "doc-id-2"],
            Path("output.json"),
        )
 
        # Save an XLSX export to a directory using the API filename if present
        saved_path = await client.download_results(
            ["doc-id-1", "doc-id-2"],
            Path("./exports"),
            format=ExportFormat.XLSX,
            export_type=ExportType.UNIFIED,
        )
 
        # Delete documents by ID
        await client.delete_documents([
            "550e8400-e29b-41d4-a716-446655440000",
        ])
 
        # List available templates
        templates = await client.list_templates()
        for t in templates:
            print(f"{t.name} ({t.template_language})")
 
asyncio.run(main())

Constructor Parameters

Both SuparseClient and AsyncSuparseClient accept the same parameters:

Parameter	Type	Default	Description
`api_url`	`str`	`SUPARSE_API_URL` env var or `https://api.suparse.com/api/v1/`	API base URL
`api_key`	`str`	`SUPARSE_API_KEY` env var	Your API key (required)
`poll_interval`	`int`	`5`	Seconds between polling attempts
`max_poll_attempts`	`int`	`300`	Max polling attempts before timeout
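
The two polling parameters together bound how long a single task is waited on. The arithmetic below is a rough sketch that assumes each attempt waits one full interval and ignores request latency; both function names are hypothetical.

```python
import math


def max_wait_seconds(poll_interval=5, max_poll_attempts=300):
    """Approximate upper bound on time spent polling one task, in seconds."""
    return poll_interval * max_poll_attempts


def attempts_for_timeout(timeout_seconds, poll_interval=5):
    """Smallest max_poll_attempts that covers the desired timeout."""
    return math.ceil(timeout_seconds / poll_interval)


print(max_wait_seconds())         # 1500 seconds, i.e. 25 minutes with the defaults
print(attempts_for_timeout(600))  # 120 attempts for a 10-minute budget
```

For example, SuparseClient(poll_interval=5, max_poll_attempts=120) would give up after roughly ten minutes under these assumptions.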

Result Objects

Object	Properties	Description
`TaskExport`	`task_id`, `original_file`, `total_documents_extracted`, `documents`, `data`, `document_ids`	Successfully extracted file; `documents` contains `DocumentExport` objects with `credits_used`, `template_id`, etc.
`FailedResult`	`file`, `error`	File that failed during extraction
`BatchResult`	`succeeded`, `failed`, `total`	Container for batch results; access `.succeeded` and `.failed` explicitly
`ExportResult`	`format`, `content_type`, `filename`, `is_zip`, `data`, `task_exports`, `google_sheets_export`	Low-level export result for JSON, CSV, XLSX, and Google Sheets
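
As an illustration of how these objects nest, the sketch below totals credits_used across a batch. It runs against stand-in objects built with SimpleNamespace rather than real SDK results, summarize is a hypothetical helper, and it assumes credits_used is numeric.

```python
from types import SimpleNamespace


def summarize(result):
    """Summarize a batch result: counts plus total credits used."""
    credits = sum(
        doc.credits_used
        for task in result.succeeded
        for doc in task.documents
    )
    return {
        "succeeded": len(result.succeeded),
        "failed": len(result.failed),
        "credits_used": credits,
    }


# Stand-ins mimicking BatchResult -> TaskExport -> DocumentExport.
batch = SimpleNamespace(
    succeeded=[
        SimpleNamespace(documents=[SimpleNamespace(credits_used=2),
                                   SimpleNamespace(credits_used=1)]),
        SimpleNamespace(documents=[SimpleNamespace(credits_used=3)]),
    ],
    failed=[SimpleNamespace(file="c.pdf", error="unsupported file")],
)
print(summarize(batch))
```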

Export Formats

Format	Result	Notes
`json`	`TaskExport` list	Default structured extraction result
`csv`	Binary export bytes	Uses API filename when provided; may be a ZIP
`xlsx`	Binary export bytes	Uses API filename when provided; may be a ZIP
`google_sheets`	`GoogleSheetsExport`	Requires Google Sheets integration

export_type accepts unified or original and defaults to unified. It affects CSV, XLSX, and Google Sheets exports; JSON remains the task-oriented extraction result.
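
Because CSV and XLSX exports may arrive as a ZIP, code that saves them needs to branch on that. The sketch below takes the raw bytes, the filename, and the is_zip flag as plain arguments; save_export is a hypothetical helper, and which ExportResult attribute carries the bytes is not assumed here. It relies on the flag rather than sniffing the content, since an XLSX file is itself a ZIP container and would be misdetected.

```python
import io
import zipfile
from pathlib import Path


def save_export(content: bytes, filename: str, is_zip: bool, out_dir) -> list[Path]:
    """Write export bytes to out_dir, unpacking them when flagged as a ZIP."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    if is_zip:
        # Unpack every member of the archive into the output directory.
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            archive.extractall(out_dir)
            return [out_dir / name for name in archive.namelist()]

    # Single-file export: write the bytes under the provided filename.
    target = out_dir / filename
    target.write_bytes(content)
    return [target]
```

The function returns the list of written paths so callers can log or post-process them uniformly in both cases.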

Exceptions

All exceptions inherit from SuparseError.

Exception	Raised When
`SuparseError`	Base exception for all SDK errors
`SuparseNetworkError`	Network connection fails or times out
`SuparsePollingTimeoutError`	Polling exceeds `max_poll_attempts`
`SuparseProcessingError`	Document fails to process on the server
`SuparseAPIError`	Base for HTTP error responses (has `status_code`, `response_body`)
`SuparseAuthError`	401/403 authentication or authorization error
`SuparseNotFoundError`	404 resource not found
`SuparseRateLimitError`	429 too many requests
`SuparseServerError`	5xx server error
`SuparseSDKError`	SDK fails to parse the API response into the expected data model
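
The HTTP rows of this table amount to a status-code mapping. The sketch below mirrors it with local stand-in classes (deliberately not imported from suparse) and a hypothetical error_class_for_status helper. That the four HTTP-specific errors subclass SuparseAPIError is inferred from its description as the base for HTTP error responses.

```python
class SuparseError(Exception):
    """Stand-in for the SDK's base exception."""


class SuparseAPIError(SuparseError):
    """Stand-in for the HTTP error base; carries the status code and body."""

    def __init__(self, message, status_code, response_body=None):
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body


class SuparseAuthError(SuparseAPIError): ...
class SuparseNotFoundError(SuparseAPIError): ...
class SuparseRateLimitError(SuparseAPIError): ...
class SuparseServerError(SuparseAPIError): ...


def error_class_for_status(status_code):
    """Pick the most specific stand-in class for an HTTP status code."""
    if status_code in (401, 403):
        return SuparseAuthError
    if status_code == 404:
        return SuparseNotFoundError
    if status_code == 429:
        return SuparseRateLimitError
    if 500 <= status_code <= 599:
        return SuparseServerError
    return SuparseAPIError


try:
    raise error_class_for_status(429)("too many requests", 429)
except SuparseAPIError as exc:
    # Catching the base class still exposes the specific subclass and code.
    print(type(exc).__name__, exc.status_code)
```

This is also why except clauses should list specific exceptions before SuparseError: the first matching clause wins, so a base-class handler placed first would swallow everything.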

`extract()` Parameters

Parameter	Type	Default	Description
`files`	`str`, `Path`, `IO[bytes]`, or iterable of these	required	One or more files to process
`template_id`	`str`	`None`	Template ID (auto-detect if omitted)
`split`	`bool`	`False`	Auto-split multi-page documents
`auto_approve`	`bool`	`True`	Set to `False` to require human review in the Suparse UI
`cleanup`	`bool`	`False`	Delete documents from server after extraction
`on_progress`	`callable`	`None`	Called with each `TaskExport` or `FailedResult` as it completes
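
An on_progress callback can be any callable, so a small stateful object works as a progress counter. The sketch below is illustrative: ProgressTracker is a hypothetical class, and it tells failures apart by the presence of an error attribute (per the Result Objects table) instead of an isinstance check, so that it runs here against stand-in objects.

```python
from types import SimpleNamespace


class ProgressTracker:
    """Callable that counts finished files as results arrive."""

    def __init__(self, total=None):
        self.total = total
        self.succeeded = 0
        self.failed = 0

    def __call__(self, result):
        # FailedResult carries `error`; TaskExport does not list that property.
        if hasattr(result, "error"):
            self.failed += 1
        else:
            self.succeeded += 1
        done = self.succeeded + self.failed
        suffix = f"/{self.total}" if self.total is not None else ""
        print(f"{done}{suffix} done ({self.failed} failed)")


tracker = ProgressTracker(total=2)
tracker(SimpleNamespace(original_file="a.pdf", data={}))
tracker(SimpleNamespace(file="b.pdf", error="unsupported file"))
```

With a real client the tracker would be passed as client.extract(files, on_progress=tracker).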

`extract_folder()` Parameters

Parameter	Type	Default	Description
`folder`	`str` or `Path`	required	Directory to scan
`pattern`	`str`	`"*"`	Glob pattern for file discovery (filtered by supported extensions)
`template_id`	`str`	`None`	Template ID (auto-detect if omitted)
`split`	`bool`	`False`	Auto-split multi-page documents
`auto_approve`	`bool`	`True`	Set to `False` to require human review in the Suparse UI
`cleanup`	`bool`	`False`	Delete documents from server after extraction
`on_progress`	`callable`	`None`	Called with each result as it completes
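
The discovery step that pattern controls can be pictured as a glob followed by an extension filter. The sketch below is a guess at that behavior and not the SDK's implementation: discover_files is a hypothetical helper, and the case-insensitive match, the sorting, and the non-recursive glob are all assumptions.

```python
from pathlib import Path

# Extensions listed as supported in the "Process a Folder" section.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".heif"}


def discover_files(folder, pattern="*"):
    """Return supported files in `folder` matching `pattern`, sorted by path."""
    folder = Path(folder)
    return sorted(
        p for p in folder.glob(pattern)
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The returned list could then be handed to extract(), for example client.extract(discover_files("./receipts/", "*.pdf")).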
