Suparse

Python SDK & CLI

Last updated: 2026-05-15

Suparse Python SDK

Official Python SDK and CLI for the Suparse Document Processing API.

Suparse is an AI-powered document processing API for extracting structured data from any document type, including invoices, receipts, bank statements, purchase orders and many more. This SDK wraps the REST API with a Python client and CLI.

Requirements

  • Python 3.10+
  • Dependencies: httpx, pydantic, pydantic-settings, tenacity

Installation

pip install suparse

Authentication

You'll need an API key to use the SDK or CLI. To obtain one:

  1. Sign in at suparse.com
  2. Go to the API Keys tab
  3. Enter a key name and click Generate New Key
  4. Copy the key value — it will be shown only once

Set it as an environment variable:

export SUPARSE_API_KEY="your_api_key_here"

Or pass it directly to the SDK constructor (see below).

Quick Start

# CLI
suparse process invoice.pdf -o results.json
 
# Python — synchronous (with SUPARSE_API_KEY env var set)
python3 -c "
from suparse import SuparseClient
 
with SuparseClient() as client:
    result = client.extract('invoice.pdf')
    for r in result.succeeded:
        print(r.original_file, r.data)
"
 
# Python — asynchronous
python3 -c "
import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract('invoice.pdf')
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())
"

CLI Usage

Run suparse --help or suparse process --help for full usage information.

Set your API key and process a file directly from the terminal. The CLI uploads the file, polls until processing finishes, and downloads the resulting JSON.

Process a Document

export SUPARSE_API_KEY="your_api_key_here"
 
# Auto-detect template
suparse process path/to/invoice.pdf -o results.json
 
# Use specific template without auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
 
# Auto-split a multi-page PDF containing mixed document types (e.g. receipts + bank statements)
suparse process path/to/merged.pdf --with-split
 
# Use specific template with auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c --with-split
 
# Process and auto-delete documents from server after download
suparse process path/to/invoice.pdf --cleanup

Process a Folder

Process all supported files (.pdf, .jpg, .jpeg, .png, .heic, .heif) in a folder. All files are uploaded and polled individually, then results are exported to a single JSON file.

# Process all supported files in a folder
suparse process --folder path/to/receipts/
 
# Output to a specific file (default: {folder_name}_results.json)
suparse process --folder path/to/receipts/ -o all_results.json
 
# Process a folder with a specific template
suparse process --folder path/to/receipts/ --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
 
# Process a folder with auto-splitting enabled
suparse process --folder path/to/receipts/ --with-split
 
# Process a folder and auto-delete documents from server after download
suparse process --folder path/to/receipts/ --cleanup
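The default output names follow the {file_stem}_results.json and {folder_name}_results.json patterns. A minimal sketch of how such a name could be derived with pathlib is shown here. The helper default_output_name is hypothetical and not part of the suparse package, and it computes only the file name, since this page does not say which directory the CLI writes to.

```python
from pathlib import Path


def default_output_name(target, folder=False):
    """Return the documented default results file name.

    Hypothetical helper for illustration; not part of the suparse package.
    """
    path = Path(target)
    # Folder mode uses the folder name; file mode uses the name without extension.
    base = path.name if folder else path.stem
    return f"{base}_results.json"


print(default_output_name("path/to/invoice.pdf"))
print(default_output_name("path/to/receipts/", folder=True))
```

Passing -o explicitly avoids depending on the default name at all, which is usually the safer choice in scripts.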

Delete Documents

# Delete one or more documents by ID (prompts for confirmation)
suparse delete <document_id>
suparse delete <id1> <id2> <id3>
 
# Skip confirmation prompt
suparse delete <id1> <id2> -y

Deleting a parent document automatically deletes all its child documents (server-side cascade).

List Available Templates

Templates define how a document type (invoice, receipt, bank statement, etc.) is parsed. Before processing a document, check which templates are already assigned to your account:

# List templates assigned to your account (table format)
suparse templates
 
# List templates in JSON format
suparse templates --format json
 
# Include all system templates (not yet assigned to your account)
suparse templates --include-system

The recommended way to process documents is with auto-split enabled (--with-split), which handles both single-type and mixed-type documents automatically:

suparse process path/to/documents.pdf --with-split -o results.json

If you consistently process one document type, look up the template ID and pass it directly:

  1. Run suparse templates to see templates assigned to your account.
  2. Use the matching template ID:
    suparse process invoice.pdf --template-id <id> -o results.json
  3. If no template matches, run suparse templates --include-system to browse all system templates. Assign one to your account via the Suparse UI.
  4. If no system template fits, create a custom template using the template creator in the Suparse UI.
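Step 2 above can be scripted once the template list is available as JSON. The sketch below looks up a template ID by name. It assumes that suparse templates --format json prints a JSON array of objects with id and name keys; those field names are an assumption, not confirmed on this page, so check the real output before relying on them. The helper find_template_id is hypothetical.

```python
import json
from typing import Optional


def find_template_id(templates_json: str, name: str) -> Optional[str]:
    """Return the ID of the first template whose name matches, ignoring case.

    Assumes a JSON array of objects with "id" and "name" keys (unverified).
    """
    wanted = name.casefold()
    for template in json.loads(templates_json):
        if str(template.get("name", "")).casefold() == wanted:
            return template.get("id")
    return None


# Made-up sample data in the assumed shape, for illustration only.
sample = '[{"id": "t-1", "name": "Invoice"}, {"id": "t-2", "name": "Receipt"}]'
print(find_template_id(sample, "receipt"))
```

The returned value can then be passed to --template-id, or to the template_id parameter of the SDK.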

Configuration

The CLI reads settings from environment variables or a .env file in the working directory:

| Variable | Default | Description |
|---|---|---|
| SUPARSE_API_URL | https://api.suparse.com/api/v1/ | API base URL |
| SUPARSE_API_KEY | | Your API key (required) |
| POLL_INTERVAL | 5 | Seconds between polling attempts |
| MAX_POLL_ATTEMPTS | 300 | Max polling attempts before timeout |
| LOG_LEVEL | INFO | Logging level |

Priority: CLI flags > environment variables > .env file > defaults.
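The priority order can be pictured as a layered lookup in which the first source that defines a setting wins. The following is a sketch of that idea using collections.ChainMap, not the CLI's actual implementation. The function resolve_settings is hypothetical, and keying every layer by environment variable name is a simplification.

```python
from collections import ChainMap

DEFAULTS = {
    "SUPARSE_API_URL": "https://api.suparse.com/api/v1/",
    "POLL_INTERVAL": "5",
    "MAX_POLL_ATTEMPTS": "300",
    "LOG_LEVEL": "INFO",
}


def resolve_settings(cli_flags, environ, dotenv, defaults=DEFAULTS):
    """Merge setting sources; earlier sources take priority over later ones."""
    # Drop unset (None) values so they do not mask lower-priority sources.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli_flags, environ, dotenv, defaults)
    ]
    # ChainMap searches its mappings in order and returns the first match.
    return dict(ChainMap(*layers))


settings = resolve_settings(
    cli_flags={"SUPARSE_API_KEY": None},            # flag not given
    environ={"SUPARSE_API_KEY": "env-key", "POLL_INTERVAL": "2"},
    dotenv={"POLL_INTERVAL": "10", "LOG_LEVEL": "DEBUG"},
)
print(settings["POLL_INTERVAL"], settings["LOG_LEVEL"])
```

In this example POLL_INTERVAL comes from the environment rather than the .env file, and LOG_LEVEL comes from the .env file rather than the default.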

Global Options

These options apply to all subcommands.

| Option | Description |
|---|---|
| --api-url | API URL (default: from SUPARSE_API_URL env var) |
| --api-key | API Key (default: from SUPARSE_API_KEY env var) |
| -v, --verbose | Enable verbose (DEBUG) output |

CLI Options

| Command | Option | Description |
|---|---|---|
| process | --folder | Process all supported files in a folder (mutually exclusive with file path) |
| process | -o, --output | Output JSON file path (default: {file_stem}_results.json, folder mode: {folder_name}_results.json) |
| process | --template-id | Template ID to use (default: auto-detect) |
| process | --with-split | Auto-split multi-page PDFs containing mixed document types (e.g. receipts + bank statements) so each is processed with the correct template (default: off) |
| process | --cleanup | Delete documents from server after download so files are stored only during processing and cannot be accessed later (default: off) |
| templates | --format | Output format: table or json (default: table) |
| templates | --include-system | Include system templates not yet assigned to your account (default: off) |
| delete | -y, --yes | Skip confirmation prompt |
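When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below builds a suparse process command from the flags documented above; build_process_command is a hypothetical helper and uses no flags beyond those listed in the table.

```python
def build_process_command(
    target,
    *,
    folder=False,
    output=None,
    template_id=None,
    with_split=False,
    cleanup=False,
):
    """Return an argument list for `suparse process` using documented flags."""
    cmd = ["suparse", "process"]
    # --folder and a positional file path are mutually exclusive.
    cmd += ["--folder", str(target)] if folder else [str(target)]
    if output is not None:
        cmd += ["-o", str(output)]
    if template_id is not None:
        cmd += ["--template-id", template_id]
    if with_split:
        cmd.append("--with-split")
    if cleanup:
        cmd.append("--cleanup")
    return cmd


print(build_process_command("merged.pdf", with_split=True, output="results.json"))
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.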

Python SDK Usage

The SDK provides both synchronous and asynchronous clients. Both handle API rate limits, parallel processing, and connection pooling automatically.

Synchronous

from suparse import SuparseClient

with SuparseClient() as client:
    result = client.extract(["invoice1.pdf", "invoice2.pdf"])

    for r in result.succeeded:
        print(r.original_file, r.data)

Asynchronous

import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())

The client reads SUPARSE_API_KEY and SUPARSE_API_URL from environment variables by default, so you can simply use SuparseClient() or AsyncSuparseClient() if those are set.

Extract One or More Documents

extract() is the primary API. It accepts a single file, a list of files, or any iterable (like Path.glob). Each input can be a string path, a Path object, or an open file handle.

Synchronous

from pathlib import Path
from suparse import FailedResult, SuparseClient  # assumes FailedResult is exported at the package root
 
with SuparseClient() as client:
    # Single file (string or Path)
    result = client.extract("invoice.pdf")
    for r in result.succeeded:
        print(r.original_file, r.data)
 
    # List of files from different locations
    result = client.extract([
        "./receipts/jan.pdf",
        Path("./receipts/feb.pdf"),
    ])
 
    # Glob generator
    result = client.extract(
        Path("./receipts/").glob("*.pdf")
    )
 
    # With progress callback
    def print_progress(r):
        if isinstance(r, FailedResult):
            print(f"  Failed: {r.file} - {r.error}")
        else:
            print(f"  Done: {r.original_file}")
 
    result = client.extract(
        files=Path("./receipts/").glob("*.pdf"),
        on_progress=print_progress,
    )
 
    # Access results
    for r in result.succeeded:
        print(r.original_file, r.data, r.document_ids)
 
    for f in result.failed:
        print(f.file, f.error)
 
    print(f"Total: {result.total}")

Asynchronous

import asyncio
from pathlib import Path
from suparse import AsyncSuparseClient, FailedResult  # assumes FailedResult is exported at the package root
 
async def main():
    async with AsyncSuparseClient() as client:
        # Single file (string or Path)
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
 
        # List of files from different locations
        result = await client.extract([
            "./receipts/jan.pdf",
            Path("./receipts/feb.pdf"),
        ])
 
        # Glob generator
        result = await client.extract(
            Path("./receipts/").glob("*.pdf")
        )
 
        # With progress callback
        def print_progress(r):
            if isinstance(r, FailedResult):
                print(f"  Failed: {r.file} - {r.error}")
            else:
                print(f"  Done: {r.original_file}")
 
        result = await client.extract(
            files=Path("./receipts/").glob("*.pdf"),
            on_progress=print_progress,
        )
 
        # Access results
        for r in result.succeeded:
            print(r.original_file, r.data, r.document_ids)
 
        for f in result.failed:
            print(f.file, f.error)
 
        print(f"Total: {result.total}")

Extract a Folder

extract_folder() is a convenience wrapper that discovers supported files and delegates to extract().

Synchronous

from suparse import SuparseClient
 
with SuparseClient() as client:
    result = client.extract_folder(
        "./receipts/",
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=True,
        cleanup=True,
    )
    for r in result.succeeded:
        print(r.original_file, r.data)

Asynchronous

import asyncio
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract_folder(
            "./receipts/",
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=True,
            cleanup=True,
        )
        for r in result.succeeded:
            print(r.original_file, r.data)
 
asyncio.run(main())

Note: r.data contains only the extracted document fields. To get all details — including credits_used, template_id, document_id, file_name, page_start, and page_end — use r.documents (a list of DocumentExport objects). To serialize everything, use r.model_dump(mode="json").
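To illustrate the difference between r.data and r.documents, the sketch below totals credits_used per input file by walking the documents list. It runs on local stand-in classes that mirror only a few of the documented property names, so it does not require the suparse package. The stand-ins, the helper credits_per_file, and the assumption that credits_used is numeric are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DocStandIn:
    """Local stand-in mirroring two documented DocumentExport fields."""
    document_id: str
    credits_used: int  # assumed numeric


@dataclass
class TaskStandIn:
    """Local stand-in mirroring two documented TaskExport properties."""
    original_file: str
    documents: list = field(default_factory=list)


def credits_per_file(tasks):
    """Map each original file to the sum of credits over its documents."""
    return {
        task.original_file: sum(doc.credits_used for doc in task.documents)
        for task in tasks
    }


tasks = [
    TaskStandIn("merged.pdf", [DocStandIn("d1", 2), DocStandIn("d2", 3)]),
    TaskStandIn("invoice.pdf", [DocStandIn("d3", 1)]),
]
print(credits_per_file(tasks))
```

With the real SDK, the same function could be called on result.succeeded, provided the attribute names match those listed in the note.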

Error Handling

All exceptions inherit from SuparseError. Handle them at whatever granularity makes sense:

Synchronous

from suparse import SuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
 
with SuparseClient(api_key="your_api_key_here") as client:
    try:
        result = client.extract("invoice.pdf")
    except SuparseAuthError:
        print("Invalid API key")
    except SuparseNetworkError:
        print("Connection failed")
    except SuparsePollingTimeoutError:
        print("Processing timed out")
    except SuparseError as e:
        print(f"Unexpected error: {e}")

Asynchronous

import asyncio
from suparse import AsyncSuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
 
async def main():
    async with AsyncSuparseClient(api_key="your_api_key_here") as client:
        try:
            result = await client.extract("invoice.pdf")
        except SuparseAuthError:
            print("Invalid API key")
        except SuparseNetworkError:
            print("Connection failed")
        except SuparsePollingTimeoutError:
            print("Processing timed out")
        except SuparseError as e:
            print(f"Unexpected error: {e}")
 
asyncio.run(main())

extract() also accepts open file objects or in-memory streams directly (no Path needed). Inside a client context:

    with open("invoice.pdf", "rb") as f:
        result = client.extract(f)

For batch operations, check result.failed to handle per-file errors without catching exceptions:

result = client.extract(["a.pdf", "b.pdf", "c.pdf"])
 
for r in result.succeeded:
    print(f"OK: {r.original_file} -> {r.document_ids}")
 
for f in result.failed:
    print(f"FAIL: {f.file} -> {f.error}")
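Because per-file failures are reported in result.failed rather than raised, a caller can re-submit only the files that failed. The sketch below shows one way to do that. The helper retry_failed is hypothetical, and it assumes that f.file can be passed back to extract() as an input, which holds for paths but may not hold for file handles that have already been read.

```python
def retry_failed(extract, files, max_rounds=2):
    """Run extract(files), then re-submit failed files up to max_rounds times.

    `extract` is any callable returning an object with .succeeded and .failed,
    for example client.extract. Returns (succeeded, still_failed).
    """
    result = extract(list(files))
    succeeded = list(result.succeeded)
    for _ in range(max_rounds):
        if not result.failed:
            break
        # Only the inputs that failed in the previous round are sent again.
        result = extract([failure.file for failure in result.failed])
        succeeded.extend(result.succeeded)
    return succeeded, list(result.failed)
```

A call such as retry_failed(client.extract, ["a.pdf", "b.pdf", "c.pdf"]) would return the accumulated successes and whatever still failed after the final round. Retrying is only useful for transient failures; a file that fails for a permanent reason will fail again.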

Low-Level API

For cases where you need file-based output or direct control over the upload/poll/download cycle:

Synchronous

from pathlib import Path
from suparse import SuparseClient
 
with SuparseClient() as client:
    # Upload a file and get back a task ID
    task_id = client.upload_file(
        Path("invoice.pdf"),
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=False,
        auto_approve=True,  # Set to False to require human review in the Suparse UI
    )
 
    # Poll until processing completes (returns status + document IDs)
    status, doc_ids = client.poll_task_status(task_id)
 
    # Export results by document IDs to a JSON file
    client.download_results(
        ["doc-id-1", "doc-id-2"],
        Path("output.json"),
    )
 
    # Delete documents by ID
    client.delete_documents([
        "550e8400-e29b-41d4-a716-446655440000",
    ])
 
    # List available templates
    templates = client.list_templates()
    for t in templates:
        print(f"{t.name} ({t.template_language})")

Asynchronous

import asyncio
from pathlib import Path
from suparse import AsyncSuparseClient
 
async def main():
    async with AsyncSuparseClient() as client:
        # Process a single document to a JSON file
        success = await client.process_document(
            file_path=Path("invoice.pdf"),
            output_path=Path("results.json"),
            cleanup=True,
        )
 
        # Process multiple files in parallel (returns raw tuples)
        succeeded, failed = await client.process_batch(
            [Path("a.pdf"), Path("b.pdf")],
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        )
        # succeeded: list of (Path, task_id, [document_ids])
        # failed: list of (Path, task_id or None, exception)
 
        # Upload a file and get back a task ID
        task_id = await client.upload_file(
            Path("invoice.pdf"),
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=False,
            auto_approve=True,  # Set to False to require human review in the Suparse UI
        )
 
        # Poll until processing completes (returns status + document IDs)
        status, doc_ids = await client.poll_task_status(task_id)
 
        # Export results by document IDs to a JSON file
        await client.download_results(
            ["doc-id-1", "doc-id-2"],
            Path("output.json"),
        )
 
        # Delete documents by ID
        await client.delete_documents([
            "550e8400-e29b-41d4-a716-446655440000",
        ])
 
        # List available templates
        templates = await client.list_templates()
        for t in templates:
            print(f"{t.name} ({t.template_language})")
 
asyncio.run(main())

Constructor Parameters

Both SuparseClient and AsyncSuparseClient accept the same parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_url | str | SUPARSE_API_URL env var or https://api.suparse.com/api/v1/ | API base URL |
| api_key | str | SUPARSE_API_KEY env var | Your API key (required) |
| poll_interval | int | 5 | Seconds between polling attempts |
| max_poll_attempts | int | 300 | Max polling attempts before timeout |
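The two polling parameters together bound how long a call waits. The sketch below shows a generic fixed-interval polling loop and the resulting upper bound. It is an illustration of the idea, not the SDK's implementation: poll_until_done is hypothetical, it raises the built-in TimeoutError where the SDK documents SuparsePollingTimeoutError, and it assumes a constant interval with no backoff.

```python
import time


def poll_until_done(check, poll_interval=5, max_poll_attempts=300, sleep=time.sleep):
    """Call check() until it returns a non-None value or attempts run out."""
    for attempt in range(1, max_poll_attempts + 1):
        outcome = check()
        if outcome is not None:
            return outcome
        # No need to wait after the final attempt.
        if attempt < max_poll_attempts:
            sleep(poll_interval)
    raise TimeoutError(f"no result after {max_poll_attempts} attempts")


# With the documented defaults, waiting is bounded by about
# 5 s * 300 attempts = 1500 s (25 minutes), ignoring request latency.
DEFAULT_MAX_WAIT_SECONDS = 5 * 300
print(DEFAULT_MAX_WAIT_SECONDS / 60)
```

For large documents that need longer, raising max_poll_attempts extends the bound without increasing how often the API is queried.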

Result Objects

| Object | Properties | Description |
|---|---|---|
| TaskExport | task_id, original_file, total_documents_extracted, documents, data, document_ids | Successfully extracted file; documents contains DocumentExport objects with credits_used, template_id, etc. |
| FailedResult | file, error | File that failed during extraction |
| BatchResult | succeeded, failed, total | Container for batch results; access .succeeded and .failed explicitly |

Exceptions

All exceptions inherit from SuparseError.

| Exception | Raised When |
|---|---|
| SuparseError | Base exception for all SDK errors |
| SuparseNetworkError | Network connection fails or times out |
| SuparsePollingTimeoutError | Polling exceeds max_poll_attempts |
| SuparseProcessingError | Document fails to process on the server |
| SuparseAPIError | Base for HTTP error responses (has status_code, response_body) |
| SuparseAuthError | 401/403 authentication or authorization error |
| SuparseNotFoundError | 404 resource not found |
| SuparseRateLimitError | 429 too many requests |
| SuparseServerError | 5xx server error |
| SuparseSDKError | SDK fails to parse the API response into the expected data model |
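Since the exceptions form a hierarchy, the order of except clauses matters: specific classes must come before their bases, or the base clause will catch everything first. The sketch below demonstrates this with local stand-in classes so that it runs without the suparse package. It assumes that the HTTP-specific exceptions subclass SuparseAPIError, which the table suggests but does not state outright; in real code the classes would be imported from suparse.exceptions.

```python
# Local stand-ins mirroring the documented names; not the real SDK classes.
class SuparseError(Exception):
    pass


class SuparseAPIError(SuparseError):
    def __init__(self, message, status_code=None, response_body=None):
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body


class SuparseAuthError(SuparseAPIError):
    pass


class SuparseRateLimitError(SuparseAPIError):
    pass


def classify(exc):
    """Return a short action hint, checking the most specific classes first."""
    try:
        raise exc
    except SuparseRateLimitError:
        return "retry later"
    except SuparseAuthError:
        return "check API key"
    except SuparseAPIError as err:
        return f"HTTP error {err.status_code}"
    except SuparseError:
        return "other SDK error"


print(classify(SuparseRateLimitError("too many requests", status_code=429)))
```

If the SuparseError clause were listed first, every call to classify would return "other SDK error", because each stand-in is also a SuparseError.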

extract() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| files | str, Path, IO[bytes], or iterable of these | required | One or more files to process |
| template_id | str | None | Template ID (auto-detect if omitted) |
| split | bool | False | Auto-split multi-page documents |
| auto_approve | bool | True | Set to False to require human review in the Suparse UI |
| cleanup | bool | False | Delete documents from server after extraction |
| on_progress | callable | None | Called with each TaskExport or FailedResult as it completes |
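The files parameter accepts either one input or an iterable of inputs, which raises a subtle point: strings and file objects are themselves iterable, so a single path must not be treated as a sequence of characters. The sketch below shows one way such inputs could be normalized into a list. It illustrates the distinction only and is not the SDK's internal logic; normalize_inputs is a hypothetical helper.

```python
import io
from pathlib import Path


def normalize_inputs(files):
    """Return a list of inputs from a single item or an iterable of items."""
    # A str, Path, or binary stream counts as one input, even though
    # str and file objects can be iterated.
    if isinstance(files, (str, Path, io.IOBase)):
        return [files]
    # Anything else (list, tuple, generator such as Path.glob) is consumed.
    return list(files)


print(normalize_inputs("invoice.pdf"))
print(normalize_inputs(["jan.pdf", Path("feb.pdf")]))
```

Note that a generator such as Path("./receipts/").glob("*.pdf") can only be consumed once, so reuse the resulting list if the same set of files is needed again.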

extract_folder() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| folder | str or Path | required | Directory to scan |
| pattern | str | "*" | Glob pattern for file discovery (filtered by supported extensions) |
| template_id | str | None | Template ID (auto-detect if omitted) |
| split | bool | False | Auto-split multi-page documents |
| auto_approve | bool | True | Set to False to require human review in the Suparse UI |
| cleanup | bool | False | Delete documents from server after extraction |
| on_progress | callable | None | Called with each result as it completes |
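The combination of a glob pattern and an extension filter can be approximated with pathlib when you want to preview which files a folder run would pick up. The sketch below is an approximation under stated assumptions, not the SDK's discovery code: discover_files is hypothetical, the extension list is taken from the Process a Folder section, and the non-recursive, case-insensitive, sorted behavior is assumed.

```python
from pathlib import Path

# Extensions listed in the "Process a Folder" section.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".heif"}


def discover_files(folder, pattern="*"):
    """List files in `folder` matching `pattern` with a supported extension."""
    root = Path(folder)
    return sorted(
        path
        for path in root.glob(pattern)
        # Skip directories and compare extensions case-insensitively (assumed).
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The returned list could be passed straight to extract() in place of a folder path, which is handy when you want to log or filter the file set before uploading.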