Python SDK & CLI
Last updated: 2026-05-15
Suparse Python SDK
Official Python SDK and CLI for the Suparse Document Processing API.
Suparse is an AI-powered document processing API for extracting structured data from any document type, including invoices, receipts, bank statements, purchase orders and many more. This SDK wraps the REST API with a Python client and CLI.
Requirements
- Python 3.10+
- Dependencies: httpx, pydantic, pydantic-settings, tenacity
Installation
pip install suparse
Authentication
You'll need an API key to use the SDK or CLI. To obtain one:
- Sign in at suparse.com
- Go to the API Keys tab
- Enter a key name and click Generate New Key
- Copy the key value — it will be shown only once
Set it as an environment variable:
export SUPARSE_API_KEY="your_api_key_here"
Or pass it directly to the SDK constructor (see below).
Quick Start
# CLI
suparse process invoice.pdf -o results.json
# Python, synchronous (with SUPARSE_API_KEY env var set)
python3 -c "
from suparse import SuparseClient
with SuparseClient() as client:
    result = client.extract('invoice.pdf')
    for r in result.succeeded:
        print(r.original_file, r.data)
"
# Python, asynchronous
python3 -c "
import asyncio
from suparse import AsyncSuparseClient
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract('invoice.pdf')
        for r in result.succeeded:
            print(r.original_file, r.data)
asyncio.run(main())
"
CLI Usage
Run suparse --help or suparse process --help for full usage information.
Set your API key and process a file directly from the terminal. The SDK will auto-upload, poll, and download the resulting JSON.
Process a Document
export SUPARSE_API_KEY="your_api_key_here"
# Auto-detect template
suparse process path/to/invoice.pdf -o results.json
# Use specific template without auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
# Auto-split a multi-page PDF containing mixed document types (e.g. receipts + bank statements)
suparse process path/to/merged.pdf --with-split
# Use specific template with auto-splitting
suparse process path/to/invoice.pdf --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c --with-split
# Process and auto-delete documents from server after download
suparse process path/to/invoice.pdf --cleanup
Process a Folder
Process all supported files (.pdf, .jpg, .jpeg, .png, .heic, .heif) in a folder. All files are uploaded and polled individually, then results are exported to a single JSON file.
# Process all supported files in a folder
suparse process --folder path/to/receipts/
# Output to a specific file (default: {folder_name}_results.json)
suparse process --folder path/to/receipts/ -o all_results.json
# Process a folder with a specific template
suparse process --folder path/to/receipts/ --template-id 276a0aa8-84bc-4491-a2e7-1ea13381790c
# Process a folder with auto-splitting enabled
suparse process --folder path/to/receipts/ --with-split
# Process a folder and auto-delete documents from server after download
suparse process --folder path/to/receipts/ --cleanup
Delete Documents
# Delete one or more documents by ID (prompts for confirmation)
suparse delete <document_id>
suparse delete <id1> <id2> <id3>
# Skip confirmation prompt
suparse delete <id1> <id2> -y
Deleting a parent document automatically deletes all its child documents (server-side cascade).
List Available Templates
Templates define how a document type (invoice, receipt, bank statement, etc.) is parsed. Before processing a document, check which templates are already assigned to your account:
# List templates assigned to your account (table format)
suparse templates
# List templates in JSON format
suparse templates --format json
# Include all system templates (not yet assigned to your account)
suparse templates --include-system
The recommended way to process documents is with auto-split enabled (--with-split), which handles both single-type and mixed-type files automatically:
suparse process path/to/documents.pdf --with-split -o results.json
If you consistently process one document type, look up the template ID and pass it directly:
- Run suparse templates to see the templates assigned to your account.
- Use the matching template ID: suparse process invoice.pdf --template-id <id> -o results.json
- If no template matches, run suparse templates --include-system to browse all system templates. Assign one to your account via the Suparse UI.
- If no system template fits, create a custom template using the template creator in the Suparse UI.
Configuration
The CLI reads settings from environment variables or a .env file in the working directory:
| Variable | Default | Description |
|---|---|---|
| SUPARSE_API_URL | https://api.suparse.com/api/v1/ | API base URL |
| SUPARSE_API_KEY | (none) | Your API key (required) |
| POLL_INTERVAL | 5 | Seconds between polling attempts |
| MAX_POLL_ATTEMPTS | 300 | Max polling attempts before timeout |
| LOG_LEVEL | INFO | Logging level |
Priority: CLI flags > environment variables > .env file > defaults.
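The priority order above can be illustrated with a small lookup function. This is only a sketch of the documented precedence, not the CLI's actual implementation: resolve_setting, DEFAULTS, and the dictionaries standing in for the environment and the .env file are hypothetical names introduced for illustration.

```python
from typing import Mapping, Optional

# Defaults copied from the configuration table; SUPARSE_API_KEY has none.
DEFAULTS = {
    "SUPARSE_API_URL": "https://api.suparse.com/api/v1/",
    "POLL_INTERVAL": "5",
    "MAX_POLL_ATTEMPTS": "300",
    "LOG_LEVEL": "INFO",
}


def resolve_setting(
    name: str,
    cli_value: Optional[str],
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
) -> Optional[str]:
    """Return the first value found, from highest to lowest priority."""
    if cli_value is not None:      # 1. CLI flag
        return cli_value
    if name in environ:            # 2. environment variable
        return environ[name]
    if name in dotenv:             # 3. .env file
        return dotenv[name]
    return DEFAULTS.get(name)      # 4. built-in default (None if there is none)


# Hypothetical inputs standing in for os.environ and a parsed .env file.
env = {"POLL_INTERVAL": "10"}
dotenv_file = {"POLL_INTERVAL": "2", "LOG_LEVEL": "DEBUG"}

print(resolve_setting("POLL_INTERVAL", None, env, dotenv_file))      # 10
print(resolve_setting("LOG_LEVEL", None, env, dotenv_file))          # DEBUG
print(resolve_setting("MAX_POLL_ATTEMPTS", None, env, dotenv_file))  # 300
```

A value set in the shell therefore wins over the same variable in a .env file, which is convenient for one-off overrides.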
Global Options
These options apply to all subcommands.
| Option | Description |
|---|---|
| --api-url | API URL (default: from SUPARSE_API_URL env var) |
| --api-key | API Key (default: from SUPARSE_API_KEY env var) |
| -v, --verbose | Enable verbose (DEBUG) output |
CLI Options
| Command | Option | Description |
|---|---|---|
| process | --folder | Process all supported files in a folder (mutually exclusive with file path) |
| process | -o, --output | Output JSON file path (default: {file_stem}_results.json, folder mode: {folder_name}_results.json) |
| process | --template-id | Template ID to use (default: auto-detect) |
| process | --with-split | Auto-split multi-page PDFs containing mixed document types (e.g. receipts + bank statements) so each is processed with the correct template (default: off) |
| process | --cleanup | Delete documents from server after download so files are stored only during processing and cannot be accessed later (default: off) |
| templates | --format | Output format: table or json (default: table) |
| templates | --include-system | Include system templates not yet assigned to your account (default: off) |
| delete | -y, --yes | Skip confirmation prompt |
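The default output names in the table follow a simple pattern. The sketch below shows how such a name could be derived with pathlib; default_output_name is a hypothetical helper, and the directory the CLI actually writes to is not stated here, so only the file name is computed.

```python
from pathlib import Path
from typing import Optional


def default_output_name(file: Optional[str] = None, folder: Optional[str] = None) -> str:
    """Build the default results file name for single-file or folder mode."""
    # The two inputs are mutually exclusive, mirroring the --folder option.
    if (file is None) == (folder is None):
        raise ValueError("pass exactly one of file or folder")
    if folder is not None:
        # Path(...).name ignores a trailing slash: "receipts/" -> "receipts"
        return f"{Path(folder).name}_results.json"
    return f"{Path(file).stem}_results.json"


print(default_output_name(file="path/to/invoice.pdf"))     # invoice_results.json
print(default_output_name(folder="path/to/receipts/"))     # receipts_results.json
```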
Python SDK Usage
The SDK provides both synchronous and asynchronous clients. Both handle API rate limits, parallel processing, and connection pooling automatically.
Synchronous (Recommended for Scripts, Pandas, Jupyter Notebooks)
from suparse import SuparseClient
with SuparseClient() as client:
    result = client.extract(["invoice1.pdf", "invoice2.pdf"])
    for r in result.succeeded:
        print(r.original_file, r.data)
Asynchronous (Recommended for FastAPI, Aiohttp, Async Pipelines)
import asyncio
from suparse import AsyncSuparseClient
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
asyncio.run(main())
The client reads SUPARSE_API_KEY and SUPARSE_API_URL from environment variables by default, so you can simply use SuparseClient() or AsyncSuparseClient() if those are set.
Extract One or More Documents
extract() is the primary API. It accepts a single file, a list of files, or any iterable (like Path.glob). Each input can be a string path, a Path object, or an open file handle.
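Accepting "one file or many" in a single parameter has one subtlety: a string is itself iterable, so single values have to be recognized before iterating. The following sketch shows one way such normalization could work; normalize_inputs is a hypothetical helper written for illustration and is not taken from the SDK.

```python
import io
from pathlib import Path
from typing import Any, List


def normalize_inputs(files: Any) -> List[Any]:
    """Turn a single input or any iterable of inputs into a flat list."""
    # Check the single-value types first: iterating a str would yield
    # characters, and iterating a binary file handle would yield lines.
    if isinstance(files, (str, Path, io.IOBase)):
        return [files]
    return list(files)  # lists, tuples, generators such as Path.glob(...)


print(normalize_inputs("invoice.pdf"))                       # ['invoice.pdf']
print(len(normalize_inputs(["a.pdf", Path("b.pdf")])))       # 2
print(len(normalize_inputs(io.BytesIO(b"%PDF-"))))           # 1
print(len(normalize_inputs(p for p in [Path("c.pdf")])))     # 1
```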
Synchronous
from pathlib import Path
# FailedResult is used by the progress callback below; importing it from the
# package root is an assumption, adjust the import if it lives elsewhere.
from suparse import FailedResult, SuparseClient
with SuparseClient() as client:
    # Single file (string or Path)
    result = client.extract("invoice.pdf")
    for r in result.succeeded:
        print(r.original_file, r.data)
    # List of files from different locations
    result = client.extract([
        "./receipts/jan.pdf",
        Path("./receipts/feb.pdf"),
    ])
    # Glob generator
    result = client.extract(
        Path("./receipts/").glob("*.pdf")
    )
    # With progress callback
    def print_progress(r):
        if isinstance(r, FailedResult):
            print(f" Failed: {r.file} - {r.error}")
        else:
            print(f" Done: {r.original_file}")
    result = client.extract(
        files=Path("./receipts/").glob("*.pdf"),
        on_progress=print_progress,
    )
    # Access results
    for r in result.succeeded:
        print(r.original_file, r.data, r.document_ids)
    for f in result.failed:
        print(f.file, f.error)
    print(f"Total: {result.total}")
Asynchronous
import asyncio
from pathlib import Path
# FailedResult is used by the progress callback below; importing it from the
# package root is an assumption, adjust the import if it lives elsewhere.
from suparse import AsyncSuparseClient, FailedResult
async def main():
    async with AsyncSuparseClient() as client:
        # Single file (string or Path)
        result = await client.extract("invoice.pdf")
        for r in result.succeeded:
            print(r.original_file, r.data)
        # List of files from different locations
        result = await client.extract([
            "./receipts/jan.pdf",
            Path("./receipts/feb.pdf"),
        ])
        # Glob generator
        result = await client.extract(
            Path("./receipts/").glob("*.pdf")
        )
        # With progress callback
        def print_progress(r):
            if isinstance(r, FailedResult):
                print(f" Failed: {r.file} - {r.error}")
            else:
                print(f" Done: {r.original_file}")
        result = await client.extract(
            files=Path("./receipts/").glob("*.pdf"),
            on_progress=print_progress,
        )
        # Access results
        for r in result.succeeded:
            print(r.original_file, r.data, r.document_ids)
        for f in result.failed:
            print(f.file, f.error)
        print(f"Total: {result.total}")
asyncio.run(main())
Extract a Folder
extract_folder() is a convenience wrapper that discovers supported files and delegates to extract().
Synchronous
from suparse import SuparseClient
with SuparseClient() as client:
    result = client.extract_folder(
        "./receipts/",
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=True,
        cleanup=True,
    )
    for r in result.succeeded:
        print(r.original_file, r.data)
Asynchronous
import asyncio
from suparse import AsyncSuparseClient
async def main():
    async with AsyncSuparseClient() as client:
        result = await client.extract_folder(
            "./receipts/",
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=True,
            cleanup=True,
        )
        for r in result.succeeded:
            print(r.original_file, r.data)
asyncio.run(main())
Note: r.data contains only the extracted document fields. To get all details, including credits_used, template_id, document_id, file_name, page_start, and page_end, use r.documents (a list of DocumentExport objects). To serialize everything, use r.model_dump(mode="json").
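As an example of what the per-document details make possible, the sketch below totals credits_used per input file. It relies only on the attribute names mentioned in the note (original_file, documents, credits_used) and uses SimpleNamespace objects as stand-ins for real results; credits_by_file is a hypothetical helper, and credits_used is assumed to be numeric.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def credits_by_file(succeeded: Iterable) -> Dict[str, float]:
    """Sum credits_used over the documents extracted from each file."""
    return {
        str(r.original_file): sum(d.credits_used for d in r.documents)
        for r in succeeded
    }


# Stand-in objects shaped like the results described above (not real SDK types).
demo_succeeded = [
    SimpleNamespace(
        original_file="merged.pdf",
        documents=[SimpleNamespace(credits_used=2), SimpleNamespace(credits_used=3)],
    ),
    SimpleNamespace(
        original_file="invoice.pdf",
        documents=[SimpleNamespace(credits_used=1)],
    ),
]

print(credits_by_file(demo_succeeded))  # {'merged.pdf': 5, 'invoice.pdf': 1}
```

With a real client, the same function could be called as credits_by_file(result.succeeded).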
Error Handling
All exceptions inherit from SuparseError. Handle them at whatever granularity makes sense:
Synchronous
from suparse import SuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
with SuparseClient(api_key="your_api_key_here") as client:
    try:
        result = client.extract("invoice.pdf")
    except SuparseAuthError:
        print("Invalid API key")
    except SuparseNetworkError:
        print("Connection failed")
    except SuparsePollingTimeoutError:
        print("Processing timed out")
    except SuparseError as e:
        print(f"Unexpected error: {e}")
Asynchronous
import asyncio
from suparse import AsyncSuparseClient
from suparse.exceptions import (
    SuparseError,
    SuparseAuthError,
    SuparseNetworkError,
    SuparsePollingTimeoutError,
)
async def main():
    async with AsyncSuparseClient(api_key="your_api_key_here") as client:
        try:
            result = await client.extract("invoice.pdf")
        except SuparseAuthError:
            print("Invalid API key")
        except SuparseNetworkError:
            print("Connection failed")
        except SuparsePollingTimeoutError:
            print("Processing timed out")
        except SuparseError as e:
            print(f"Unexpected error: {e}")
asyncio.run(main())
Pass file objects or in-memory streams directly (no Path needed):
with open("invoice.pdf", "rb") as f:
    result = client.extract(f)
For batch operations, check result.failed to handle per-file errors without catching exceptions:
result = client.extract(["a.pdf", "b.pdf", "c.pdf"])
for r in result.succeeded:
    print(f"OK: {r.original_file} -> {r.document_ids}")
for f in result.failed:
    print(f"FAIL: {f.file} -> {f.error}")
Low-Level API
For cases where you need file-based output or direct control over the upload/poll/download cycle:
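To make the poll step of that cycle concrete, here is a generic bounded polling loop. It is a sketch of the general technique, not the SDK's code: poll_until_done, the check callable, and the status strings "completed" and "failed" are all hypothetical, and the built-in TimeoutError stands in for the SDK's own timeout exception.

```python
import time
from typing import Callable, List, Tuple


def poll_until_done(
    check: Callable[[], Tuple[str, List[str]]],
    interval: float = 5.0,
    max_attempts: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[str]]:
    """Call check() until it reports a final status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, doc_ids = check()
        if status in ("completed", "failed"):  # hypothetical final states
            return status, doc_ids
        if attempt < max_attempts:
            sleep(interval)  # wait before the next attempt
    raise TimeoutError(f"no final status after {max_attempts} attempts")


# Fake status source: two pending answers, then a final one.
answers = iter([("processing", []), ("processing", []), ("completed", ["doc-1"])])
print(poll_until_done(lambda: next(answers), interval=0))  # ('completed', ['doc-1'])
```

Passing sleep as a parameter keeps the loop testable without real waiting.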
Synchronous
from pathlib import Path
from suparse import SuparseClient
with SuparseClient() as client:
    # Upload a file and get back a task ID
    task_id = client.upload_file(
        Path("invoice.pdf"),
        template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        split=False,
        auto_approve=True,  # Set to False to require human review in the Suparse UI
    )
    # Poll until processing completes (returns status + document IDs)
    status, doc_ids = client.poll_task_status(task_id)
    # Export results by document IDs to a JSON file
    client.download_results(
        ["doc-id-1", "doc-id-2"],
        Path("output.json"),
    )
    # Delete documents by ID
    client.delete_documents([
        "550e8400-e29b-41d4-a716-446655440000",
    ])
    # List available templates
    templates = client.list_templates()
    for t in templates:
        print(f"{t.name} ({t.template_language})")
Asynchronous
import asyncio
from pathlib import Path
from suparse import AsyncSuparseClient
async def main():
    async with AsyncSuparseClient() as client:
        # Process a single document to a JSON file
        success = await client.process_document(
            file_path=Path("invoice.pdf"),
            output_path=Path("results.json"),
            cleanup=True,
        )
        # Process multiple files in parallel (returns raw tuples)
        succeeded, failed = await client.process_batch(
            [Path("a.pdf"), Path("b.pdf")],
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
        )
        # succeeded: list of (Path, task_id, [document_ids])
        # failed: list of (Path, task_id or None, exception)
        # Upload a file and get back a task ID
        task_id = await client.upload_file(
            Path("invoice.pdf"),
            template_id="276a0aa8-84bc-4491-a2e7-1ea13381790c",
            split=False,
            auto_approve=True,  # Set to False to require human review in the Suparse UI
        )
        # Poll until processing completes (returns status + document IDs)
        status, doc_ids = await client.poll_task_status(task_id)
        # Export results by document IDs to a JSON file
        await client.download_results(
            ["doc-id-1", "doc-id-2"],
            Path("output.json"),
        )
        # Delete documents by ID
        await client.delete_documents([
            "550e8400-e29b-41d4-a716-446655440000",
        ])
        # List available templates
        templates = await client.list_templates()
        for t in templates:
            print(f"{t.name} ({t.template_language})")
asyncio.run(main())
Constructor Parameters
Both SuparseClient and AsyncSuparseClient accept the same parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_url | str | SUPARSE_API_URL env var or https://api.suparse.com/api/v1/ | API base URL |
| api_key | str | SUPARSE_API_KEY env var | Your API key (required) |
| poll_interval | int | 5 | Seconds between polling attempts |
| max_poll_attempts | int | 300 | Max polling attempts before timeout |
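The two polling parameters together bound how long a task is waited on. The arithmetic below is a rough estimate that assumes a fixed wait of poll_interval seconds per attempt and ignores the time spent on the requests themselves; max_wait_seconds and attempts_for_budget are hypothetical helpers for choosing values.

```python
import math


def max_wait_seconds(poll_interval: int = 5, max_poll_attempts: int = 300) -> int:
    """Approximate upper bound on polling time, in seconds."""
    return poll_interval * max_poll_attempts


def attempts_for_budget(budget_seconds: int, poll_interval: int = 5) -> int:
    """Smallest number of attempts that covers a given time budget."""
    return math.ceil(budget_seconds / poll_interval)


print(max_wait_seconds())            # 1500 (25 minutes with the defaults)
print(attempts_for_budget(60 * 60))  # 720 attempts to allow about one hour
```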
Result Objects
| Object | Properties | Description |
|---|---|---|
| TaskExport | task_id, original_file, total_documents_extracted, documents, data, document_ids | Successfully extracted file; documents contains DocumentExport objects with credits_used, template_id, etc. |
| FailedResult | file, error | File that failed during extraction |
| BatchResult | succeeded, failed, total | Container for batch results; access .succeeded and .failed explicitly |
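For reporting, a batch result can be flattened into one row per input file. The sketch below uses only the properties listed in the table and writes a CSV summary with the standard library; summary_rows is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real result objects.

```python
import csv
import io
from types import SimpleNamespace
from typing import Dict, List


def summary_rows(result) -> List[Dict[str, object]]:
    """One row per input file: successes first, then failures."""
    rows = []
    for r in result.succeeded:
        rows.append({"file": str(r.original_file), "status": "ok",
                     "documents": len(r.document_ids), "error": ""})
    for f in result.failed:
        rows.append({"file": str(f.file), "status": "failed",
                     "documents": 0, "error": str(f.error)})
    return rows


# Stand-in batch shaped like the table above (not real SDK types).
demo = SimpleNamespace(
    succeeded=[SimpleNamespace(original_file="a.pdf", document_ids=["d1", "d2"])],
    failed=[SimpleNamespace(file="b.pdf", error="unsupported file")],
)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["file", "status", "documents", "error"])
writer.writeheader()
writer.writerows(summary_rows(demo))
print(buffer.getvalue())
```

The same rows can be passed to pandas.DataFrame if a notebook-style summary is preferred.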
Exceptions
All exceptions inherit from SuparseError.
| Exception | Raised When |
|---|---|
| SuparseError | Base exception for all SDK errors |
| SuparseNetworkError | Network connection fails or times out |
| SuparsePollingTimeoutError | Polling exceeds max_poll_attempts |
| SuparseProcessingError | Document fails to process on the server |
| SuparseAPIError | Base for HTTP error responses (has status_code, response_body) |
| SuparseAuthError | 401/403 authentication or authorization error |
| SuparseNotFoundError | 404 resource not found |
| SuparseRateLimitError | 429 too many requests |
| SuparseServerError | 5xx server error |
| SuparseSDKError | SDK fails to parse the API response into the expected data model |
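One use of this hierarchy is deciding which failed files are worth re-queuing later. The sketch below defines local stand-in classes with the same names so that it runs without the SDK; it assumes the HTTP-specific exceptions subclass SuparseAPIError, which the table suggests but does not state, and the constructor signature shown is invented for the stand-ins. is_transient and describe are hypothetical helpers.

```python
# Local stand-ins mirroring the table above; these are NOT the SDK's classes.
class SuparseError(Exception):
    pass


class SuparseNetworkError(SuparseError):
    pass


class SuparseAPIError(SuparseError):
    def __init__(self, message: str, status_code: int, response_body: str = ""):
        super().__init__(message)
        self.status_code = status_code
        self.response_body = response_body


class SuparseAuthError(SuparseAPIError):       # assumed subclass
    pass


class SuparseRateLimitError(SuparseAPIError):  # assumed subclass
    pass


class SuparseServerError(SuparseAPIError):     # assumed subclass
    pass


def is_transient(exc: Exception) -> bool:
    """True for failures that may succeed if the file is submitted again."""
    return isinstance(exc, (SuparseNetworkError, SuparseRateLimitError, SuparseServerError))


def describe(exc: Exception) -> str:
    """Short log line; includes the HTTP status when one is available."""
    if isinstance(exc, SuparseAPIError):
        return f"HTTP {exc.status_code}: {exc}"
    return f"{type(exc).__name__}: {exc}"


print(is_transient(SuparseRateLimitError("slow down", 429)))  # True
print(is_transient(SuparseAuthError("bad key", 401)))         # False
print(describe(SuparseServerError("upstream error", 503)))    # HTTP 503: upstream error
```

With the real SDK, the same checks could be applied to the error attribute of each entry in result.failed.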
extract() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| files | str, Path, IO[bytes], or iterable of these | required | One or more files to process |
| template_id | str | None | Template ID (auto-detect if omitted) |
| split | bool | False | Auto-split multi-page documents |
| auto_approve | bool | True | Set to False to require human review in the Suparse UI |
| cleanup | bool | False | Delete documents from server after extraction |
| on_progress | callable | None | Called with each TaskExport or FailedResult as it completes |
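Because on_progress accepts any callable, a small stateful object can keep running counts across a batch. The sketch below tells failures apart by the presence of an error attribute, which matches the properties listed for FailedResult and TaskExport; ProgressCounter is a hypothetical class, and the lock is a precaution in case callbacks arrive from more than one thread, which is not documented here.

```python
import threading
from types import SimpleNamespace


class ProgressCounter:
    """Callable that counts finished and failed files as results arrive."""

    def __init__(self) -> None:
        self.done = 0
        self.failed = 0
        self._lock = threading.Lock()

    def __call__(self, result) -> None:
        with self._lock:
            if hasattr(result, "error"):  # FailedResult-like object
                self.failed += 1
                label = f"failed: {result.file}"
            else:                         # TaskExport-like object
                self.done += 1
                label = f"done: {result.original_file}"
            print(f"[{self.done + self.failed}] {label}")


# Stand-in results (not real SDK types) fed to the callback by hand.
counter = ProgressCounter()
counter(SimpleNamespace(original_file="a.pdf"))
counter(SimpleNamespace(file="b.pdf", error="unsupported file"))
print(counter.done, counter.failed)  # 1 1
```

With a real client, the instance would be passed as on_progress=counter.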
extract_folder() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| folder | str or Path | required | Directory to scan |
| pattern | str | "*" | Glob pattern for file discovery (filtered by supported extensions) |
| template_id | str | None | Template ID (auto-detect if omitted) |
| split | bool | False | Auto-split multi-page documents |
| auto_approve | bool | True | Set to False to require human review in the Suparse UI |
| cleanup | bool | False | Delete documents from server after extraction |
| on_progress | callable | None | Called with each result as it completes |
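The combination of a glob pattern and an extension filter can be pictured as follows. This is a sketch of the idea rather than the SDK's discovery code: discover_files is a hypothetical helper, the extension list is taken from the folder-processing section, and matching extensions case-insensitively is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".heic", ".heif"}


def discover_files(folder, pattern: str = "*") -> List[Path]:
    """Files in folder that match pattern and have a supported extension."""
    return sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Demo on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.PNG", "notes.txt", "c.heic"):
        (Path(tmp) / name).touch()
    found_names = [p.name for p in discover_files(tmp)]
    pdf_names = [p.name for p in discover_files(tmp, "*.pdf")]

print(found_names)  # ['a.pdf', 'b.PNG', 'c.heic']
print(pdf_names)    # ['a.pdf']
```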