Base URL: http://localhost:5000/api/v1
Interactive docs: Not enabled by default in the Flask build. Use the endpoints
below with a tool like curl or Postman.
Returns API status.
Response 200
{ "status": "healthy", "version": "1.0.0" }

Upload a PDF file.
Request – multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | PDF file (max 50 MB) |
Response 200
{
"document_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"filename": "invoice.pdf",
"status": "uploaded",
"message": "PDF uploaded successfully. Use /extract to process the document."
}

Errors
| Code | Reason |
|---|---|
| 400 | File is not a PDF |
| 413 | File exceeds size limit |
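
The upload request can be sketched in Python with only the standard library. The `/upload` path and the `file` field come from this section; the POST method, the `build_upload_request` helper name, and reading "50 MB" as 50 × 1024 × 1024 bytes are assumptions that this reference does not confirm.

```python
import uuid
import urllib.request

BASE_URL = "http://localhost:5000/api/v1"
# Assumption: "50 MB" means 50 * 1024 * 1024 bytes; the server may count differently.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def build_upload_request(filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data request carrying one `file` field."""
    if len(content) > MAX_UPLOAD_BYTES:
        # The server would be expected to answer 413 for this case.
        raise ValueError("file exceeds the 50 MB limit")
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/upload",
        data=body,
        method="POST",  # assumption: the HTTP method is not stated here
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )

# Usage (requires a running server):
#   import json
#   with open("invoice.pdf", "rb") as fh:
#       req = build_upload_request("invoice.pdf", fh.read())
#   with urllib.request.urlopen(req) as resp:
#       document_id = json.load(resp)["document_id"]
```

Building the request separately from sending it makes the size check and the multipart body easy to inspect before any network call is made.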
Extract text, tables, and structured fields from an uploaded PDF.
Path params
| Param | Type | Description |
|---|---|---|
| document_id | UUID | ID returned by /upload |
Response 200
{
"document_id": "3fa85f64-...",
"filename": "invoice.pdf",
"total_pages": 2,
"fields": [
{
"field_name": "date",
"value": "01/15/2025",
"confidence": 0.92,
"page_number": 1,
"bounding_box": null
}
],
"extracted_text": "Invoice #INV-1234\nDate: 01/15/2025 ...",
"tables": [
[["Item", "Qty", "Price"], ["Widget", "2", "$9.99"]]
],
"extraction_time_seconds": 0.843
}

Errors
| Code | Reason |
|---|---|
| 404 | Document not found |
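
A short Python sketch of how a client might post-process the extraction response. The payload is the sample shown above, written as a Python dict; the helper names `fields_below` and `table_to_records` are hypothetical, and treating the first row of each table as a header row is an assumption based only on that sample.

```python
from typing import Any, Dict, List

# Sample payload copied from the response above (values are illustrative).
result: Dict[str, Any] = {
    "document_id": "3fa85f64-...",
    "filename": "invoice.pdf",
    "total_pages": 2,
    "fields": [
        {"field_name": "date", "value": "01/15/2025", "confidence": 0.92,
         "page_number": 1, "bounding_box": None},
    ],
    "extracted_text": "Invoice #INV-1234\nDate: 01/15/2025 ...",
    "tables": [[["Item", "Qty", "Price"], ["Widget", "2", "$9.99"]]],
    "extraction_time_seconds": 0.843,
}


def fields_below(result: Dict[str, Any], threshold: float) -> List[Dict[str, Any]]:
    """Return fields whose confidence is under `threshold`, e.g. for manual review."""
    return [f for f in result["fields"] if f["confidence"] < threshold]


def table_to_records(table: List[List[str]]) -> List[Dict[str, str]]:
    """Convert one table (a list of rows) into dicts keyed by the first row.

    Assumption: the first row is a header row, as in the sample.
    """
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


records = table_to_records(result["tables"][0])
needs_review = fields_below(result, threshold=0.95)
```

With the sample payload, `records` holds one row for the "Widget" line item and `needs_review` contains the `date` field, since its confidence of 0.92 is below 0.95.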
Update extracted fields for a document.
Request body
{
"document_id": "3fa85f64-...",
"fields": [
{
"field_name": "amount",
"value": "19.99",
"confidence": 0.95,
"page_number": 1
}
]
}

Response 200
{
"document_id": "3fa85f64-...",
"status": "updated",
"updated_fields": 1
}

Export a document with updated data.
Request body
{
"document_id": "3fa85f64-...",
"format": "pdf",
"include_annotations": false
}

format must be one of pdf, json, csv.
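
Because `format` accepts only three values, a client can reject bad input before sending anything. The sketch below builds the request body shown above; `build_export_body` is a hypothetical helper, and the defaults it uses for `format` and `include_annotations` simply mirror the sample rather than any documented server default.

```python
import json

ALLOWED_FORMATS = ("pdf", "json", "csv")


def build_export_body(document_id: str, fmt: str = "pdf",
                      include_annotations: bool = False) -> bytes:
    """Serialize an export request body, validating `format` on the client side."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {', '.join(ALLOWED_FORMATS)}; got {fmt!r}")
    payload = {
        "document_id": document_id,
        "format": fmt,
        "include_annotations": include_annotations,
    }
    return json.dumps(payload).encode("utf-8")
```

The returned bytes would be sent with a `Content-Type: application/json` header.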
Response 200
{
"document_id": "3fa85f64-...",
"download_url": "/api/v1/download/3fa85f64-...?format=pdf",
"format": "pdf",
"expires_at": "2025-02-02T00:00:00Z"
}

Download an exported file.
Query params
| Param | Default | Options |
|---|---|---|
| format | pdf | pdf, json, csv |
Response – binary file stream with appropriate Content-Type.
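
The export response returns a relative `download_url` and an `expires_at` timestamp. The Python sketch below turns them into something a client can act on. The helper names are hypothetical, and it assumes that `expires_at` marks the moment after which the link stops working, which this reference does not spell out.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urljoin

BASE_URL = "http://localhost:5000/api/v1"


def absolute_download_url(download_url: str) -> str:
    """Resolve the server-relative download_url against the API host."""
    return urljoin(BASE_URL, download_url)


def is_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """Compare an ISO 8601 UTC timestamp such as 2025-02-02T00:00:00Z with `now`."""
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expiry

# Usage (requires a running server):
#   import urllib.request
#   url = absolute_download_url(export_response["download_url"])
#   if not is_expired(export_response["expires_at"]):
#       with urllib.request.urlopen(url) as resp, open("export.pdf", "wb") as out:
#           out.write(resp.read())
```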
List all documents (paginated).
Query params
| Param | Default |
|---|---|
| page | 1 |
| page_size | 20 |
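
A small Python sketch of the pagination arithmetic. It assumes pages are numbered from 1 (suggested by the default) and that the `total` field in the response counts all documents, not only those on the current page; `list_query` and `page_count` are hypothetical helper names.

```python
from urllib.parse import urlencode


def list_query(page: int = 1, page_size: int = 20) -> str:
    """Build the query string for one page of the document list."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return urlencode({"page": page, "page_size": page_size})


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` documents (ceiling division)."""
    return (total + page_size - 1) // page_size
```

For example, a `total` of 42 with a `page_size` of 20 gives three pages, the last one holding two documents.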
Response 200
{
"documents": [ { "document_id": "...", "filename": "...", ... } ],
"total": 42,
"page": 1,
"page_size": 20
}

Get details for a specific document.
Response 200 – raw document dict including extracted fields and text.
Delete a document and remove the file from disk.
Response 200
{ "status": "deleted", "document_id": "3fa85f64-..." }