Document Miner — Extract Structured Data from Business Documents

Quick Start

Get your first document extracted in under 5 minutes. You'll need an API token — request access if you don't have one yet.

1. Obtain an API token from us after signing up.
2. Base64-encode your document file.
3. Send a POST request to /api/parse-documents-sync with a JSON body containing FileName and Content.
4. Parse the JSON array response: each element is a typed document object.

Shell

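# Write the JSON body to a file first: passing the base64 content directly as a -d
# argument fails with "Argument list too long" for files above roughly 100 KB on Linux.
{ printf '{"FileName":"invoice.pdf","Content":"'; base64 < invoice.pdf | tr -d '\n'; printf '"}'; } > request.json

curl -X POST https://documentminer.eu.jetveo.io/api/parse-documents-sync \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @request.json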

Authentication

All API requests must include a Bearer token in the Authorization header. Tokens are issued when you sign up for API access and can be scoped per integration.

HTTP Header

Authorization: Bearer YOUR_API_TOKEN

Token management: API tokens are managed through the admin interface. For on-premises deployments, tokens are managed through your own admin panel. Contact us at info@alfaveo.com to request a token or manage your account.
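To keep the token out of scripts and shell history, you can read it from an environment variable. The sketch below assumes a variable named DOCUMINER_TOKEN (our own choice of name, not something the API defines) and a hypothetical helper auth_header. Handing the header to curl through -H @file (supported since curl 7.55) also keeps the token out of the process list.

```shell
#!/usr/bin/env bash
# auth_header (hypothetical helper): prints the Authorization header line,
# reading the token from DOCUMINER_TOKEN (an assumed variable name).
auth_header() {
  if [ -z "${DOCUMINER_TOKEN:-}" ]; then
    echo "DOCUMINER_TOKEN is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$DOCUMINER_TOKEN"
}
```

Usage in bash: `curl -H @<(auth_header) ...`. The process substitution exposes the header to curl as a temporary file descriptor rather than as a command-line argument.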

POST /api/parse-documents-sync

Extracts structured data from a business document. Accepts a JSON body with the document file encoded as base64. Returns an array of typed document objects.

Request Body (application/json)

Field	Type	Description
`FileName` (required)	`string`	Original filename including extension (e.g. invoice.pdf). Used to determine file type.
`Content` (required)	`string`	Base64-encoded file content. Accepted formats: PDF, JPEG, PNG, TIFF, WebP, ISDOC, ISDOCX, ZUGFeRD XML.

Other endpoints

Endpoint	Description
`POST /api/parse-communication-sync`	Extract structured data from email or communication documents
`POST /api/generate-isdoc`	Generate an ISDOC e-invoice from structured data
`POST /api/generate-zugferd`	Generate a ZUGFeRD / Factur-X e-invoice
`POST /api/generate-embed-isdoc`	Embed ISDOC XML into a PDF file
`POST /api/generate-embed-zugferd`	Embed ZUGFeRD XML into a PDF file
`POST /api/assign-ledger-account`	Assign ledger accounts to extracted line items
`POST /api/ping`	Health check — returns service status

Full spec: Download the OpenAPI specification below to explore all endpoints, request/response schemas, and error codes interactively.

Response Schema

A successful extraction returns HTTP 200 with a JSON array. Each element is a typed document object identified by a $type discriminator field. The structure of each object depends on the detected document type.

JSON — Invoice response (200 OK)

[
  {
    "$type": "Invoice",
    "InvoiceNumber": "2024-0042",
    "IssueDate": "2024-01-15",
    "DueDate": "2024-02-15",
    "Vendor": {
      "Name": "ACME s.r.o.",
      "RegistrationNumber": "12345678",
      "VatId": "CZ12345678",
      "Address": "Wenceslas Square 1, Prague"
    },
    "Buyer": {
      "Name": "Buyer Corp a.s.",
      "VatId": "CZ87654321"
    },
    "LineItems": [
      {
        "Description": "Software License Q1",
        "Quantity": 1,
        "UnitPrice": 45000,
        "VatRate": 21,
        "Total": 54450
      }
    ],
    "Totals": {
      "Subtotal": 45000,
      "Vat": 9450,
      "Total": 54450,
      "Currency": "CZK"
    }
  }
]
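In the example, Subtotal + Vat = Total (45,000 + 9,450 = 54,450 CZK). Checking that relation on every extracted invoice is a cheap sanity check. The following is a sketch using jq. It assumes the response was saved to a file and that each invoice carries a Totals object shaped like the one above; check_invoice_totals is a hypothetical name. An invoice without Totals makes jq error out, so it also fails the check.

```shell
#!/usr/bin/env bash
# check_invoice_totals FILE (hypothetical helper): exits 0 if, for every Invoice
# in a saved response, Totals.Subtotal + Totals.Vat equals Totals.Total within
# half a cent, which absorbs floating-point rounding. Requires jq.
check_invoice_totals() {
  jq -e '
    all(.[] | select(.["$type"] == "Invoice") | .Totals;
        (.Subtotal + .Vat - .Total) | (if . < 0 then 0 - . else . end) < 0.005)
  ' "$1" > /dev/null
}
```

Usage: `check_invoice_totals response.json || echo "totals do not add up"`.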

Document types

$type value	Description
`Invoice`	Tax invoice or proforma invoice
`Quote`	Offer or quotation
`PurchaseOrder`	Purchase order from buyer
`Inquiry`	Request for quotation or inquiry
`DeliveryNote`	Delivery or shipping note
`Contract`	Contract or agreement

Code Examples

A complete, working example using curl. Replace YOUR_API_TOKEN with your actual token and invoice.pdf with the path to your document.

Shell

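TOKEN="YOUR_API_TOKEN"
FILE="invoice.pdf"

# Stream the JSON body into a file; large base64 strings cannot be passed as a -d argument.
{ printf '{"FileName":"%s","Content":"' "$(basename "$FILE")"; base64 < "$FILE" | tr -d '\n'; printf '"}'; } > request.json

# --fail-with-body (curl 7.76+) exits non-zero on HTTP errors but still saves the error JSON.
curl -sS --fail-with-body https://documentminer.eu.jetveo.io/api/parse-documents-sync \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary @request.json \
  -o response.json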
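File names sometimes come from user uploads rather than a fixed literal such as invoice.pdf. In that case they can contain quotes or backslashes, which must be escaped before they go into the JSON body. Below is a sketch of a reusable payload builder. build_request_json is a hypothetical helper; it escapes only backslashes and double quotes (control characters in names are not handled) and uses `base64 | tr -d '\n'`, which works with both GNU and macOS base64.

```shell
#!/usr/bin/env bash
# build_request_json PATH [NAME] (hypothetical helper): prints the JSON body for
# /api/parse-documents-sync. NAME defaults to the file's basename.
build_request_json() {
  local path=$1
  local name=${2:-$(basename "$path")}
  local escaped
  # Escape backslashes first, then double quotes, so NAME is a valid JSON string.
  escaped=$(printf '%s' "$name" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"FileName":"%s","Content":"' "$escaped"
  # Stream the file through base64 and strip line wrapping.
  base64 < "$path" | tr -d '\n'
  printf '"}'
}
```

Usage: `build_request_json invoice.pdf > request.json`, then send it with `--data-binary @request.json`.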

Supported File Formats

ISDOC and ZUGFeRD e-invoices carry structured XML, so they are processed by dedicated native parsers that read the values directly instead of inferring them, yielding 100% accuracy. All other formats use Documiner AI extraction.

Format	Extensions	Processing	Notes
PDF	`.pdf`	AI	All PDF versions, including scanned (image-based)
JPEG	`.jpg, .jpeg`	AI	Photos of documents
PNG	`.png`	AI	Screenshots, scans
TIFF	`.tif, .tiff`	AI	High-resolution scans
WebP	`.webp`	AI
ISDOC	`.isdoc, .isdocx`	Native parser	Czech e-invoice standard. 100% accuracy.
ZUGFeRD / Factur-X	`.xml, .pdf`	Native parser	EU e-invoice standard (embedded XML). 100% accuracy.
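The extension in FileName determines how a file is processed, so you can check it locally before uploading and avoid a 415. This sketch mirrors the table above. processing_path is a hypothetical helper. It reports .pdf as ai-or-native because only PDFs with embedded ZUGFeRD / Factur-X XML go to the native parser. Matching extensions case-insensitively is an assumption.

```shell
#!/usr/bin/env bash
# processing_path FILENAME (hypothetical helper): prints ai, native, ai-or-native
# or unsupported based on the extension, following the supported-formats table.
# Case-insensitive matching of extensions is an assumption.
processing_path() {
  case "$1" in
    *.*) ;;                                  # has an extension
    *) echo "unsupported"; return ;;
  esac
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    isdoc|isdocx|xml)           echo "native" ;;
    pdf)                        echo "ai-or-native" ;;  # native only with embedded ZUGFeRD XML
    jpg|jpeg|png|tif|tiff|webp) echo "ai" ;;
    *)                          echo "unsupported" ;;
  esac
}
```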

Maximum file size: 20 MB. For multi-page PDFs, all pages are processed and results are aggregated into a single response. Each API call counts as one document regardless of page count.
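The page does not say whether 20 MB means 10^6 or 2^20 bytes, or whether the limit applies before or after base64 encoding. The sketch below assumes 20 MiB of raw file. Base64 turns every started group of 3 bytes into 4 characters, so Content is 4 × ⌈n / 3⌉ bytes long: a 20 MiB (20,971,520-byte) file becomes 4 × 6,990,507 = 27,962,028 bytes of base64. check_file_size is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_file_size PATH [MAX_BYTES] (hypothetical helper): fails if the raw file
# exceeds MAX_BYTES (default assumed to be 20 MiB); on success prints the length
# of the base64 string that will be sent as Content.
check_file_size() {
  local path=$1
  local max=${2:-$((20 * 1024 * 1024))}
  local size
  size=$(wc -c < "$path" | tr -d ' ')   # tr strips the padding macOS wc adds
  if [ "$size" -gt "$max" ]; then
    echo "$path is $size bytes; the limit is $max bytes" >&2
    return 1
  fi
  # 4 output characters per started group of 3 input bytes.
  echo $(( (size + 2) / 3 * 4 ))
}
```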

Error Codes

Error responses include a JSON body with error, message, and request_id fields. Use request_id when contacting support.

HTTP Status	Error	Resolution
`400`	Bad Request	Check that the JSON body includes non-empty `FileName` and `Content` fields.
`401`	Unauthorized	Verify your API token is valid and included in the header.
`402`	Payment Required	Account balance depleted. Top up credits to continue.
`415`	Unsupported Media Type	File format not supported. See supported formats above.
`422`	Unprocessable Entity	File is valid but document could not be parsed. Check the file is readable and not corrupted.
`429`	Too Many Requests	Rate limit exceeded. Check the `Retry-After` response header. Default limit: 100 req/min.
`500`	Internal Server Error	Transient error. Retry with exponential backoff. If persistent, contact support.
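The 429 and 500 rows call for retries. Below is a sketch of that policy, with is_retryable, retry_delay and post_with_retry as hypothetical helpers. The delay honors a numeric Retry-After value (the HTTP-date form is not handled) and otherwise doubles from 1 s up to a 60 s cap. Jitter is omitted for brevity. Two choices go beyond the table and are assumptions: every 5xx is treated as transient, and so are connection failures (curl reports status 000). The token is read from DOCUMINER_TOKEN, an assumed variable name.

```shell
#!/usr/bin/env bash
API_URL="https://documentminer.eu.jetveo.io/api/parse-documents-sync"

# is_retryable STATUS: 000 (no HTTP response), 429 and any 5xx are treated as
# transient; other errors (400, 401, 402, 415, 422) need a fix, not a retry.
is_retryable() {
  case "$1" in
    000|429|5[0-9][0-9]) return 0 ;;
    *)                   return 1 ;;
  esac
}

# retry_delay ATTEMPT [RETRY_AFTER]: seconds to wait after failed attempt ATTEMPT (0-based).
retry_delay() {
  local attempt=$1 retry_after=${2:-}
  if [[ $retry_after =~ ^[0-9]+$ ]]; then
    echo "$retry_after"                      # the server's hint wins
    return
  fi
  if [ "$attempt" -gt 6 ]; then attempt=6; fi   # avoid huge shifts
  local delay=$(( 1 << attempt ))              # 1, 2, 4, 8, ... seconds
  if [ "$delay" -gt 60 ]; then delay=60; fi
  echo "$delay"
}

# post_with_retry BODY_FILE OUT_FILE: POSTs a prepared JSON body, retrying
# transient failures (5 attempts in total); returns 0 only on HTTP 200.
post_with_retry() {
  local body=$1 out=$2 max_attempts=5
  local headers status attempt retry_after
  headers=$(mktemp)
  for (( attempt = 0; attempt < max_attempts; attempt++ )); do
    : > "$headers"                           # drop headers from the previous attempt
    status=$(curl -sS -o "$out" -D "$headers" -w '%{http_code}' \
      -H "Authorization: Bearer $DOCUMINER_TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary @"$body" "$API_URL")
    is_retryable "$status" || break
    (( attempt + 1 < max_attempts )) || break
    # Header names are case-insensitive (HTTP/2 sends them in lower case).
    retry_after=$(sed -n 's/^[Rr]etry-[Aa]fter:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$headers" | head -n 1)
    sleep "$(retry_delay "$attempt" "$retry_after")"
  done
  rm -f "$headers"
  [ "$status" = "200" ]
}
```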

Error response body

JSON

{
  "error": "unauthorized",
  "message": "API token is missing or invalid.",
  "request_id": "req_01HXZ3F8QK5Y2VWTN8B9GHJM4"
}
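For logging the request_id and the short error code of a failed call without a JSON dependency, a line-oriented sed extraction is enough for the flat error body shown above. This is a sketch. json_string_field is a hypothetical helper that assumes string values without escaped quotes; jq is the more robust choice when it is available.

```shell
#!/usr/bin/env bash
# json_string_field FIELD (hypothetical helper): reads a flat JSON object on stdin
# and prints the first string value stored under "FIELD".
json_string_field() {
  local field=$1
  sed -n 's/.*"'"$field"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage: `json_string_field request_id < response.json` prints the ID to quote when contacting support.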

On-Premises Deployment

Documiner can be deployed inside your own infrastructure as a Docker container or a standalone binary. In both cases, you provide your own AI API key, so documents are not sent to Documiner's hosted service; AI extraction requests go only to the AI provider that key belongs to. On-premises pricing is individual; contact us for details.

Docker

Shell

# Run with your AI API key
# (contact us for the Docker image)
docker run -d \
  -p 8080:8080 \
  -e AI_API_KEY=your_ai_key \
  -e API_SECRET=your_admin_secret \
  documiner:latest

# Extract a document
curl -X POST http://localhost:8080/api/parse-documents-sync \
  -H "Authorization: Bearer your_token" \
  -H "Content-Type: application/json" \
  -d '{"FileName":"invoice.pdf","Content":"<base64>"}'
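The container may need a few seconds before it accepts requests. This sketch polls the /api/ping health-check endpoint until it answers. wait_for_ready is a hypothetical helper, and it assumes ping accepts an empty POST without a token; add an Authorization header if your deployment requires one.

```shell
#!/usr/bin/env bash
# wait_for_ready [BASE_URL] [TIMEOUT_SECONDS] (hypothetical helper): polls
# BASE_URL/api/ping once per second until it returns 2xx or the timeout expires.
wait_for_ready() {
  local base_url=${1:-http://localhost:8080} timeout=${2:-60}
  local deadline=$(( SECONDS + timeout ))
  # -f turns HTTP errors into a non-zero exit; -d '' sends an empty POST.
  until curl -fs --max-time 2 -o /dev/null -d '' "$base_url/api/ping"; do
    if (( SECONDS >= deadline )); then
      echo "Documiner did not become ready within ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage: run `wait_for_ready http://localhost:8080 60` after `docker run` and before sending the first extraction request.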

Environment variables

Variable	Required	Description
`AI_API_KEY`	Yes	Your AI provider API key
`AI_MODEL`	No	AI model override
`API_SECRET`	Yes	Admin secret for managing tokens and users
`PORT`	No	HTTP port. Default: `8080`
`LOG_LEVEL`	No	Logging verbosity: `debug`, `info`, `warn`, `error`. Default: `info`
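A pre-flight check of the variables in the table above catches a missing key before the container starts. This is a sketch. check_documiner_env is a hypothetical helper that validates only what the table states: the required keys are present, PORT is numeric, and LOG_LEVEL is one of the four listed values.

```shell
#!/usr/bin/env bash
# check_documiner_env (hypothetical helper): returns non-zero and prints a
# message for each problem found in the current shell's variables.
check_documiner_env() {
  local ok=0 var
  for var in AI_API_KEY API_SECRET; do
    if [ -z "${!var:-}" ]; then              # bash indirect expansion
      echo "missing required variable: $var" >&2
      ok=1
    fi
  done
  case "${PORT:-8080}" in
    *[!0-9]*) echo "PORT must be numeric" >&2; ok=1 ;;
  esac
  case "${LOG_LEVEL:-info}" in
    debug|info|warn|error) ;;
    *) echo "LOG_LEVEL must be debug, info, warn or error" >&2; ok=1 ;;
  esac
  return "$ok"
}
```

For example: `check_documiner_env && docker run -d -p 8080:8080 -e AI_API_KEY -e API_SECRET documiner:latest`. With `-e NAME` and no value, Docker copies the variable from your exported environment, which keeps the secrets off the command line.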

Standalone binary: A pre-compiled binary is available for Linux (amd64, arm64) and Windows (amd64). Contact us for download access and deployment documentation.

OpenAPI Specification

The full API is described in an OpenAPI 3.1 specification. Import it into Postman, Insomnia, or any OpenAPI-compatible tool to explore and test the API interactively.

The specification is available for download as openapi.json.

To explore it interactively, load it into editor.swagger.io, either by pasting the raw spec URL or the file's contents.
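Once openapi.json is downloaded, jq can list every operation it defines. This gives a quick way to compare the spec against the endpoint table above. This is a sketch: list_endpoints is a hypothetical helper that relies only on the standard OpenAPI paths object, keeping HTTP-method keys and skipping path-level keys such as summary or parameters.

```shell
#!/usr/bin/env bash
# list_endpoints SPEC_FILE (hypothetical helper): prints "METHOD /path" for every
# operation in an OpenAPI 3.x document. Requires jq.
list_endpoints() {
  jq -r '
    .paths | to_entries[] | .key as $p | .value | keys[]
    | . as $m
    | select(["get","put","post","delete","patch","head","options","trace"] | index($m))
    | "\($m | ascii_upcase) \($p)"
  ' "$1"
}
```

Usage: `list_endpoints openapi.json | sort`.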

Ready to integrate?

Request your API token and start extracting documents in minutes.

