
Document Extraction

The Read File component can delegate file processing to the KARLI document-extraction service, enabling structured extraction from documents and audio using specialized models.

Extraction Backend

The component's Extraction Backend dropdown selects how files are processed:

| Backend | Behavior |
|---|---|
| docling (default) | Files are processed locally using the upstream Langflow / Docling pipeline. |
| karli | Files are sent to the KARLI extraction service, which returns structured content. |

Model Selection

When Extraction Backend is set to karli, an additional Model field appears. Its options are loaded dynamically from the KARLI provider, which is currently the only provider that offers data-extraction models.
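The visibility rule can be sketched as follows. In Langflow, dynamic fields like this are typically updated in a component's `update_build_config` hook; to keep the example runnable without Langflow, this sketch operates on a plain dictionary instead. The field key `model`, the `show`/`options`/`value` keys, and the helper `update_model_field` are assumptions for illustration, not the component's actual source.

```python
from typing import Any


def update_model_field(
    build_config: dict[str, dict[str, Any]],
    backend: str,
    karli_models: list[str],
) -> dict[str, dict[str, Any]]:
    """Show the Model dropdown only for the karli backend (hypothetical keys)."""
    model_field = build_config["model"]  # assumed field key
    if backend == "karli":
        model_field["show"] = True
        # Options come from the KARLI provider's data-extraction models.
        model_field["options"] = list(karli_models)
        # Keep the current value if it is still offered; otherwise pick the first.
        if model_field.get("value") not in karli_models:
            model_field["value"] = karli_models[0] if karli_models else None
    else:
        # docling (or any other backend) hides the Model field.
        model_field["show"] = False
    return build_config
```

For example, calling it with `backend="karli"` and `["model-a", "model-b"]` (placeholder names) shows the field and selects `model-a`; switching back to `docling` hides it again.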

See Models → Data Extraction for the full list of available models and the file types each one accepts. Before uploading, the component checks that the file's type is one the selected model accepts; for example, submitting a PDF to the Whisper model produces an error instead of an upload.
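A minimal sketch of that pre-upload check, assuming validation by file extension; the real component may inspect MIME types or file content instead. The `ACCEPTED_SUFFIXES` table, its model keys and suffixes, and `UnsupportedFileTypeError` are hypothetical placeholders; the actual accepted types are listed under Models → Data Extraction.

```python
from pathlib import Path

# Hypothetical mapping: model keys and suffixes are placeholders, not the
# real KARLI model list (see Models → Data Extraction for that).
ACCEPTED_SUFFIXES: dict[str, set[str]] = {
    "whisper": {".mp3", ".wav", ".m4a"},
    "document-model": {".pdf", ".docx"},
}


class UnsupportedFileTypeError(ValueError):
    """Raised before upload when the file type does not match the model."""


def validate_file_for_model(file_path: str, model: str) -> None:
    """Fail fast, before any network call, if the model cannot take this file."""
    suffix = Path(file_path).suffix.lower()
    accepted = ACCEPTED_SUFFIXES.get(model)
    if accepted is None:
        raise ValueError(f"Unknown extraction model: {model!r}")
    if suffix not in accepted:
        raise UnsupportedFileTypeError(
            f"Model {model!r} does not accept {suffix or 'extensionless'} files; "
            f"expected one of {sorted(accepted)}"
        )
```

Running the check before the upload keeps an unsupported file (such as a PDF sent to an audio model) from ever reaching the service.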

Authentication

Calls to the KARLI backend are authenticated with the user's session JWT (injected by the KARLI proxy), sent as a Bearer token.

How It Works

When the karli backend is selected, the component sends the file to {KARLI_BASE_URL}/data-extraction/extract as a multipart POST request. The extractorModel form field carries the selected model, mapped to its KARLI identifier, and the file part carries the document itself. The service responds with JSON whose segments are concatenated into a single text payload; each segment that has a title is preceded by a ## <title> Markdown header.

```mermaid
flowchart LR
    Upload["File Upload"] --> RF["Read File"]
    RF --> KES["KARLI Extraction Service"]
    KES --> Out["Structured Output"]
```
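The request described in this section can be sketched with Python's standard library. The sketch builds, but does not send, the multipart POST. Assumptions: the file part's form field is named `file`, the Bearer token travels in the standard `Authorization` header, and `build_extract_request` is a hypothetical helper, not part of the component.

```python
import urllib.request
import uuid


def build_extract_request(
    base_url: str,
    jwt: str,
    karli_model_id: str,
    filename: str,
    file_bytes: bytes,
    content_type: str = "application/octet-stream",
) -> urllib.request.Request:
    """Build (but do not send) the multipart POST to the extraction endpoint."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field carrying the KARLI model identifier.
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="extractorModel"',
        b"",
        karli_model_id.encode(),
        # File part; the field name "file" is an assumption.
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        # Closing boundary, followed by a final CRLF.
        f"--{boundary}--".encode(),
        b"",
    ]
    body = b"\r\n".join(parts)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/data-extraction/extract",
        data=body,
        method="POST",
        headers={
            # Session JWT forwarded as a Bearer token (assumed Authorization header).
            "Authorization": f"Bearer {jwt}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body; timeouts, retries, and error handling are omitted here.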

The component's output is a Data object with the following fields:

| Field | Description |
|---|---|
| file_path | Original local path of the uploaded file. |
| exported_content | The assembled text, with a ## <title> header for each titled segment. |
| export_format | Always KARLI for this backend. |
| task_id | The KARLI extraction task identifier, useful for tracing. |
| model | The model identifier that was used. |
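The following sketch shows how the fields listed above could be assembled from the service's JSON response. A plain dict stands in for Langflow's Data object, and the response keys (`segments`, `title`, `content`, `task_id`) and the blank-line separator between segments are assumptions; the exact response schema is not documented here.

```python
from typing import Any


def assemble_output(
    response: dict[str, Any], file_path: str, model: str
) -> dict[str, Any]:
    """Build the output fields from an extraction response (assumed schema)."""
    chunks: list[str] = []
    for segment in response.get("segments", []):
        title = segment.get("title")
        text = segment.get("content", "")  # assumed key for segment text
        # Titled segments are preceded by a "## <title>" Markdown header.
        chunks.append(f"## {title}\n\n{text}" if title else text)
    return {
        "file_path": file_path,
        "exported_content": "\n\n".join(chunks),
        "export_format": "KARLI",
        "task_id": response.get("task_id"),  # assumed key
        "model": model,
    }
```

Untitled segments are emitted as bare text, so only titled segments contribute headers to exported_content.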

Configuration Summary

| Setting | Description |
|---|---|
| Extraction Backend | docling (local) or karli (KARLI service). |
| Model | Visible only when the karli backend is selected. |

Figure: KARLI model selection in the Read File component.