Uploading files

Most users don't need to implement file upload themselves:

  • In the browser, drag-and-drop files into a stream from the Marple DB UI.

  • In Python, use stream.push_file(...) from the Python SDK.

  • In the CLI, use mdb ingest ... from the Rust SDK.

Implement the REST flow directly only if you're integrating from a language without a Marple SDK. The rest of this section walks through it.

Upload And Ingestion REST API

This document describes the REST contract for uploading a file into a Marple DB stream and starting ingestion. It is intended for SDK implementations.

All API endpoints below are relative to the Marple DB API base URL, for example /api/v1. API requests require normal Marple DB authentication. Direct storage uploads use signed URLs or SAS URLs and must be sent to storage directly, not through the Marple DB API client.

Flow Overview

  1. Create an ingestion with POST /ingestion.

  2. Upload the file using the returned mode.

  3. Mark the upload complete with POST /ingestion/{ingestion_id}/upload/complete.

  4. If the upload cannot be completed, call POST /ingestion/{ingestion_id}/abort.

The server decides the upload mode. SDKs must branch on the returned mode and should not hard-code their own upload threshold.
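
The branching rule can be sketched as a lookup keyed on the returned mode. This is a minimal illustration, not the SDK's implementation: pick_uploader and the uploaders mapping are hypothetical names, and each handler is assumed to take the create-ingestion response as a dict.

```python
from typing import Callable, Dict

# The four mode values are the ones this document lists for the create response.
KNOWN_MODES = ("server", "azure", "single", "multipart")


def pick_uploader(
    create_response: dict,
    uploaders: Dict[str, Callable[[dict], None]],
) -> Callable[[dict], None]:
    """Return the handler for the mode the server chose (hypothetical helper)."""
    mode = create_response.get("mode")
    if mode not in KNOWN_MODES:
        # Fail loudly rather than guessing a mode from the file size.
        raise ValueError(f"unexpected upload mode: {mode!r}")
    return uploaders[mode]
```

The function never looks at file_size, so the upload threshold stays entirely on the server side.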

Create Ingestion

POST /ingestion

Request body:

Fields:

  • stream_id is the numeric stream identifier.

  • dataset_name is the dataset name/path to create in the stream.

  • file_size is the exact file size in bytes. It must be greater than 0 and no larger than 256 GiB.

  • metadata is optional dataset metadata. If omitted or null, it is treated as an empty object.
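
A client can check the file_size bounds locally before calling the API. The sketch below assumes the body is JSON-encoded with the four field names listed above; build_create_body is a hypothetical helper name.

```python
import json
from typing import Optional

MAX_FILE_SIZE = 256 * 1024**3  # 256 GiB, the upper bound stated above


def build_create_body(
    stream_id: int,
    dataset_name: str,
    file_size: int,
    metadata: Optional[dict] = None,
) -> str:
    """Serialize a POST /ingestion body (assumed JSON), validating file_size first."""
    if not 0 < file_size <= MAX_FILE_SIZE:
        raise ValueError(f"file_size must be between 1 and {MAX_FILE_SIZE} bytes")
    body = {
        "stream_id": stream_id,
        "dataset_name": dataset_name,
        "file_size": file_size,
        # Omitted or null metadata is treated as {}, so sending {} is equivalent.
        "metadata": metadata if metadata is not None else {},
    }
    return json.dumps(body)
```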

Successful response:

Response fields:

  • dataset_id identifies the dataset created for this upload.

  • ingestion_id identifies the ingestion record and is used by all later ingestion endpoints.

  • mode is one of server, azure, single, or multipart.

  • presigned_url is present for azure and single uploads.

  • part_size is present for multipart uploads.

  • expires_in is the lifetime, in seconds, of storage upload URLs returned in this response.

Creating the ingestion creates the dataset and places the ingestion in the UPLOADING state.
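
The mode-specific fields can be checked once, when the response is parsed. The following is a sketch under assumptions: the response is taken to be a JSON object already decoded into a dict, the ID types are not stated in this document and are kept opaque, and CreatedIngestion and parse_created are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class CreatedIngestion:
    dataset_id: Union[int, str]    # type not stated in this document
    ingestion_id: Union[int, str]  # type not stated in this document
    mode: str
    expires_in: Optional[int] = None
    presigned_url: Optional[str] = None
    part_size: Optional[int] = None


def parse_created(payload: dict) -> CreatedIngestion:
    """Validate that the fields each mode relies on are present."""
    mode = payload["mode"]
    if mode in ("azure", "single") and not payload.get("presigned_url"):
        raise ValueError(f"mode {mode!r} requires presigned_url")
    if mode == "multipart" and not payload.get("part_size"):
        raise ValueError("mode 'multipart' requires part_size")
    return CreatedIngestion(
        dataset_id=payload["dataset_id"],
        ingestion_id=payload["ingestion_id"],
        mode=mode,
        expires_in=payload.get("expires_in"),
        presigned_url=payload.get("presigned_url"),
        part_size=payload.get("part_size"),
    )
```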

Upload Modes

server

Use this mode when the server returns "mode": "server", or as a fallback if a direct storage upload fails and the SDK wants to retry through the API server.

POST /ingestion/{ingestion_id}/upload/server

Request body:

  • multipart/form-data

  • field name: file

  • field value: the full file body

Successful response:

After the server upload succeeds, call the complete endpoint.
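
Most HTTP libraries build multipart/form-data bodies for you. For languages where that has to be done by hand, the encoding looks roughly like the sketch below. Only the field name file comes from this document; the filename parameter, the per-part Content-Type, and the helper name encode_file_form are assumptions. The sketch holds the whole file in memory and does not escape quotes in filename, so it suits small files only.

```python
import uuid
from typing import Tuple


def encode_file_form(filename: str, content: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type_header) for a form with one field named 'file'."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"
```

The returned content type string goes into the request's Content-Type header, alongside the normal Marple DB authentication headers, since this request goes to the API and not to storage.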

azure

Use this mode when the server returns "mode": "azure".

Upload the full file to presigned_url using Azure Blob Storage semantics. Browser clients can use BlockBlobClient(presigned_url).uploadData(file). Other SDKs should perform an equivalent upload to the SAS URL.

After the Azure upload succeeds, call the complete endpoint.
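
For SDKs without an Azure client library, one possible equivalent is a single Put Blob request against the SAS URL. This sketch assumes the file is small enough for one Put Blob call (larger files need staged blocks, which the Azure client libraries handle internally) and uses only the standard library. build_azure_put is a hypothetical helper name, and the Content-Type value is an assumption.

```python
import urllib.request


def build_azure_put(sas_url: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a single Put Blob request to a SAS URL."""
    return urllib.request.Request(
        sas_url,
        data=data,
        method="PUT",
        headers={
            # Put Blob requires the blob type header; BlockBlob matches BlockBlobClient.
            "x-ms-blob-type": "BlockBlob",
            "Content-Length": str(len(data)),
            # Assumption: without this, urllib defaults to a form-urlencoded type.
            "Content-Type": "application/octet-stream",
        },
    )
```

Passing the request to urllib.request.urlopen sends it. As with the other direct storage modes, the request goes to storage directly and not through the Marple DB API client.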

single

Use this mode when the server returns "mode": "single".

Upload the full file with one PUT request to presigned_url.

The PUT body must be the raw file bytes. Do not send Marple DB API authentication headers to this storage URL.

After the storage upload succeeds, call the complete endpoint.
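
A standard-library sketch of the single-mode upload follows. The helper names and the timeout value are hypothetical. The Content-Type is an assumption and may need to match whatever the URL was signed with. The file is streamed from disk, with an explicit Content-Length so the body is not sent chunked.

```python
import os
import urllib.request
from typing import BinaryIO


def build_single_put(presigned_url: str, body: BinaryIO, size: int) -> urllib.request.Request:
    """Build (but do not send) the PUT for single mode."""
    return urllib.request.Request(
        presigned_url,
        data=body,
        method="PUT",
        # No Authorization or other Marple DB headers are added here.
        headers={
            "Content-Length": str(size),
            "Content-Type": "application/octet-stream",  # assumption
        },
    )


def upload_single(presigned_url: str, path: str, timeout: float = 300.0) -> int:
    """Send the raw file bytes in one PUT; raises urllib.error.HTTPError on failure."""
    with open(path, "rb") as body:
        request = build_single_put(presigned_url, body, os.path.getsize(path))
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
```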

multipart

Use this mode when the server returns "mode": "multipart".

The initial response includes part_size. Split the file into 1-based parts using that size. For each part_number, start is the inclusive byte offset and end is the exclusive byte offset:

  • start = (part_number - 1) * part_size

  • end = min(start + part_size, file_size)

  • the final part may be smaller than part_size
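
The offset formulas translate directly into code. In this sketch, part_range and iter_parts are hypothetical helper names, and the total part count is taken to be file_size divided by part_size, rounded up.

```python
from typing import Iterator, Tuple


def part_range(part_number: int, part_size: int, file_size: int) -> Tuple[int, int]:
    """Return (start, end) byte offsets for a 1-based part; end is exclusive."""
    start = (part_number - 1) * part_size
    if part_number < 1 or start >= file_size:
        raise ValueError(f"part {part_number} is out of range")
    end = min(start + part_size, file_size)
    return start, end


def iter_parts(file_size: int, part_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, start, end) for every part of the file."""
    total = (file_size + part_size - 1) // part_size  # ceiling division
    for part_number in range(1, total + 1):
        start, end = part_range(part_number, part_size, file_size)
        yield part_number, start, end
```

For example, a 10-byte file with part_size 4 yields three parts covering bytes 0 to 4, 4 to 8, and 8 to 10, so the final part holds 2 bytes.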

Fetch signed URLs for parts with:

GET /ingestion/{ingestion_id}/upload/part-urls?start_part=1&count=32

Query parameters:

  • start_part is required and must be between 1 and the total number of parts.

  • count is optional. If omitted, the server returns up to 32 parts. When provided, count must be greater than 0.

Successful response:

Response fields:

  • parts contains signed S3 upload URLs for the requested range.

  • expires_in is the lifetime, in seconds, of the returned part URLs.

  • next_part is the next part number to request, or null when all parts have been signed.
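
Paging through part URLs amounts to following next_part until it is null. In the sketch below, fetch_page is a hypothetical callable that wraps the GET request above and returns the decoded response as a dict. The shape of each entry in parts is not described here, so entries are passed through untouched.

```python
from typing import Callable, Iterator


def iter_part_urls(
    fetch_page: Callable[[int, int], dict],
    count: int = 32,
) -> Iterator[dict]:
    """Yield entries from parts, requesting the next page only when needed."""
    next_part = 1
    while next_part is not None:
        page = fetch_page(next_part, count)
        yield from page["parts"]
        # null (None) means every part has been signed.
        next_part = page["next_part"]
```

Because this is a generator, a page is requested only when the consumer reaches it, which keeps signed URLs from sitting unused until they expire.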

Upload each part with PUT to its signed url. The request body must be exactly that part's raw bytes. Do not send Marple DB API authentication headers to these storage URLs.

SDKs may upload parts concurrently. Use bounded concurrency so that large files do not create an unbounded number of in-flight requests; the frontend currently uses 4 workers.
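
One way to bound concurrency is a fixed-size thread pool. In this sketch, upload_parts and upload_one are hypothetical names, upload_one is assumed to perform the PUT for a single part, and the default of 4 workers mirrors the frontend value mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def upload_parts(
    parts: Iterable[T],
    upload_one: Callable[[T], R],
    max_workers: int = 4,
) -> List[R]:
    """Run upload_one over all parts with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() queues every part up front but only max_workers run at once;
        # list() re-raises the first failure, in part order.
        return list(pool.map(upload_one, parts))
```

Note that map() consumes its input eagerly, so a lazily generated list of signed URLs would be fetched in full before the first upload finishes. Feeding the pool in batches avoids that for very large files.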

After all parts upload successfully, call the complete endpoint. The Marple DB server completes the S3 multipart upload using the uploaded storage parts, verifies the final object size, and starts ingestion.

Complete Upload

POST /ingestion/{ingestion_id}/upload/complete

Request body: empty.

Successful response:

Completion verifies the uploaded object size against the file_size given to POST /ingestion. If the size differs, the ingestion is marked FAILED and the endpoint returns an error.

On success, ingestion starts. The dataset then moves out of UPLOADING into the normal ingestion lifecycle.

Abort Upload

POST /ingestion/{ingestion_id}/abort

Request body:

The body is optional. If no reason is provided, the server records "Unknown reason".

Successful response:

Call abort when the SDK cannot finish an upload after creating an ingestion. Abort marks unstable ingestion states as FAILED and cancels queued ingestion work when applicable.
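
Since the body is optional, a client can send one only when it has something to report. The sketch below assumes a JSON body with a field named reason. That field name is inferred from the wording above and is not confirmed by this document, and build_abort_body is a hypothetical helper name.

```python
import json
from typing import Optional


def build_abort_body(reason: Optional[str] = None) -> Optional[str]:
    """Return a JSON body carrying the reason, or None to send no body at all."""
    if not reason:
        # With no body, the server records "Unknown reason".
        return None
    return json.dumps({"reason": reason})  # "reason" key is an assumption
```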

SDK Error Handling Requirements

SDKs should follow these rules:

  • If POST /ingestion fails, no ingestion was created and there is nothing to abort.

  • If any upload step fails after an ingestion_id was returned, either retry safely or call abort.

  • If a direct storage upload fails, SDKs may retry via POST /ingestion/{ingestion_id}/upload/server using the same ingestion, then call complete if the server upload succeeds.

  • If POST /ingestion/{ingestion_id}/upload/complete fails, surface the error to the caller. If the failure is permanent, call abort unless the server has already marked the ingestion as FAILED.

  • Signed URLs expire after expires_in seconds. Request fresh multipart part URLs when retrying a part after expiry.
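
The fallback and abort rules can be combined into one control flow. This is a sketch under assumptions, not a prescribed implementation: the four callables are hypothetical wrappers around the endpoints described in this document, and deciding whether a complete failure is permanent is left to the caller.

```python
import contextlib
from typing import Callable


def upload_with_fallback(
    direct_upload: Callable[[], None],
    server_upload: Callable[[], None],
    complete: Callable[[], None],
    abort: Callable[[str], None],
) -> None:
    """Try direct storage, fall back to the API server, abort if both fail."""
    try:
        try:
            direct_upload()
        except Exception:
            # Retry through the API server using the same ingestion.
            server_upload()
    except Exception as exc:
        # The upload cannot be finished: abort, then surface the original error.
        with contextlib.suppress(Exception):
            abort(str(exc))
        raise
    # Errors from complete propagate to the caller unchanged.
    complete()
```

Errors raised by abort itself are suppressed so that the caller always sees the upload failure and not a secondary cleanup error.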
