Uploading files
Most users don't need to implement file upload themselves:
In the browser, drag-and-drop files into a stream from the Marple DB UI.
In Python, use stream.push_file(...) from the Python SDK.
In the CLI, use mdb ingest ... from the Rust SDK.
Implement the REST flow directly only if you're integrating from a language without a Marple SDK. The rest of this section walks through it.
Upload And Ingestion REST API
This document describes the REST contract for uploading a file into a Marple DB stream and starting ingestion. It is intended for SDK implementations.
All API endpoints below are relative to the Marple DB API base URL, for example /api/v1. API requests require normal Marple DB authentication. Direct storage uploads use signed URLs or SAS URLs and must be sent to storage directly, not through the Marple DB API client.
Flow Overview
Create an ingestion with POST /ingestion.
Upload the file using the returned mode.
Mark the upload complete with POST /ingestion/{ingestion_id}/upload/complete.
If the upload cannot be completed, call POST /ingestion/{ingestion_id}/abort.
The server decides the upload mode. SDKs must branch on the returned mode and should not hard-code their own upload threshold.
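The branching rule above can be sketched as a small dispatcher. This is an illustration only: dispatch_upload and the handlers mapping are hypothetical names, not part of any Marple SDK, and each handler stands in for the mode-specific upload described later in this document.

```python
from typing import Any, Callable, Dict, Mapping

# The four modes documented for the POST /ingestion response.
KNOWN_MODES = ("server", "azure", "single", "multipart")


def dispatch_upload(
    response: Mapping[str, Any],
    handlers: Dict[str, Callable[[Mapping[str, Any]], None]],
) -> str:
    """Run the handler for the server-chosen mode and return that mode."""
    mode = response["mode"]
    if mode not in KNOWN_MODES:
        # Fail loudly instead of guessing how to upload.
        raise ValueError(f"unsupported upload mode: {mode!r}")
    handlers[mode](response)
    return mode
```

The client never inspects the file size to pick a path; it only reacts to the mode in the response.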
Create Ingestion
POST /ingestion
Request body:
Fields:
stream_id is the numeric stream identifier.
dataset_name is the dataset name/path to create in the stream.
file_size is the exact file size in bytes. It must be greater than 0 and no larger than 256 GiB.
metadata is optional dataset metadata. If omitted or null, it is treated as an empty object.
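A minimal sketch of building this request body on the client, assuming it is sent as JSON. The function name build_create_ingestion_body is hypothetical; the keys and the size limits come from the field list above.

```python
from typing import Any, Dict, Optional

MAX_FILE_SIZE = 256 * 1024**3  # 256 GiB, the documented upper bound


def build_create_ingestion_body(
    stream_id: int,
    dataset_name: str,
    file_size: int,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate file_size locally and return the POST /ingestion body."""
    if not 0 < file_size <= MAX_FILE_SIZE:
        raise ValueError(f"file_size must be in 1..{MAX_FILE_SIZE}, got {file_size}")
    body: Dict[str, Any] = {
        "stream_id": stream_id,
        "dataset_name": dataset_name,
        "file_size": file_size,
    }
    # metadata is optional; the server treats a missing value as {}.
    if metadata is not None:
        body["metadata"] = metadata
    return body
```

Checking the size before the request avoids a round trip for files the server would reject anyway.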
Successful response:
Response fields:
dataset_id identifies the dataset created for this upload.
ingestion_id identifies the ingestion record and is used by all later ingestion endpoints.
mode is one of server, azure, single, or multipart.
presigned_url is present for azure and single uploads.
part_size is present for multipart uploads.
expires_in is the lifetime, in seconds, of storage upload URLs returned in this response.
Creating the ingestion creates the dataset and places the ingestion in the UPLOADING state.
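An SDK may want to check the response against the field rules above before it starts uploading. The following is a sketch under that assumption; check_create_response and REQUIRED_BY_MODE are hypothetical names.

```python
from typing import Any, Mapping

# Mode-specific fields, as listed in the response field descriptions.
REQUIRED_BY_MODE = {
    "server": (),
    "azure": ("presigned_url",),
    "single": ("presigned_url",),
    "multipart": ("part_size",),
}


def check_create_response(resp: Mapping[str, Any]) -> str:
    """Return the mode if the response carries every field that mode needs."""
    for key in ("dataset_id", "ingestion_id", "mode"):
        if key not in resp:
            raise ValueError(f"response is missing {key}")
    mode = resp["mode"]
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    missing = [k for k in REQUIRED_BY_MODE[mode] if resp.get(k) is None]
    if missing:
        raise ValueError(f"mode {mode} requires {', '.join(missing)}")
    return mode
```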
Upload Modes
server
Use this mode when the server returns "mode": "server", or as a fallback if a direct storage upload fails and the SDK wants to retry through the API server.
POST /ingestion/{ingestion_id}/upload/server
Request body:
multipart/form-data
field name: file
field value: the full file body
Successful response:
After the server upload succeeds, call the complete endpoint.
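For languages without a multipart helper, the form body can be assembled by hand. The sketch below uses only the Python standard library and builds the whole body in memory, so it only suits files that fit in memory. The function name encode_file_field and the application/octet-stream part type are assumptions; only the field name file comes from this document.

```python
import uuid
from typing import Tuple


def encode_file_field(filename: str, data: bytes, boundary: str = "") -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a single 'file' form field."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        # Assumed part type; the document does not specify one.
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + data + tail
```

The returned header value and body would then be sent with POST /ingestion/{ingestion_id}/upload/server together with the normal Marple DB authentication. Filenames containing quotes or line breaks would need escaping, which this sketch omits.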
azure
Use this mode when the server returns "mode": "azure".
Upload the full file to presigned_url using Azure Blob Storage semantics. Browser clients can use BlockBlobClient(presigned_url).uploadData(file). Other SDKs should perform an equivalent upload to the SAS URL.
After the Azure upload succeeds, call the complete endpoint.
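For SDKs without an Azure client library, one possible equivalent is a single Put Blob request against the SAS URL, which requires the x-ms-blob-type header. This is a sketch, not the documented Marple approach: build_azure_put_blob is a hypothetical name, and Azure limits the size of a single Put Blob request, so large files need a block-based upload such as the one Azure client libraries perform.

```python
import urllib.request


def build_azure_put_blob(presigned_url: str, data: bytes) -> urllib.request.Request:
    """Build a Put Blob request for a block blob; no Marple DB auth headers."""
    return urllib.request.Request(
        presigned_url,
        data=data,
        method="PUT",
        headers={
            # Required by the Azure Put Blob operation.
            "x-ms-blob-type": "BlockBlob",
            # Set explicitly so urllib does not add a form-encoding type.
            "Content-Type": "application/octet-stream",
        },
    )
```

Passing the request to urllib.request.urlopen would perform the upload.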
single
Use this mode when the server returns "mode": "single".
Upload the full file with one PUT request to presigned_url.
The PUT body must be the raw file bytes. Do not send Marple DB API authentication headers to this storage URL.
After the storage upload succeeds, call the complete endpoint.
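A minimal standard-library sketch of the single PUT. The function name build_single_put is hypothetical, and the explicit Content-Type is an assumption: if the signed URL was issued for a specific content type, that value must be used instead.

```python
import urllib.request


def build_single_put(presigned_url: str, data: bytes) -> urllib.request.Request:
    """Build one PUT request whose body is the raw file bytes."""
    return urllib.request.Request(
        presigned_url,
        data=data,
        method="PUT",
        # Assumption: a generic content type is acceptable to the signed URL.
        # Without an explicit value, urllib would add a form-encoding type.
        headers={"Content-Type": "application/octet-stream"},
    )
```

The request carries no Authorization header because the signature in the URL is the only credential storage expects. Passing it to urllib.request.urlopen would perform the upload.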
multipart
Use this mode when the server returns "mode": "multipart".
The initial response includes part_size. Split the file into 1-based parts using that size:
start = (part_number - 1) * part_size
end = min(start + part_size, file_size)
the final part may be smaller than part_size
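The range formulas above translate directly into code. In this sketch, part_count and part_range are hypothetical helper names, and end is treated as exclusive, which is what the min(...) formula implies.

```python
from typing import Tuple


def part_count(file_size: int, part_size: int) -> int:
    """Number of parts, rounding up with integer arithmetic."""
    return (file_size + part_size - 1) // part_size


def part_range(part_number: int, part_size: int, file_size: int) -> Tuple[int, int]:
    """Return (start, end) byte offsets for a 1-based part; end is exclusive."""
    if not 1 <= part_number <= part_count(file_size, part_size):
        raise ValueError(f"part_number {part_number} is out of range")
    start = (part_number - 1) * part_size
    end = min(start + part_size, file_size)
    return start, end
```

For example, a 25-byte file with part_size 10 has three parts, and part 3 covers bytes 20 up to 25.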
Fetch signed URLs for parts with:
GET /ingestion/{ingestion_id}/upload/part-urls?start_part=1&count=32
Query parameters:
start_part is required and must be between 1 and the total number of parts.
count is optional. If omitted, the server returns up to 32 parts. It must be greater than 0.
Successful response:
Response fields:
parts contains signed S3 upload URLs for the requested range.
expires_in is the lifetime, in seconds, of the returned part URLs.
next_part is the next part number to request, or null when all parts have been signed.
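Paging through part URLs with next_part can be sketched as a generator. The names iter_part_url_pages and fetch_page are hypothetical; fetch_page stands in for an authenticated GET on the part-urls endpoint, and the entries in parts are passed through untouched because their exact shape is not described here.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_part_url_pages(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    count: int = 32,
) -> Iterator[List[Any]]:
    """Yield one batch of signed part entries at a time until next_part is null."""
    start_part: Optional[int] = 1
    while start_part is not None:
        page = fetch_page(start_part, count)
        yield page["parts"]
        next_part = page["next_part"]
        # Guard against a response that would make the loop spin forever.
        if next_part is not None and next_part <= start_part:
            raise RuntimeError("next_part did not advance")
        start_part = next_part
```

Fetching lazily, one batch at a time, keeps each batch of URLs close to the moment it is used, which matters because the URLs expire after expires_in seconds.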
Upload each part with PUT to its signed url. The request body must be exactly that part's raw bytes. Do not send Marple DB API authentication headers to these storage URLs.
SDKs may upload parts concurrently. Use bounded concurrency so that large files do not open an unbounded number of simultaneous requests; the frontend currently uses 4 workers.
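One way to bound concurrency is a fixed-size thread pool. In this sketch, upload_parts is a hypothetical name and upload_part stands in for the PUT of a single part; the default of 4 workers mirrors the frontend value mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def upload_parts(
    part_numbers: Iterable[int],
    upload_part: Callable[[int], None],
    max_workers: int = 4,
) -> int:
    """Upload parts with at most max_workers requests in flight.

    All tasks are queued up front, but only max_workers run at once.
    The first exception raised by upload_part propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(upload_part, part_numbers))
    return len(results)
```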
After all parts upload successfully, call the complete endpoint. The Marple DB server completes the S3 multipart upload using the uploaded storage parts, verifies the final object size, and starts ingestion.
Complete Upload
POST /ingestion/{ingestion_id}/upload/complete
Request body: empty.
Successful response:
Completion verifies the uploaded object size against the file_size given to POST /ingestion. If the size differs, the ingestion is marked FAILED and the endpoint returns an error.
On success, ingestion starts. The dataset then moves out of UPLOADING into the normal ingestion lifecycle.
Abort Upload
POST /ingestion/{ingestion_id}/abort
Request body:
The body is optional. If no reason is provided, the server records "Unknown reason".
Successful response:
Call abort when the SDK cannot finish an upload after creating an ingestion. Abort marks unstable ingestion states as FAILED and cancels queued ingestion work when applicable.
SDK Error Handling Requirements
SDKs should follow these rules:
If POST /ingestion fails, no ingestion was created and there is nothing to abort.
If any upload step fails after an ingestion_id was returned, either retry safely or call abort.
If a direct storage upload fails, SDKs may retry via POST /ingestion/{ingestion_id}/upload/server using the same ingestion, then call complete if the server upload succeeds.
If POST /ingestion/{ingestion_id}/upload/complete fails, surface the error to the caller. If the failure is permanent, call abort unless the server already marked the ingestion failed.
Signed URLs expire after expires_in seconds. Request fresh multipart part URLs when retrying a part after expiry.
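The fallback and abort rules above can be combined into one control-flow sketch. Everything here is hypothetical naming: the four callables stand in for the direct storage upload, the server upload, the complete call, and the abort call for an ingestion that already exists.

```python
from typing import Callable


def upload_with_fallback(
    direct_upload: Callable[[], None],
    server_upload: Callable[[], None],
    complete: Callable[[], None],
    abort: Callable[[str], None],
) -> str:
    """Try direct storage, fall back to the API server, abort if both fail."""
    try:
        try:
            direct_upload()
            used = "direct"
        except Exception:
            # Direct storage failed: retry through the server, same ingestion.
            server_upload()
            used = "server"
    except Exception as exc:
        # No upload path worked: release the ingestion, then re-raise.
        abort(f"upload failed: {exc}")
        raise
    # A failure here is surfaced to the caller, who decides whether it is
    # permanent and whether abort is still needed.
    complete()
    return used
```

Abort is deliberately not called when complete fails, since the server may already have marked the ingestion FAILED.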