# Uploading files

Most users don't need to implement file upload themselves:

* **In the browser**, drag-and-drop files into a stream from the Marple DB UI.
* **In Python**, use `stream.push_file(...)` from the [Python SDK](https://docs.marpledata.com/docs/sdk/overview/python-sdk).
* **In the CLI**, use `mdb ingest ...` from the [Rust SDK](https://github.com/marpledata/marple-sdk/blob/main/mdb-cli/README.md).

Implement the REST flow directly only if you're integrating from a language without a Marple SDK. The rest of this section walks through it.

### Upload And Ingestion REST API

This document describes the REST contract for uploading a file into a Marple DB stream and starting ingestion. It is intended for SDK implementations.

All API endpoints below are relative to the Marple DB API base URL, for example `/api/v1`. API requests require normal Marple DB authentication. Direct storage uploads use signed URLs or SAS URLs and must be sent to storage directly, not through the Marple DB API client.

#### Flow Overview

1. Create an ingestion with `POST /ingestion`.
2. Upload the file using the returned `mode`.
3. Mark the upload complete with `POST /ingestion/{ingestion_id}/upload/complete`.
4. If upload cannot be completed, call `POST /ingestion/{ingestion_id}/abort`.

The server decides the upload mode. SDKs must branch on the returned `mode` and should not hard-code their own upload threshold.
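The four steps above can be sketched as a small orchestration function. This is a minimal sketch, not the SDK's implementation: `run_upload` and its four callables (`create`, `uploaders`, `complete`, `abort`) are hypothetical names standing in for whatever HTTP client the SDK uses. It only illustrates the control flow, including branching on the server-chosen `mode`.

```python
from typing import Any, Callable, Dict


def run_upload(
    create: Callable[[], Dict[str, Any]],
    uploaders: Dict[str, Callable[[Dict[str, Any]], None]],
    complete: Callable[[int], None],
    abort: Callable[[int, str], None],
) -> int:
    """Run the upload flow and return the ingestion id (hypothetical helper)."""
    # Step 1: if this raises, no ingestion exists and there is nothing to abort.
    ingestion = create()
    ingestion_id = ingestion["ingestion_id"]
    try:
        # Step 2: branch on the mode the server returned, never on local file size.
        upload = uploaders[ingestion["mode"]]
        upload(ingestion)
    except Exception as exc:
        # Step 4: the upload could not be finished, so release the ingestion.
        abort(ingestion_id, f"Upload failed: {exc}")
        raise
    # Step 3: tell the server the bytes are in place; errors surface to the caller.
    complete(ingestion_id)
    return ingestion_id
```

An unknown `mode` raises `KeyError` inside the `try` block, so the ingestion is aborted rather than left in `UPLOADING`.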

#### Create Ingestion

`POST /ingestion`

Request body:

```json
{
  "stream_id": 123,
  "dataset_name": "session/run.csv",
  "file_size": 1048576,
  "metadata": {
    "driver": "Kate"
  }
}
```

Fields:

* `stream_id` is the numeric stream identifier.
* `dataset_name` is the dataset name/path to create in the stream.
* `file_size` is the exact file size in bytes. It must be greater than `0` and no larger than `256 GiB`.
* `metadata` is optional dataset metadata. If omitted or null, it is treated as an empty object.
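As an illustration of the field rules above, a client could validate `file_size` locally before sending the request. The function name `build_create_body` and the constant `MAX_FILE_SIZE` are assumptions made for this sketch; the server remains the authority on validation.

```python
from typing import Any, Dict, Optional

# 256 GiB, the documented upper bound for file_size.
MAX_FILE_SIZE = 256 * 1024**3


def build_create_body(
    stream_id: int,
    dataset_name: str,
    file_size: int,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /ingestion (hypothetical helper)."""
    if not 0 < file_size <= MAX_FILE_SIZE:
        raise ValueError(
            f"file_size must be in 1..{MAX_FILE_SIZE} bytes, got {file_size}"
        )
    body: Dict[str, Any] = {
        "stream_id": stream_id,
        "dataset_name": dataset_name,
        "file_size": file_size,
    }
    # metadata is optional; omitting it is treated as an empty object.
    if metadata is not None:
        body["metadata"] = metadata
    return body
```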

Successful response:

```json
{
  "dataset_id": 456,
  "ingestion_id": 789,
  "mode": "single",
  "presigned_url": "https://...",
  "part_size": null,
  "expires_in": 3600
}
```

Response fields:

* `dataset_id` identifies the dataset created for this upload.
* `ingestion_id` identifies the ingestion record and is used by all later ingestion endpoints.
* `mode` is one of `server`, `azure`, `single`, or `multipart`.
* `presigned_url` is present for `azure` and `single` uploads.
* `part_size` is present for `multipart` uploads.
* `expires_in` is the lifetime, in seconds, of storage upload URLs returned in this response.

Creating the ingestion creates the dataset and places the ingestion in the `UPLOADING` state.
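One way to make the mode-specific fields explicit is to parse the response into a typed record and reject combinations the contract does not allow. The `Ingestion` dataclass and `parse_ingestion` function below are hypothetical. The sketch also assumes that fields not listed as present for a mode may be missing or `null`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

KNOWN_MODES = {"server", "azure", "single", "multipart"}


@dataclass(frozen=True)
class Ingestion:
    dataset_id: int
    ingestion_id: int
    mode: str
    presigned_url: Optional[str]
    part_size: Optional[int]
    expires_in: Optional[int]


def parse_ingestion(payload: Dict[str, Any]) -> Ingestion:
    """Validate a POST /ingestion response (hypothetical helper)."""
    mode = payload.get("mode")
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown upload mode: {mode!r}")
    url = payload.get("presigned_url")
    part_size = payload.get("part_size")
    # presigned_url is required for the two direct single-request modes.
    if mode in ("azure", "single") and not url:
        raise ValueError(f"mode {mode!r} requires presigned_url")
    # part_size is required to split the file in multipart mode.
    if mode == "multipart" and not part_size:
        raise ValueError("mode 'multipart' requires part_size")
    return Ingestion(
        dataset_id=payload["dataset_id"],
        ingestion_id=payload["ingestion_id"],
        mode=mode,
        presigned_url=url,
        part_size=part_size,
        expires_in=payload.get("expires_in"),
    )
```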

#### Upload Modes

**`server`**

Use this mode when the server returns `"mode": "server"`, or as a fallback if a direct storage upload fails and the SDK wants to retry through the API server.

`POST /ingestion/{ingestion_id}/upload/server`

Request body:

* `multipart/form-data`
* field name: `file`
* field value: the full file body

Successful response:

```json
{
  "status": "success"
}
```

After the server upload succeeds, call the complete endpoint.
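For SDKs without a multipart helper, the `multipart/form-data` body can be assembled by hand. The sketch below uses only the standard library; `encode_file_field` is a hypothetical name, and the `application/octet-stream` part content type is an assumption, since the contract above only fixes the field name `file`. It holds the whole file in memory, so a real SDK would likely stream instead.

```python
import uuid
from typing import Tuple


def encode_file_field(filename: str, data: bytes) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single `file` field."""
    boundary = uuid.uuid4().hex
    # Escape double quotes so the filename cannot break the header.
    safe_name = filename.replace('"', "%22")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"
```

The returned header value must be sent as the request's `Content-Type`, because it carries the boundary the server needs to parse the body.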

**`azure`**

Use this mode when the server returns `"mode": "azure"`.

Upload the full file to `presigned_url` using Azure Blob Storage semantics. Browser clients can use `BlockBlobClient(presigned_url).uploadData(file)`. Other SDKs should perform an equivalent upload to the SAS URL.

After the Azure upload succeeds, call the complete endpoint.

**`single`**

Use this mode when the server returns `"mode": "single"`.

Upload the full file with one `PUT` request to `presigned_url`.

The `PUT` body must be the raw file bytes. Do not send Marple DB API authentication headers to this storage URL.

After the storage upload succeeds, call the complete endpoint.
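A minimal sketch of the `single` upload with Python's standard library follows. The helper names are hypothetical. The request deliberately carries no Marple DB credentials, and the explicit `application/octet-stream` content type is an assumption that the signed URL does not pin a different one.

```python
import urllib.request


def build_storage_put(presigned_url: str, data: bytes) -> urllib.request.Request:
    """Build a PUT carrying raw file bytes, without any API auth headers."""
    return urllib.request.Request(
        presigned_url,
        data=data,
        method="PUT",
        # Assumption: the signature does not require another content type.
        headers={"Content-Type": "application/octet-stream"},
    )


def put_to_storage(presigned_url: str, data: bytes, timeout: float = 60.0) -> int:
    """Send the PUT and return the HTTP status; raises on HTTP errors."""
    request = build_storage_put(presigned_url, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Using a separate, plain HTTP client for storage requests, rather than the authenticated API client, is one way to avoid leaking API credentials to the storage host.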

**`multipart`**

Use this mode when the server returns `"mode": "multipart"`.

The initial response includes `part_size`. Split the file into 1-based parts using that size:

* `start = (part_number - 1) * part_size`
* `end = min(start + part_size, file_size)`, where `end` is exclusive, so the part covers bytes `start` through `end - 1`
* the final part may be smaller than `part_size`
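The formulas above translate directly into a generator of byte ranges. `iter_parts` is a hypothetical helper; it yields `(part_number, start, end)` with `end` exclusive, which matches Python slicing and `file.read(end - start)` after a seek to `start`.

```python
from typing import Iterator, Tuple


def iter_parts(file_size: int, part_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, start, end) for each 1-based part; end is exclusive."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    # Ceiling division: a trailing partial part still counts as a part.
    total_parts = -(-file_size // part_size)
    for part_number in range(1, total_parts + 1):
        start = (part_number - 1) * part_size
        end = min(start + part_size, file_size)
        yield part_number, start, end
```

For example, a 10-byte file with `part_size = 4` produces three parts, the last of which is 2 bytes long.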

Fetch signed URLs for parts with:

`GET /ingestion/{ingestion_id}/upload/part-urls?start_part=1&count=32`

Query parameters:

* `start_part` is required and must be between `1` and the total number of parts.
* `count` is optional and must be greater than `0` when given. If omitted, the server returns up to `32` parts.

Successful response:

```json
{
  "parts": [
    {
      "part_number": 1,
      "url": "https://..."
    }
  ],
  "expires_in": 3600,
  "next_part": 2
}
```

Response fields:

* `parts` contains signed S3 upload URLs for the requested range.
* `expires_in` is the lifetime, in seconds, of the returned part URLs.
* `next_part` is the next part number to request, or `null` when all parts have been signed.
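Paging through part URLs by following `next_part` could look like the sketch below. `fetch_page` is a hypothetical callable that performs `GET /ingestion/{ingestion_id}/upload/part-urls` with the given `start_part` and `count` and returns the decoded JSON. The generator is lazy, so URLs are requested only as parts are consumed, which helps keep them within their `expires_in` lifetime.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_part_urls(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    count: int = 32,
) -> Iterator[Dict[str, Any]]:
    """Yield {"part_number", "url"} entries until next_part is null."""
    start_part: Optional[int] = 1
    while start_part is not None:
        page = fetch_page(start_part, count)
        yield from page["parts"]
        next_part = page["next_part"]
        # Defensive check (an assumption, not part of the contract):
        # stop instead of looping forever if the cursor does not advance.
        if next_part is not None and next_part <= start_part:
            raise RuntimeError("server returned a non-advancing next_part")
        start_part = next_part
```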

Upload each part with `PUT` to its signed `url`. The request body must be exactly that part's raw bytes. Do not send Marple DB API authentication headers to these storage URLs.

SDKs may upload parts concurrently. Use bounded concurrency so that large files do not create an unbounded number of simultaneous requests; the frontend currently uses 4 workers.
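One simple way to bound concurrency in Python is a fixed-size thread pool. In this sketch `upload_parts` and `upload_one` are hypothetical names: `upload_one(part_number, start, end)` is expected to read that byte range and `PUT` it to the part's signed URL. The default of 4 workers mirrors the frontend value mentioned above and is not a requirement.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def upload_parts(
    parts: Iterable[Tuple[int, int, int]],
    upload_one: Callable[[int, int, int], None],
    workers: int = 4,
) -> None:
    """Upload all parts with at most `workers` requests in flight."""
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        futures = [pool.submit(upload_one, *part) for part in parts]
        for future in futures:
            # Re-raises the first failure encountered in submission order.
            future.result()
    finally:
        # On failure, drop parts that have not started yet (Python 3.9+);
        # parts already running are allowed to finish.
        pool.shutdown(wait=True, cancel_futures=True)
```

If `upload_parts` raises, the caller can retry the failed part or abort the ingestion.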

After all parts upload successfully, call the complete endpoint. The Marple DB server completes the S3 multipart upload using the uploaded storage parts, verifies the final object size, and starts ingestion.

#### Complete Upload

`POST /ingestion/{ingestion_id}/upload/complete`

Request body: empty.

Successful response:

```json
{
  "status": "success"
}
```

Completion verifies the uploaded object size against the `file_size` given to `POST /ingestion`. If the size differs, the ingestion is marked `FAILED` and the endpoint returns an error.

On success, ingestion starts. The dataset then moves out of `UPLOADING` into the normal ingestion lifecycle.

#### Abort Upload

`POST /ingestion/{ingestion_id}/abort`

Request body:

```json
{
  "reason": "Upload failed: connection reset"
}
```

The body is optional. If no reason is provided, the server records `"Unknown reason"`.

Successful response:

```json
{
  "status": "success"
}
```

Call abort when the SDK cannot finish an upload after creating an ingestion. Abort marks unstable ingestion states as `FAILED` and cancels queued ingestion work when applicable.
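A sketch of building the abort request with an optional reason is shown below. The function name, the `base_url` value in the usage note, and the `auth_headers` parameter are assumptions; this page does not specify how Marple DB authentication headers look, so they are passed through opaquely.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_abort_request(
    base_url: str,
    ingestion_id: int,
    reason: Optional[str] = None,
    auth_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build POST /ingestion/{ingestion_id}/abort; the body is optional."""
    headers = dict(auth_headers or {})
    data = None
    if reason is not None:
        # Only send a body when there is a reason to record.
        data = json.dumps({"reason": reason}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{base_url}/ingestion/{ingestion_id}/abort",
        data=data,
        headers=headers,
        method="POST",
    )
```

For example, `build_abort_request("https://marple.example/api/v1", 789, "Upload failed: connection reset")` produces a request that can be sent with `urllib.request.urlopen`.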

#### SDK Error Handling Requirements

SDKs should follow these rules:

* If `POST /ingestion` fails, no ingestion was created and there is nothing to abort.
* If any upload step fails after an `ingestion_id` was returned, either retry safely or call abort.
* If a direct storage upload fails, SDKs may retry via `POST /ingestion/{ingestion_id}/upload/server` using the same ingestion, then call complete if the server upload succeeds.
* If `POST /ingestion/{ingestion_id}/upload/complete` fails, surface the error to the caller. If the failure is permanent, call abort unless the server already marked the ingestion failed.
* Signed URLs expire after `expires_in` seconds. Request fresh multipart part URLs when retrying a part after expiry.
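The fallback rule for direct storage failures can be sketched as follows. `upload_with_fallback` and its three callables are hypothetical. `direct_upload` stands for the `azure`, `single`, or `multipart` path, `server_upload` for `POST /ingestion/{ingestion_id}/upload/server` on the same ingestion, and `abort` for the abort endpoint.

```python
from typing import Callable


def upload_with_fallback(
    direct_upload: Callable[[], None],
    server_upload: Callable[[], None],
    abort: Callable[[str], None],
) -> str:
    """Try direct storage first, then the API server; abort if both fail.

    Returns "direct" or "server" to indicate which path succeeded. The
    caller is still responsible for calling the complete endpoint afterwards.
    """
    try:
        direct_upload()
        return "direct"
    except Exception as direct_error:
        try:
            # Retry through the API server using the same ingestion.
            server_upload()
            return "server"
        except Exception as server_error:
            # Neither path worked: release the ingestion, then surface the error.
            abort(f"Upload failed: {server_error}")
            raise server_error from direct_error
```

Whether to fall back at all is a policy choice. For very large files an SDK might prefer to retry individual parts with fresh signed URLs rather than resend the whole file through the API server.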