# Querying

If your license allows it, you can connect directly to the hot and cold data in Marple DB via [Trino](https://trino.io/) and query it using SQL.

This allows you to connect any [Trino-compatible tool](https://trino.io/ecosystem/client-application) to your data. The following guides help you connect with some common tools. Check the Connect page in Marple DB for the values relevant to your workspace.

<details>

<summary>Grafana</summary>

* In Grafana, go to Configuration > Data Sources > Add data source
* Search for and install the "Trino" data source plugin
* Configure the data source:
  * Host `https://query.db.<your-vpc-name>.marpledata.com:443`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
  * Catalog `mdb_<workspace-id>_hot`
* Click "Save & Test"

</details>

<details>

<summary>Power BI</summary>

* Download and install a Trino ODBC driver; Trino's own [client documentation](https://trino.io/docs/current/client/jdbc.html) covers the official JDBC driver, and ODBC drivers for Trino are available from third-party vendors
* In Power BI Desktop, click "Get Data" > "ODBC"
* Configure the connection:
  * Connection string `Driver={Trino};Host=query.db.<your-vpc-name>.marpledata.com;Port=443;SSL=1`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Select tables or write custom SQL queries

</details>

<details>

<summary>Metabase</summary>

* In Metabase, go to Admin > Databases > Add database
* Select "Starburst (Trino)" as the database type
* Configure the connection:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Catalog `mdb_<workspace-id>_hot`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Enable "Use a secure connection (SSL)"
* Click "Save" to test and save the connection

</details>

<details>

<summary>Python</summary>

* Install the [trino](https://pypi.org/project/trino/) package: `pip install trino`
* Connect using the following code:

  ```python
  from trino.dbapi import connect
  from trino.auth import BasicAuthentication
  conn = connect(
      host="query.db.<your-vpc-name>.marpledata.com",
      port=443,
      user="mdb_<workspace-id>",
      auth=BasicAuthentication("mdb_<workspace-id>", "<your-api-token>"),
      catalog="mdb_<workspace-id>_hot",
      http_scheme="https",
  )
  cursor = conn.cursor()
  cursor.execute("SELECT * FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset LIMIT 10")
  rows = cursor.fetchall()
  print(rows)
  ```
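If you connect from several scripts, it can help to derive the connection values in one place. The sketch below is a convenience helper, not part of the Marple DB client; it only encodes the host/user/catalog naming conventions shown in this guide, so double-check the values against your workspace's Connect page:

```python
def mdb_settings(vpc_name: str, workspace_id: str) -> dict:
    """Derive Trino connection values from Marple DB workspace details.

    Follows the naming conventions used throughout this guide:
    user `mdb_<workspace-id>`, catalogs `mdb_<workspace-id>_{hot,cold}`.
    """
    return {
        "host": f"query.db.{vpc_name}.marpledata.com",
        "port": 443,
        "user": f"mdb_{workspace_id}",
        "hot_catalog": f"mdb_{workspace_id}_hot",
        "cold_catalog": f"mdb_{workspace_id}_cold",
    }

# Pass the resulting values to trino.dbapi.connect as in the example above
settings = mdb_settings("<your-vpc-name>", "<workspace-id>")
print(settings["user"])
```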

</details>

<details>

<summary>DBeaver</summary>

* Download and install [DBeaver](https://dbeaver.io/download/)
* Go to Database > New Database Connection
* Search for and select "Trino"
* Enter connection details:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Click "Test Connection" to verify, then "Finish"

</details>

<details>

<summary>Other</summary>

Any other [Trino client](https://trino.io/docs/current/client) can connect using the following details:

* JDBC URL `jdbc:trino://query.db.<your-vpc-name>.marpledata.com:443`
* Host `query.db.<your-vpc-name>.marpledata.com`
* Port `443`
* Username `mdb_<workspace-id>`
* Password `<your-api-token>`
* Hot catalog `mdb_<workspace-id>_hot`
* Cold catalog `mdb_<workspace-id>_cold`

</details>

#### Structure

Trino allows querying the hot and cold data in Marple DB as one unified database. To access this data, specify the correct catalog.

* **Hot** (Postgres) - All high-level metadata about datasets and signals lives in these tables:

  * `mdb_<datapool>_dataset` - information about all individual datasets in a datapool
  * `mdb_<datapool>_signal` - information about all individual signals in a datapool
  * `mdb_<datapool>_signal_enum` - mapping between `name` and `id` for all signals in a datapool

  -> catalog: `mdb_<workspace-id>_hot`
* **Cold** (Parquet data lake in [Apache Iceberg format](https://iceberg.apache.org/)) - All raw data in a unified table per datapool with columns `dataset`, `signal`, `time`, `value`, `value_text`

  -> catalog: `mdb_<workspace-id>_cold`
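Putting the naming together, a fully qualified Trino table name has the form `catalog.schema.table`. A minimal sketch of the convention, assuming the `public` schema for hot tables and the `default` schema for the cold table (as used in the example queries in this guide):

```python
def hot_table(workspace_id: str, datapool: str, table: str) -> str:
    """Fully qualified name of a hot (Postgres) metadata table.

    `table` is one of "dataset", "signal", or "signal_enum";
    hot tables live in the `public` schema of the hot catalog.
    """
    return f"mdb_{workspace_id}_hot.public.mdb_{datapool}_{table}"

def cold_table(workspace_id: str) -> str:
    """Fully qualified name of the cold (Iceberg) raw-data table,
    which lives in the `default` schema of the cold catalog."""
    return f"mdb_{workspace_id}_cold.default.data"
```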

#### Example Queries

List datasets in the hot catalog (Postgres)

```sql
-- Get dataset info from hot catalog
SELECT
   id, stream_id, path, created_at, created_by, backup_path,
   n_signals, n_datapoints, timestamp_start, timestamp_stop,  
   JSON_EXTRACT_SCALAR(metadata, '$.metadata_key') as metadata_key -- get a specific metadata key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset
LIMIT 10
```

Select the signal id for a given signal name from the hot catalog (Postgres)

```sql
-- Get signal id from hot catalog
SELECT id
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal_enum
WHERE name = '{signal_name}'
```

Get signal info from the hot catalog

```sql
-- Get signal info from hot catalog
SELECT
   name, unit, description, metadata, storage_status,
   time_min, time_max, count,
   JSON_EXTRACT_SCALAR(stats, '$.frequency') as frequency -- get a specific stat key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal
WHERE dataset_id = {dataset_id}
AND signal_id = {signal_id}
```

Query raw data from the cold catalog (Iceberg)

```sql
-- Get signal data from Iceberg
SELECT time, value, value_text
FROM mdb_<workspace-id>_cold.default.data
WHERE dataset = {dataset_id}
AND signal = {signal_id}
LIMIT 1000
```
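The `{dataset_id}` and `{signal_id}` placeholders above must be filled in before the query is sent. A minimal sketch of rendering the cold-data query from Python, assuming numeric ids (as the `signal_enum` mapping suggests) and validating them before interpolation; for string parameters, prefer your client's own parameter binding instead:

```python
def cold_data_query(workspace_id: str, dataset_id: int, signal_id: int, limit: int = 1000) -> str:
    """Render the cold-catalog query for one signal in one dataset.

    Coercing the ids to int before interpolating them guards against
    SQL injection from untrusted input.
    """
    dataset_id, signal_id, limit = int(dataset_id), int(signal_id), int(limit)
    return (
        f"SELECT time, value, value_text\n"
        f"FROM mdb_{workspace_id}_cold.default.data\n"
        f"WHERE dataset = {dataset_id}\n"
        f"AND signal = {signal_id}\n"
        f"LIMIT {limit}"
    )
```

The rendered string can then be passed to `cursor.execute()` on a Trino connection, as in the Python example above.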
