# Querying

If your license allows it, you can connect directly to the data in Marple DB via [Trino](https://trino.io/) and query it using SQL.

This allows you to connect any [Trino-compatible tool](https://trino.io/ecosystem/client-application) to your data. The guides below help you connect with some common tools. Check the Connect page in Marple DB for the values relevant to your workspace.

<details>

<summary>Grafana</summary>

* In Grafana, go to Configuration > Data Sources > Add data source
* Search for and install the "Trino" data source plugin
* Configure the data source:
  * Host `https://query.db.<your-vpc-name>.marpledata.com:443`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
  * Catalog `mdb_<workspace-id>_hot`
* Click "Save & Test"

</details>

<details>

<summary>Power BI</summary>

* Download and install a Trino ODBC driver (see the [Trino client documentation](https://trino.io/docs/current/client) for available clients)
* In Power BI Desktop, click "Get Data" > "ODBC"
* Configure the connection:
  * Connection string `Driver={Trino};Host=query.db.<your-vpc-name>.marpledata.com;Port=443;SSL=1`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Select tables or write custom SQL queries

</details>

<details>

<summary>Metabase</summary>

* In Metabase, go to Admin > Databases > Add database
* Select "Starburst (Trino)" as the database type
* Configure the connection:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Catalog `mdb_<workspace-id>_hot`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Enable "Use a secure connection (SSL)"
* Click "Save" to test and save the connection

</details>

<details>

<summary>Python</summary>

* Install the [trino](https://pypi.org/project/trino/) package: `pip install trino`
* Connect using the following code:

  ```python
  from trino.dbapi import connect
  from trino.auth import BasicAuthentication
  conn = connect(
      host="query.db.<your-vpc-name>.marpledata.com",
      port=443,
      user="mdb_<workspace-id>",
      auth=BasicAuthentication("mdb_<workspace-id>", "<your-api-token>"),
      catalog="mdb_<workspace-id>_hot",
      http_scheme="https",
  )
  cursor = conn.cursor()
  cursor.execute("SELECT * FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset LIMIT 10")
  rows = cursor.fetchall()
  print(rows)
  ```

</details>

<details>

<summary>DBeaver</summary>

* Download and install [DBeaver](https://dbeaver.io/download/)
* Go to Database > New Database Connection
* Search for and select "Trino"
* Enter connection details:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Click "Test Connection" to verify, then "Finish"

</details>

<details>

<summary>Other</summary>

Trino clients and documentation: <https://trino.io/docs/current/client>

* JDBC URL `jdbc:trino://query.db.<your-vpc-name>.marpledata.com:443`
* Host `query.db.<your-vpc-name>.marpledata.com`
* Port `443`
* Username `mdb_<workspace-id>`
* Password `<your-api-token>`
* Hot catalog `mdb_<workspace-id>_hot`
* Cold catalog `mdb_<workspace-id>_cold`

</details>
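For scripted setups, the connection values listed above can be assembled programmatically. Below is a minimal sketch; the helper name is illustrative (not part of any Marple DB API), and `vpc_name` / `workspace_id` are placeholders for your own values:

```python
def trino_connection_params(vpc_name: str, workspace_id: str) -> dict:
    """Build the Trino connection settings described on this page.

    `vpc_name` and `workspace_id` must be replaced with your workspace values.
    """
    host = f"query.db.{vpc_name}.marpledata.com"
    return {
        "host": host,
        "port": 443,
        "user": f"mdb_{workspace_id}",
        "jdbc_url": f"jdbc:trino://{host}:443",
        "hot_catalog": f"mdb_{workspace_id}_hot",
        "cold_catalog": f"mdb_{workspace_id}_cold",
    }

params = trino_connection_params("acme", "ws1")
print(params["jdbc_url"])  # jdbc:trino://query.db.acme.marpledata.com:443
```

A dictionary like this can then be unpacked into whichever client you use (for example, the `trino.dbapi.connect(...)` call shown in the Python section).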

#### Structure

Trino allows querying the hot and cold data in Marple DB as one unified database. To access this data, the correct catalog must be specified.

* **Hot** (Postgres) - All high-level metadata about datasets and signals lives in these tables:

  * `mdb_<datapool>_dataset` - information about all individual datasets in a datapool
  * `mdb_<datapool>_signal` - information about all individual signals in a datapool
  * `mdb_<datapool>_signal_enum` - mapping between `name` and `id` for all signals in a datapool

  -> catalog: `mdb_<workspace-id>_hot`
* **Cold** (Parquet data lake in [Apache Iceberg format](https://iceberg.apache.org/)) - All raw data in a unified table per datapool with columns `dataset`, `signal`, `time`, `value`, `value_signal`

  -> catalog: `mdb_<workspace-id>_cold`
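The `catalog.schema.table` layout above can be captured in a couple of small helpers. This is a sketch assuming the naming scheme exactly as listed (the helper names are illustrative, and `workspace_id` / `datapool` are placeholders):

```python
def hot_table(workspace_id: str, datapool: str, table: str) -> str:
    """Fully qualified name of a hot (Postgres) metadata table.

    `table` is one of: "dataset", "signal", "signal_enum".
    """
    return f"mdb_{workspace_id}_hot.public.mdb_{datapool}_{table}"


def cold_table(workspace_id: str) -> str:
    """Fully qualified name of the unified cold (Iceberg) data table."""
    return f"mdb_{workspace_id}_cold.default.data"


print(hot_table("ws1", "default", "signal"))  # mdb_ws1_hot.public.mdb_default_signal
print(cold_table("ws1"))                      # mdb_ws1_cold.default.data
```

These fully qualified names are exactly what appears in the `FROM` clauses of the example queries below.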

#### Example Queries

List datasets in the hot catalog (Postgres)

```sql
-- Get dataset info from hot catalog
SELECT
   id, stream_id, path, created_at, created_by, backup_path,
   n_signals, n_datapoints, timestamp_start, timestamp_stop,
   JSON_EXTRACT_SCALAR(metadata, '$.metadata_key') as metadata_key -- get a specific metadata key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset
LIMIT 10
```

Select the signal id for a given signal name from the hot catalog (Postgres)

```sql
-- Get signal id from hot catalog
SELECT id
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal_enum
WHERE name = {signal_name}
```

Get signal info from the hot catalog

```sql
-- Get signal info from hot catalog
SELECT
   name, unit, description, metadata, storage_status,
   time_min, time_max, count,
   JSON_EXTRACT_SCALAR(stats, '$.frequency') as frequency -- get a specific stat key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal
WHERE dataset_id = {dataset_id}
AND signal_id = {signal_id}
```

Query raw data from the cold catalog (Iceberg)

```sql
-- Get signal data from Iceberg
SELECT time, value, value_text
FROM mdb_<workspace-id>_cold.default.data
WHERE dataset = {dataset_id}
AND signal = {signal_id}
LIMIT 1000
```
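When generating the queries above from code, the `{dataset_id}` and `{signal_id}` placeholders should only be filled with validated values. A hedged sketch that type-checks the ids before interpolating them into the cold-catalog query (the function name is illustrative, not part of the Marple DB API):

```python
def cold_data_query(workspace_id: str, dataset_id: int,
                    signal_id: int, limit: int = 1000) -> str:
    """Render the cold-catalog example query with integer-checked ids,
    so untrusted input cannot be injected into the SQL text."""
    for name, value in [("dataset_id", dataset_id),
                        ("signal_id", signal_id), ("limit", limit)]:
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{name} must be an integer")
    return (
        f"SELECT time, value, value_text\n"
        f"FROM mdb_{workspace_id}_cold.default.data\n"
        f"WHERE dataset = {dataset_id}\n"
        f"AND signal = {signal_id}\n"
        f"LIMIT {limit}"
    )


sql = cold_data_query("ws1", 42, 7)
```

The resulting string can be passed to `cursor.execute(...)` from the Python section; many Trino clients also support server-side prepared statements, which are preferable when available.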
