# Querying

If your license allows it, you can connect directly to the hot and cold data in Marple DB via [Trino](https://trino.io/) and query it using SQL.

This allows you to connect any [Trino-compatible tool](https://trino.io/ecosystem/client-application) to your data. The following guides help you connect with some common tools. Check the Connect page in Marple DB for the values relevant to your workspace.

<details>

<summary>Grafana</summary>

* In Grafana, go to Configuration > Data Sources > Add data source
* Search for and install the "Trino" data source plugin
* Configure the data source:
  * Host `https://query.db.<your-vpc-name>.marpledata.com:443`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
  * Catalog `mdb_<workspace-id>_hot`
* Click "Save & Test"

</details>

<details>

<summary>Power BI</summary>

* Download and install a Trino ODBC driver; Trino's own [client documentation](https://trino.io/docs/current/client/jdbc.html) covers the official JDBC driver, and ODBC drivers for Trino are available from third-party vendors
* In Power BI Desktop, click "Get Data" > "ODBC"
* Configure the connection:
  * Connection string `Driver={Trino};Host=query.db.<your-vpc-name>.marpledata.com;Port=443;SSL=1`
  * User `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Select tables or write custom SQL queries

</details>

<details>

<summary>Metabase</summary>

* In Metabase, go to Admin > Databases > Add database
* Select "Starburst (Trino)" as the database type
* Configure the connection:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Catalog `mdb_<workspace-id>_hot`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Enable "Use a secure connection (SSL)"
* Click "Save" to test and save the connection

</details>

<details>

<summary>Python</summary>

* Install the [trino](https://pypi.org/project/trino/) package: `pip install trino`
* Connect using the following code:

  ```python
  from trino.dbapi import connect
  from trino.auth import BasicAuthentication
  conn = connect(
      host="query.db.<your-vpc-name>.marpledata.com",
      port=443,
      user="mdb_<workspace-id>",
      auth=BasicAuthentication("mdb_<workspace-id>", "<your-api-token>"),
      catalog="mdb_<workspace-id>_hot",
      http_scheme="https",
  )
  cursor = conn.cursor()
  cursor.execute("SELECT * FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset LIMIT 10")
  rows = cursor.fetchall()
  print(rows)
  ```
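If you connect from several scripts, it can help to derive the connection values in one place. The sketch below is a convenience helper, not part of the Marple DB client; it only encodes the host/user/catalog naming conventions shown in this guide, so double-check the values against your workspace's Connect page:

```python
def mdb_settings(vpc_name: str, workspace_id: str) -> dict:
    """Derive Trino connection values from Marple DB workspace details.

    Follows the naming conventions used throughout this guide:
    user `mdb_<workspace-id>`, catalogs `mdb_<workspace-id>_{hot,cold}`.
    """
    return {
        "host": f"query.db.{vpc_name}.marpledata.com",
        "port": 443,
        "user": f"mdb_{workspace_id}",
        "hot_catalog": f"mdb_{workspace_id}_hot",
        "cold_catalog": f"mdb_{workspace_id}_cold",
    }

# Pass the resulting values to trino.dbapi.connect as in the example above
settings = mdb_settings("<your-vpc-name>", "<workspace-id>")
print(settings["user"])
```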

</details>

<details>

<summary>DBeaver</summary>

* Download and install [DBeaver](https://dbeaver.io/download/)
* Go to Database > New Database Connection
* Search for and select "Trino"
* Enter connection details:
  * Host `query.db.<your-vpc-name>.marpledata.com`
  * Port `443`
  * Username `mdb_<workspace-id>`
  * Password `<your-api-token>`
* Click "Test Connection" to verify, then "Finish"

</details>

<details>

<summary>Other</summary>

Any other [Trino client](https://trino.io/docs/current/client) can connect using the following details:

* JDBC URL `jdbc:trino://query.db.<your-vpc-name>.marpledata.com:443`
* Host `query.db.<your-vpc-name>.marpledata.com`
* Port `443`
* Username `mdb_<workspace-id>`
* Password `<your-api-token>`
* Hot catalog `mdb_<workspace-id>_hot`
* Cold catalog `mdb_<workspace-id>_cold`

</details>

#### Structure

Trino allows querying the hot and cold data in Marple DB as one unified database. To access this data, specify the correct catalog.

* **Hot** (Postgres) - All high-level metadata about datasets and signals lives in these tables:

  * `mdb_<datapool>_dataset` - information about all individual datasets in a datapool
  * `mdb_<datapool>_signal` - information about all individual signals in a datapool
  * `mdb_<datapool>_signal_enum` - mapping between `name` and `id` for all signals in a datapool

  -> catalog: `mdb_<workspace-id>_hot`
* **Cold** (Parquet data lake in [Apache Iceberg format](https://iceberg.apache.org/)) - All raw data in a unified table per datapool with columns `dataset`, `signal`, `time`, `value`, `value_text`

  -> catalog: `mdb_<workspace-id>_cold`
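Putting the naming together, a fully qualified Trino table name has the form `catalog.schema.table`. A minimal sketch of the convention, assuming the `public` schema for hot tables and the `default` schema for the cold table (as used in the example queries in this guide):

```python
def hot_table(workspace_id: str, datapool: str, table: str) -> str:
    """Fully qualified name of a hot (Postgres) metadata table.

    `table` is one of "dataset", "signal", or "signal_enum";
    hot tables live in the `public` schema of the hot catalog.
    """
    return f"mdb_{workspace_id}_hot.public.mdb_{datapool}_{table}"

def cold_table(workspace_id: str) -> str:
    """Fully qualified name of the cold (Iceberg) raw-data table,
    which lives in the `default` schema of the cold catalog."""
    return f"mdb_{workspace_id}_cold.default.data"
```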

#### Example Queries

List datasets in the hot catalog (Postgres)

```sql
-- Get dataset info from hot catalog
SELECT
   id, stream_id, path, created_at, created_by, backup_path,
   n_signals, n_datapoints, timestamp_start, timestamp_stop,  
   JSON_EXTRACT_SCALAR(metadata, '$.metadata_key') as metadata_key -- get a specific metadata key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_dataset
LIMIT 10
```

Select the signal id for a given signal name from the hot catalog (Postgres)

```sql
-- Get signal id from hot catalog
SELECT id
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal_enum
WHERE name = '{signal_name}'
```

Get signal info from the hot catalog

```sql
-- Get signal info from hot catalog
SELECT
   name, unit, description, metadata, storage_status,
   time_min, time_max, count,
   JSON_EXTRACT_SCALAR(stats, '$.frequency') as frequency -- get a specific stat key
FROM mdb_<workspace-id>_hot.public.mdb_<datapool>_signal
WHERE dataset_id = {dataset_id}
AND signal_id = {signal_id}
```

Query raw data from the cold catalog (Iceberg)

```sql
-- Get signal data from Iceberg
SELECT time, value, value_text
FROM mdb_<workspace-id>_cold.default.data
WHERE dataset = {dataset_id}
AND signal = {signal_id}
LIMIT 1000
```
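The `{dataset_id}` and `{signal_id}` placeholders above must be filled in before the query is sent. A minimal sketch of rendering the cold-data query from Python, assuming numeric ids (as the `signal_enum` mapping suggests) and validating them before interpolation; for string parameters, prefer your client's own parameter binding instead:

```python
def cold_data_query(workspace_id: str, dataset_id: int, signal_id: int, limit: int = 1000) -> str:
    """Render the cold-catalog query for one signal in one dataset.

    Coercing the ids to int before interpolating them guards against
    SQL injection from untrusted input.
    """
    dataset_id, signal_id, limit = int(dataset_id), int(signal_id), int(limit)
    return (
        f"SELECT time, value, value_text\n"
        f"FROM mdb_{workspace_id}_cold.default.data\n"
        f"WHERE dataset = {dataset_id}\n"
        f"AND signal = {signal_id}\n"
        f"LIMIT {limit}"
    )
```

The rendered string can then be passed to `cursor.execute()` on a Trino connection, as in the Python example above.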
