Query

Query a DuckDB database.

yaml
type: "io.kestra.plugin.jdbc.duckdb.Query"

Examples

Query multiple CSV files from a ZIP file, include the filename in the result, and output both the final dataset in ION format and the DuckDB database file.

yaml
id: query_multiple_csv_files
namespace: company.team

tasks:
  - id: get_zip_file
    type: io.kestra.plugin.core.http.Download
    uri: https://huggingface.co/datasets/kestra/datasets/resolve/main/zip/2023-01.zip

  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{outputs.get_zip_file.uri}}"

  - id: duckdb
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles: "{{outputs.unzip.files}}"
    sql: SELECT * FROM read_csv_auto('**/*-outcomes.csv', union_by_name=true, filename=true);
    fetchType: STORE  # output data in ION format
    outputDbFile: true  # output the DuckDB database file

Execute a query that reads a CSV file, and outputs another CSV file.

yaml
id: query_duckdb
namespace: company.team

tasks:
  - id: http_download
    type: io.kestra.plugin.core.http.Download
    uri: "https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv"

  - id: query
    type: io.kestra.plugin.jdbc.duckdb.Query
    url: 'jdbc:duckdb:'
    timeZoneId: Europe/Paris
    sql: |-
      CREATE TABLE new_tbl AS SELECT * FROM read_csv_auto('data.csv', header=True);

      COPY (SELECT order_id, customer_name FROM new_tbl) TO '{{ outputFiles.out }}' (HEADER, DELIMITER ',');
    inputFiles:
      data.csv: "{{ outputs.http_download.uri }}"
    outputFiles:
      - out

Execute a query that reads from an existing database file, referenced by its URI.

yaml
id: query_duckdb
namespace: company.team

inputs:
  - id: my_db
    type: FILE

tasks:
  - id: query1
    type: io.kestra.plugin.jdbc.duckdb.Query
    databaseUri: "{{ inputs.my_db }}"
    sql: SELECT * FROM table_name;
    fetchType: STORE

Run a SQL query with DuckDB on MotherDuck and get the result as a CSV file.

yaml
id: motherduck
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |
      SELECT by, COUNT(*) as nr_comments
      FROM sample_data.hn.hacker_news
      GROUP BY by
      ORDER BY nr_comments DESC;
    fetchType: STORE

  - id: csv
    type: io.kestra.plugin.serdes.csv.IonToCsv
    from: "{{ outputs.query.uri }}"

pluginDefaults:
  - type: io.kestra.plugin.jdbc.duckdb.Query
    values:
      url: jdbc:duckdb:md:my_db?motherduck_token={{ secret('MOTHERDUCK_TOKEN') }}
      timeZoneId: Europe/Berlin

Properties

databaseUri string

Database URI

The URI, in Kestra's internal storage, of an existing DuckDB database file.

fetchSize integer | string

Default 10000

Number of rows that should be fetched.

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for this ResultSet object. If the fetch size specified is zero, the JDBC driver ignores the value and is free to make its own best guess as to what the fetch size should be. Ignored if autoCommit is false.

fetchType string

Default NONE

Possible Values

STORE, FETCH, FETCH_ONE, NONE

How the query results should be returned or stored.

FETCH_ONE outputs only the first row. FETCH outputs all rows as an output variable. STORE writes all rows to a file. NONE produces no output.
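
As a minimal sketch of how fetchType relates to the task outputs, the flow below uses FETCH_ONE and reads the resulting row in a later task. The flow id, task ids, and the `answer` column alias are illustrative, and the sketch assumes the row output is a map keyed by column name and that the core Log task is available.

```yaml
id: duckdb_fetch_one   # hypothetical flow id
namespace: company.team

tasks:
  - id: single_row
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: SELECT 42 AS answer;
    fetchType: FETCH_ONE  # only the first row is exposed, as the "row" output

  - id: log_result
    type: io.kestra.plugin.core.log.Log
    # assumes "row" is keyed by column name
    message: "The answer is {{ outputs.single_row.row.answer }}"
```

With FETCH, the same pattern would read `{{ outputs.single_row.rows }}` instead; with STORE, the later task would read `{{ outputs.single_row.uri }}`.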

inputFiles object

SubType string

Input files to be made available to DuckDB.

A map of files that will be written and made available to DuckDB. Reference each file by its filename, for example: SELECT * FROM read_csv_auto('myfile.csv');

outputDbFile boolean | string

Default false

Output the database file.

Set this property to true to output the in-memory database as a file for further processing.
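
One way this could be used is to build a database in one task and query it in another. The sketch below assumes that the databaseUri output of the first task can be passed directly to the databaseUri property of the second; the flow id, task ids, and the `numbers` table are illustrative only.

```yaml
id: duckdb_reuse_database   # hypothetical flow id
namespace: company.team

tasks:
  - id: build_db
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: CREATE TABLE numbers AS SELECT range AS n FROM range(10);
    outputDbFile: true  # expose the database file as the "databaseUri" output

  - id: read_db
    type: io.kestra.plugin.jdbc.duckdb.Query
    # assumption: the output URI is accepted as-is by the databaseUri property
    databaseUri: "{{ outputs.build_db.databaseUri }}"
    sql: SELECT sum(n) AS total FROM numbers;
    fetchType: FETCH_ONE
```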

outputFiles array

SubType string

Output file list that will be uploaded to internal storage.

List of keys, each of which generates a temporary file. In the SQL query, reference a file with the variable outputFiles.key. For example, if you declare ["first"], you can write COPY tbl TO '{{ outputFiles.first }}' (HEADER, DELIMITER ','); and then use the file in other tasks with {{ outputs.taskId.outputFiles.first }}.
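
The following sketch illustrates both halves of that description: writing to a declared output file inside the query, then referencing its URI from a later task. The flow id and task ids are illustrative, and the core Log task is used only to show the reference.

```yaml
id: duckdb_output_files   # hypothetical flow id
namespace: company.team

tasks:
  - id: export
    type: io.kestra.plugin.jdbc.duckdb.Query
    sql: |-
      COPY (SELECT range AS n FROM range(5)) TO '{{ outputFiles.first }}' (HEADER, DELIMITER ',');
    outputFiles:
      - first  # key that generates the temporary file used in the COPY statement

  - id: log_uri
    type: io.kestra.plugin.core.log.Log
    # URI of the uploaded file in internal storage
    message: "{{ outputs.export.outputFiles.first }}"
```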

parameters object

Parameters

A map of parameters to bind to the SQL queries. The keys should match the parameter placeholders in the SQL string, e.g., :parameterName.
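
A minimal sketch of parameter binding, assuming a named placeholder in the SQL string is matched to a key of the same name in the parameters map and that the value is bound as a number. The flow id, task id, and the `min_value` parameter name are hypothetical.

```yaml
id: duckdb_parameters   # hypothetical flow id
namespace: company.team

tasks:
  - id: filtered
    type: io.kestra.plugin.jdbc.duckdb.Query
    # :min_value is the placeholder; its name must match the key below
    sql: SELECT range AS n FROM range(10) WHERE range >= :min_value;
    parameters:
      min_value: 5
    fetchType: FETCH
```

Binding values this way, rather than templating them into the SQL string, keeps the query text fixed while the value changes.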

password string

The database user's password.

sql string

The SQL query to run.

timeZoneId string

The time zone ID to use for date/time manipulation. Defaults to the worker's default time zone ID.

url string

Default jdbc:duckdb:

The JDBC URL to connect to the database.

The default value, jdbc:duckdb:, will use a local in-memory database. Set this property when connecting to a persisted database instance, for example jdbc:duckdb:md:my_database?motherduck_token=<my_token> to connect to MotherDuck.

username string

The database user.

Outputs

databaseUri string

Format uri

The database output URI in Kestra's internal storage.

outputFiles object

SubType string

The URIs of the output files in Kestra's internal storage.

row object

Map containing the first row of fetched data.

Only populated if fetchType is set to FETCH_ONE.

rows array

SubType object

List of maps containing the rows of fetched data.

Only populated if fetchType is set to FETCH.

size integer

The number of rows fetched.

Only populated if fetchType is set to STORE or FETCH.

uri string

Format uri

The URI of the result file in Kestra's internal storage (an .ion file, i.e. an Amazon Ion formatted text file).

Only populated if fetchType is set to STORE.
