Run a SQL query in a specific BigQuery database.
type: "io.kestra.plugin.gcp.bigquery.Query"
Examples
Create a table with a custom query.
```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.my_table"
    writeDisposition: WRITE_APPEND
    sql: |
      SELECT
        "hello" as string,
        NULL AS `nullable`,
        1 as int,
        1.25 AS float,
        DATE("2008-12-25") AS date,
        DATETIME "2008-12-25 15:30:00.123456" AS datetime,
        TIME(DATETIME "2008-12-25 15:30:00.123456") AS time,
        TIMESTAMP("2008-12-25 15:30:00.123456") AS timestamp,
        ST_GEOGPOINT(50.6833, 2.9) AS geopoint,
        ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS `array`,
        STRUCT(4 AS x, 0 AS y, ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS z) AS `struct`
```
Execute a query and fetch the result set in another task.
```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: fetch
    type: io.kestra.plugin.gcp.bigquery.Query
    fetch: true
    sql: |
      SELECT 1 as id, "John" as name
      UNION ALL
      SELECT 2 as id, "Doe" as name

  - id: use_fetched_data
    type: io.kestra.plugin.core.debug.Return
    format: |
      {% for row in outputs.fetch.rows %}
      id : {{ row.id }}, name: {{ row.name }}
      {% endfor %}
```
Properties
allowLargeResults boolean | string
Sets whether the job is enabled to create arbitrarily large results.
If true, the query is allowed to create large results at a slight cost in performance; destinationTable must be provided.
clusteringFields array
The clustering specification for the destination table.
createDisposition string
CREATE_IF_NEEDED
CREATE_NEVER
Whether the job is allowed to create tables.
defaultDataset string
Sets the default dataset.
This dataset is used for all unqualified table names used in the query.
destinationTable string
The table where to put query results.
If not provided, a new table is created.
dryRun boolean | string
false
Whether the job has to be dry run or not.
A valid query will mostly return an empty response with some processing statistics, while an invalid query will return the same error as it would if it were an actual run.
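As a sketch, dryRun can be used to validate a query without executing it; the task id and query below are illustrative:

```yaml
- id: validate_query
  type: io.kestra.plugin.gcp.bigquery.Query
  dryRun: true
  sql: |
    SELECT user_id, COUNT(*) AS events
    FROM `my_project.my_dataset.events`
    GROUP BY user_id
```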
fetchType string
NONE
STORE
FETCH
FETCH_ONE
NONE
Fetch type
The way you want to store the data:
- FETCH_ONE - outputs the first row
- FETCH - outputs all rows as an output variable
- STORE - stores all rows in a file
- NONE - does nothing
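For large result sets, STORE writes the rows to internal storage and exposes them through the uri output rather than holding them in memory. A minimal sketch (table name illustrative):

```yaml
- id: store_results
  type: io.kestra.plugin.gcp.bigquery.Query
  fetchType: STORE
  sql: |
    SELECT * FROM `my_project.my_dataset.large_table`
```

Downstream tasks can then read the file via `{{ outputs.store_results.uri }}`.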
flattenResults boolean | string
true
Sets whether nested and repeated fields should be flattened.
If set to false, allowLargeResults must be true.
impersonatedServiceAccount string
The GCP service account to impersonate.
jobTimeout string
duration
Job timeout.
If this time limit is exceeded, BigQuery may attempt to terminate the job.
labels object
The labels associated with this job.
You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.
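Labels are plain key/value pairs subject to the constraints above. A sketch (keys and values illustrative):

```yaml
- id: labelled_query
  type: io.kestra.plugin.gcp.bigquery.Query
  labels:
    team: data-platform
    env: production
  sql: |
    SELECT 1
```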
legacySql boolean | string
false
Whether to use BigQuery's legacy SQL dialect for this query.
By default this property is set to false.
location string
The geographic location where the dataset should reside.
This property is experimental and might be subject to change or removed.
See Dataset Location
maxResults integer | string
This is only supported in the fast query path.
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
maximumBillingTier integer | string
Limits the billing tier for this job.
Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default.
maximumBytesBilled integer | string
Limits the bytes billed for this job.
Queries that will have bytes billed beyond this limit will fail (without incurring a charge). If unspecified, this will be set to your project default.
priority string
INTERACTIVE
INTERACTIVE
BATCH
Sets a priority for the query.
projectId string
The GCP project ID.
rangePartitioningEnd integer | string
The end of range partitioning, inclusive.
rangePartitioningField string
Range partitioning field for the destination table.
rangePartitioningInterval integer | string
The width of each interval.
rangePartitioningStart integer | string
The start of range partitioning, inclusive.
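Taken together, the four rangePartitioning* properties define integer-range partitioning on the destination table. A sketch assuming a customer_id column (all names and bounds illustrative):

```yaml
- id: partitioned_load
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.customers_partitioned"
  rangePartitioningField: customer_id
  rangePartitioningStart: 0
  rangePartitioningEnd: 100000
  rangePartitioningInterval: 1000
  sql: |
    SELECT * FROM `my_project.my_dataset.customers`
```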
retryAuto Constant | Exponential | Random (non-dynamic)
Automatic retry for retryable BigQuery exceptions.
Some exceptions (especially rate-limit errors) are not retried by default by the BigQuery client, so a transparent retry (not the Kestra one) is used by default to handle this case. The default policy is exponential, with a 5-second interval, a maximum duration of 15 minutes, and ten attempts.
retryMessages array
["due to concurrent update","Retrying the job may solve the problem","Retrying may solve the problem"]
The messages which would trigger an automatic retry.
Message is tested as a substring of the full message, and is case insensitive.
retryReasons array
["rateLimitExceeded","jobBackendError","backendError","internalError","jobInternalError"]
The reasons which would trigger an automatic retry.
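The three retry properties above work together: retryAuto selects the backoff policy, while retryMessages and retryReasons decide which errors qualify for a retry. A sketch overriding the default exponential policy (values illustrative):

```yaml
- id: resilient_query
  type: io.kestra.plugin.gcp.bigquery.Query
  retryAuto:
    type: constant
    interval: PT10S
    maxDuration: PT5M
  retryReasons:
    - rateLimitExceeded
    - backendError
  sql: |
    SELECT 1
```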
schemaUpdateOptions array
ALLOW_FIELD_ADDITION
ALLOW_FIELD_RELAXATION
Experimental. Options allowing the schema of the destination table to be updated as a side effect of the query job.
Schema update options are supported in two cases:
- when writeDisposition is WRITE_APPEND;
- when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators.
For normal tables, WRITE_TRUNCATE will always overwrite the schema.
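For example, the first case could be sketched as an append that tolerates new columns appearing in the source (table names illustrative):

```yaml
- id: append_with_new_columns
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.my_table"
  writeDisposition: WRITE_APPEND
  schemaUpdateOptions:
    - ALLOW_FIELD_ADDITION
  sql: |
    SELECT * FROM `my_project.my_dataset.staging_table`
```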
scopes array
["https://www.googleapis.com/auth/cloud-platform"]
The GCP scopes to be used.
serviceAccount string
The GCP service account.
sql string
The SQL query to run.
timePartitioningField string
The time partitioning field for the destination table.
timePartitioningType string
DAY
DAY
HOUR
MONTH
YEAR
The time partitioning type specification.
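A sketch combining the two timePartitioning* properties to partition the destination table by day on a timestamp column (names illustrative):

```yaml
- id: time_partitioned
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.events_by_day"
  timePartitioningField: created_at
  timePartitioningType: DAY
  sql: |
    SELECT * FROM `my_project.my_dataset.events`
```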
useLegacySql boolean | string
false
Sets whether to use BigQuery's legacy SQL dialect for this query.
By default this property is set to false.
useQueryCache boolean | string
Sets whether to look for the result in the query cache.
The query cache is a best-effort cache that is flushed whenever tables in the query are modified. Moreover, the query cache is only available when destinationTable is not set.
writeDisposition string
WRITE_TRUNCATE
WRITE_TRUNCATE_DATA
WRITE_APPEND
WRITE_EMPTY
The action that should occur if the destination table already exists.
Outputs
destinationTable Query-DestinationTable
The destination table (if set) or the temporary table created automatically.
jobId string
The job ID.
row object
Map containing the first row of fetched data.
Only populated if fetchType is set to FETCH_ONE.
rows array
List containing the fetched data.
Only populated if fetchType is set to FETCH.
size integer
The number of rows fetched.
uri string
uri
The URI of the stored result.
Only populated if fetchType is set to STORE.
Metrics
cache.hit counter
Whether the query result was fetched from the query cache.
duration timer
The time it took for the query to run.
estimated.bytes.processed counter
bytes
The original estimate of bytes processed for the query.
num.child.jobs counter
The number of child jobs executed by the query.
num.dml.affected.rows counter
records
The number of rows affected by a DML statement. Present only for DML statements INSERT, UPDATE or DELETE.
referenced.tables counter
The number of tables referenced by the query.
total.bytes.billed counter
bytes
The total number of bytes billed for the query.
total.bytes.processed counter
bytes
The total number of bytes processed by the query.
total.partitions.processed counter
partitions
The total number of partitions processed from all partitioned tables referenced in the job.
total.slot.ms counter
The slot-milliseconds consumed by the query.
Definitions
io.kestra.core.models.tasks.retrys.Constant
interval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.core.models.tasks.retrys.Random
maxInterval *Required string
duration
minInterval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.core.models.tasks.retrys.Exponential
interval *Required string
duration
maxInterval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
delayFactor number
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.plugin.gcp.bigquery.Query-DestinationTable
dataset string
The dataset of the table
project string
The project of the table
table string
The table name