Run a SQL query in a specific BigQuery database.
type: "io.kestra.plugin.gcp.bigquery.Query"
Examples
Create a table with a custom query.
```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: query
    type: io.kestra.plugin.gcp.bigquery.Query
    destinationTable: "my_project.my_dataset.my_table"
    writeDisposition: WRITE_APPEND
    sql: |
      SELECT
        "hello" as string,
        NULL AS `nullable`,
        1 as int,
        1.25 AS float,
        DATE("2008-12-25") AS date,
        DATETIME "2008-12-25 15:30:00.123456" AS datetime,
        TIME(DATETIME "2008-12-25 15:30:00.123456") AS time,
        TIMESTAMP("2008-12-25 15:30:00.123456") AS timestamp,
        ST_GEOGPOINT(50.6833, 2.9) AS geopoint,
        ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS `array`,
        STRUCT(4 AS x, 0 AS y, ARRAY(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS z) AS `struct`
```
Execute a query and fetch the result set in another task.
```yaml
id: gcp_bq_query
namespace: company.team

tasks:
  - id: fetch
    type: io.kestra.plugin.gcp.bigquery.Query
    fetch: true
    sql: |
      SELECT 1 as id, "John" as name
      UNION ALL
      SELECT 2 as id, "Doe" as name

  - id: use_fetched_data
    type: io.kestra.plugin.core.debug.Return
    format: |
      {% for row in outputs.fetch.rows %}
      id : {{ row.id }}, name: {{ row.name }}
      {% endfor %}
```
Properties
allowLargeResults boolean | string
Sets whether the job is enabled to create arbitrarily large results.
If true, the query is allowed to create large results at a slight cost in performance; destinationTable must be provided.
clusteringFields array
The clustering specification for the destination table.
createDisposition string
CREATE_IF_NEEDED
CREATE_NEVER
Whether the job is allowed to create tables.
defaultDataset string
Sets the default dataset.
This dataset is used for all unqualified table names used in the query.
destinationTable string
The table where to put query results.
If not provided, a new table is created.
dryRun boolean | string
false
Whether the job has to be dry run or not.
A valid query will mostly return an empty response with some processing statistics, while an invalid query will return the same error as it would if it were an actual run.
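As a sketch, dryRun can be used to validate a query without executing it; the task id and query below are illustrative:

```yaml
- id: validate_query
  type: io.kestra.plugin.gcp.bigquery.Query
  dryRun: true
  sql: |
    SELECT user_id, COUNT(*) AS events
    FROM `my_project.my_dataset.events`
    GROUP BY user_id
```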
fetchType string
NONE
STORE
FETCH
FETCH_ONE
NONE
Fetch type
The way you want to store the data:
- FETCH_ONE - outputs the first row
- FETCH - outputs all rows as an output variable
- STORE - stores all rows in a file
- NONE - does nothing
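For large result sets, STORE writes the rows to internal storage and exposes them through the uri output rather than holding them in memory. A minimal sketch (table name illustrative):

```yaml
- id: store_results
  type: io.kestra.plugin.gcp.bigquery.Query
  fetchType: STORE
  sql: |
    SELECT * FROM `my_project.my_dataset.large_table`
```

Downstream tasks can then read the file via `{{ outputs.store_results.uri }}`.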
flattenResults boolean | string
true
Sets whether nested and repeated fields should be flattened.
If set to false, allowLargeResults must be true.
impersonatedServiceAccount string
The GCP service account to impersonate.
jobTimeout string
duration
Job timeout.
If this time limit is exceeded, BigQuery may attempt to terminate the job.
labels object
The labels associated with this job.
You can use these to organize and group your jobs. Label keys and values can be no longer than 63 characters, can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. Label values are optional. Label keys must start with a letter and each label in the list must have a different key.
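Labels are plain key/value pairs subject to the constraints above. A sketch (keys and values illustrative):

```yaml
- id: labelled_query
  type: io.kestra.plugin.gcp.bigquery.Query
  labels:
    team: data-platform
    env: production
  sql: |
    SELECT 1
```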
legacySql boolean | string
false
Whether to use BigQuery's legacy SQL dialect for this query.
By default this property is set to false.
location string
The geographic location where the dataset should reside.
This property is experimental and might be subject to change or removed.
See Dataset Location
maxResults integer | string
This is only supported in the fast query path.
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
maximumBillingTier integer | string
Limits the billing tier for this job.
Queries that have resource usage beyond this tier will fail (without incurring a charge). If unspecified, this will be set to your project default.
maximumBytesBilled integer | string
Limits the bytes billed for this job.
Queries that will have bytes billed beyond this limit will fail (without incurring a charge). If unspecified, this will be set to your project default.
priority string
INTERACTIVE
INTERACTIVE
BATCH
Sets a priority for the query.
projectId string
The GCP project ID.
rangePartitioningEnd integer | string
The end of range partitioning, inclusive.
rangePartitioningField string
Range partitioning field for the destination table.
rangePartitioningInterval integer | string
The width of each interval.
rangePartitioningStart integer | string
The start of range partitioning, inclusive.
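Taken together, the four rangePartitioning* properties define integer-range partitioning on the destination table. A sketch assuming a customer_id column (all names and bounds illustrative):

```yaml
- id: partitioned_load
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.customers_partitioned"
  rangePartitioningField: customer_id
  rangePartitioningStart: 0
  rangePartitioningEnd: 100000
  rangePartitioningInterval: 1000
  sql: |
    SELECT * FROM `my_project.my_dataset.customers`
```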
retryAuto Constant | Exponential | Random (non-dynamic)
Automatic retry for retryable BigQuery exceptions.
Some exceptions (especially rate-limit errors) are not retried by default by the BigQuery client, so a transparent retry (not the Kestra one) is used by default to handle this case. The default policy is exponential, with a 5-second interval, a maximum duration of 15 minutes, and ten attempts.
retryMessages array
["due to concurrent update","Retrying the job may solve the problem","Retrying may solve the problem"]
The messages which would trigger an automatic retry.
Message is tested as a substring of the full message, and is case insensitive.
retryReasons array
["rateLimitExceeded","jobBackendError","backendError","internalError","jobInternalError"]
The reasons which would trigger an automatic retry.
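The three retry properties above work together: retryAuto selects the backoff policy, while retryMessages and retryReasons decide which errors qualify for a retry. A sketch overriding the default exponential policy (values illustrative):

```yaml
- id: resilient_query
  type: io.kestra.plugin.gcp.bigquery.Query
  retryAuto:
    type: constant
    interval: PT10S
    maxDuration: PT5M
  retryReasons:
    - rateLimitExceeded
    - backendError
  sql: |
    SELECT 1
```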
schemaUpdateOptions array
ALLOW_FIELD_ADDITION
ALLOW_FIELD_RELAXATION
Experimental. Options allowing the schema of the destination table to be updated as a side effect of the query job.
Schema update options are supported in two cases:
- when writeDisposition is WRITE_APPEND;
- when writeDisposition is WRITE_TRUNCATE and the destination table is a partition of a table, specified by partition decorators.
For normal tables, WRITE_TRUNCATE will always overwrite the schema.
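For example, the first case could be sketched as an append that tolerates new columns appearing in the source (table names illustrative):

```yaml
- id: append_with_new_columns
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.my_table"
  writeDisposition: WRITE_APPEND
  schemaUpdateOptions:
    - ALLOW_FIELD_ADDITION
  sql: |
    SELECT * FROM `my_project.my_dataset.staging_table`
```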
scopes array
["https://www.googleapis.com/auth/cloud-platform"]
The GCP scopes to be used.
serviceAccount string
The GCP service account.
sql string
The SQL query to run.
timePartitioningField string
The time partitioning field for the destination table.
timePartitioningType string
DAY
DAY
HOUR
MONTH
YEAR
The time partitioning type specification.
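A sketch combining the two timePartitioning* properties to partition the destination table by day on a timestamp column (names illustrative):

```yaml
- id: time_partitioned
  type: io.kestra.plugin.gcp.bigquery.Query
  destinationTable: "my_project.my_dataset.events_by_day"
  timePartitioningField: created_at
  timePartitioningType: DAY
  sql: |
    SELECT * FROM `my_project.my_dataset.events`
```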
useLegacySql boolean | string
false
Sets whether to use BigQuery's legacy SQL dialect for this query.
By default this property is set to false.
useQueryCache boolean | string
Sets whether to look for the result in the query cache.
The query cache is a best-effort cache that is flushed whenever tables in the query are modified. Moreover, the query cache is only available when destinationTable is not set.
writeDisposition string
WRITE_TRUNCATE
WRITE_TRUNCATE_DATA
WRITE_APPEND
WRITE_EMPTY
The action that should occur if the destination table already exists.
Outputs
destinationTable Query-DestinationTable
The destination table (if set) or the temporary table created automatically.
jobId string
The job ID.
row object
Map containing the first row of fetched data.
Only populated if fetchType is set to FETCH_ONE.
rows array
List containing the fetched data.
Only populated if fetchType is set to FETCH.
size integer
The number of rows fetched.
uri string
uri
The URI of the stored result.
Only populated if fetchType is set to STORE.
Metrics
cache.hit counter
Whether the query result was fetched from the query cache.
duration timer
The time it took for the query to run.
estimated.bytes.processed counter
bytes
The original estimate of bytes processed for the query.
num.child.jobs counter
The number of child jobs executed by the query.
num.dml.affected.rows counter
records
The number of rows affected by a DML statement. Present only for DML statements INSERT, UPDATE or DELETE.
referenced.tables counter
The number of tables referenced by the query.
total.bytes.billed counter
bytes
The total number of bytes billed for the query.
total.bytes.processed counter
bytes
The total number of bytes processed by the query.
total.partitions.processed counter
partitions
The total number of partitions processed from all partitioned tables referenced in the job.
total.slot.ms counter
The slot-milliseconds consumed by the query.
Definitions
io.kestra.core.models.tasks.retrys.Constant
interval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.core.models.tasks.retrys.Random
maxInterval *Required string
duration
minInterval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.core.models.tasks.retrys.Exponential
interval *Required string
duration
maxInterval *Required string
duration
type *Required object
behavior string
RETRY_FAILED_TASK
RETRY_FAILED_TASK
CREATE_NEW_EXECUTION
delayFactor number
maxAttempts integer
>= 1
maxDuration string
duration
warningOnRetry boolean
false
io.kestra.plugin.gcp.bigquery.Query-DestinationTable
dataset string
The dataset of the table
project string
The project of the table
table string
The table name