Big Query – createQueryJob with dynamic projectId values

I am using the @google-cloud/bigquery npm package. I have various projects set up in Big Query, and I would like to:

  1. Create one single instance of Big Query, without passing any specific projectId when initializing
  2. Run the same query for a list of Big Query projects, considering that both the projectId and the dataset will be passed dynamically, at runtime, and not when Big Query is initialized.
  3. Write the data obtained with this query, into the same project for which it was run, but in a targetDataset and targetTable that I pass in the destination

I have tried different things:

  1. This job runs without issued, for the correct projectId, source dataset and the table that I mention in the string of the query, however this is not a write job:
    public async getQueryResults(query: string, projectId: string, sourceDataSet: string, targetDataset: string, targetTable: string) {
        const myJob = {
            query,
            defaultDataset: {
                datasetId: sourceDataSet,
                projectId
            }
        };
        const jobResult = await this.bigQueryDatabase.query(myJob);
        console.log(myJob, 'myJob');
    }
  1. The below does create a write job, but I am getting an error:
public async getQueryResults(query: string, projectId: string, sourceDataSet: string, targetDataset: string, targetTable: string) {
    const destinationTable = await this.bigQueryDatabase.dataset(targetDataset, { projectId }).table(targetTable);
    const myJob = {
            query,
            destination: destinationTable, // this is a Table Object
            useLegacySql: /#legacySQL/i.test(query),
            writeDisposition: 'WRITE_APPEND',
            createDisposition: 'CREATE_IF_NEEDED',
            priority: 'BATCH',
            defaultDataset: {
                datasetId: sourceDataSet,
                projectId
            }
        };
        const jobResult = await this.bigQueryDatabase.query(myJob);
        console.log(myJob, 'myJob');
    }

and the error is something like

Error: Not found: Dataset differentProjectId:myDataset was not found in location US

With the projectId being different than the ones I have passed as a parameter, which leads me to believe that somehow, Big Query is trying to run this query in all of my projects(instead of the one I have passed in the destinationTable parameters). The dataset however, seems to be correctly picked up.

  1. Lastly, I have also tried the following:
public async getQueryResults(query: string, projectId: string, sourceDataSet: string, targetDataset: string, targetTable: string) {
        console.log(query, projectId, sourceDataSet, targetDataset, targetTable);
        const myJob = {
            query,
            destinationTable: {
                projectId,
                datasetId: targetDataset,
                tableId: targetTable
            },
            useLegacySql: /#legacySQL/i.test(query),
            writeDisposition: 'WRITE_APPEND',
            createDisposition: 'CREATE_IF_NEEDED',
            priority: 'BATCH',
            defaultDataset: {
                datasetId: sourceDataSet,
                projectId
            }
        };
        const jobResult = await this.bigQueryDatabase.query(myJob);
        console.log('Query Results:', jobResult);
    }

And this one runs, but strangely enough, it seems to be running the query for all the tables in the source dataset. For example, my query is

SELECT 'users' as tableName name FROM 'users' ....;

But the Query Results printed in the console are:

[
    {
      estimationtime: [BigQueryTimestamp],
      tableName: '${sourceDataset}.'users',
      sampleName: 'users',
      value: 5
    },
    {
      estimationtime: [BigQueryTimestamp],
      tableName: '${sourceDataset}.accounts',
      sampleName: 'accountType', //this is a column in my DB
      value: 3
    },
....
]

However, it seems to only write the correct value in the actual table.

Any idea what I should change so that I have a write job that

  • executes a query
  • on a given source projectId and dataset
  • and then writes the results of that query in a target table, in a target dataset, in a target projectId (incidentally, in my case, the source projectId and destination projectId are identical, but they could potentially be different)

Leave a Comment