How to sequentially run a single Apache Beam pipeline that writes and reads data

I'm using Apache Beam with Python on Dataflow (GCP) to load and transform data batches.

I have a pipeline that is separated into two parts.
The first part writes into BigQuery.
The second part reads the data written in the first step.

I'm using the same pipeline p to run both parts, so they run at the same time, but I need them to run sequentially within a single pipeline. I tried adding a flag to trigger the second part, but since WriteToBigQuery doesn't return a regular PCollection and its result isn't iterable, I was unable to achieve it.

import apache_beam as beam
from apache_beam.io.gcp.bigquery import (
    WriteToBigQuery,
    BigQueryDisposition,
)

p = beam.Pipeline(options=opts)
part_1 = (
    p
    | "F1: Read data 1"
    >> beam.io.ReadFromText(
        entrada, skip_header_lines=True
    )
    | "F1: Transform 1" >> beam.Map(format_date)
    | "F1: Transform 2" >> beam.Map(make_row)
    | "F1: Write into BQ"
    >> WriteToBigQuery(
        output_table,
        schema=table_schema,
        write_disposition=BigQueryDisposition.WRITE_APPEND,
        create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
        additional_bq_parameters={
            "timePartitioning": {"type": "DAY"},
            "clustering": {"fields": ["programcode"]},
        },
        custom_gcs_temp_location=temp_location,
    )
)

After this execution I read the data just written (using a query that joins other tables besides this one), this way:

part_2 = (
    p
    | "F2: Read from BigQuery from first step"
    >> beam.io.ReadFromBigQuery(
        query=query_raw,
        use_standard_sql=True,
        gcs_location=temp_location,
        project=project_id,
    )
)

and finally I run the defined pipeline:

result = p.run()
result.wait_until_finish()

If I add part_1 as the input of part_2, like this:

part_2 = (
    part_1
    | "F2: Read from BigQuery"...

The output is the following, because of what part_1 returns:

AttributeError: Error trying to access nonexistent attribute `0` in write result. Please see __documentation__ for available attributes.

Any ideas or examples for achieving this sequential execution are welcome.
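
For reference, the write result of WriteToBigQuery only exposes named attributes (for file loads: destination_load_jobid_pairs, destination_file_pairs, destination_copy_jobid_pairs), which is why indexing it fails. Below is a minimal, untested sketch of the direction I am exploring with ReadAllFromBigQuery; it assumes that destination_load_jobid_pairs only emits once the load jobs have completed, and that ReadAllFromBigQuery accepts a gcs_location like ReadFromBigQuery does.

import apache_beam as beam
from apache_beam.io import ReadAllFromBigQuery, ReadFromBigQueryRequest
from apache_beam.io.gcp.bigquery import WriteToBigQuery, BigQueryDisposition

# keep the transform's return value instead of discarding it
write_result = (
    p
    | "F1: Read data 1" >> beam.io.ReadFromText(entrada, skip_header_lines=True)
    | "F1: Transform 1" >> beam.Map(format_date)
    | "F1: Transform 2" >> beam.Map(make_row)
    | "F1: Write into BQ"
    >> WriteToBigQuery(
        output_table,
        schema=table_schema,
        write_disposition=BigQueryDisposition.WRITE_APPEND,
        create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
        method=WriteToBigQuery.Method.FILE_LOADS,  # default on batch; produces the jobid output
        custom_gcs_temp_location=temp_location,
    )
)

part_2 = (
    # assumption: this PCollection only emits once the BQ load jobs are done
    write_result.destination_load_jobid_pairs
    | "Gate" >> beam.combiners.Count.Globally()  # collapse to a single signal element
    | "To read request"
    >> beam.Map(lambda _: ReadFromBigQueryRequest(query=query_raw, use_standard_sql=True))
    | "F2: Read from BigQuery" >> ReadAllFromBigQuery(gcs_location=temp_location)
)

I have not verified this end to end on Dataflow yet, so corrections are welcome.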

  • Hi @Sebastian, have you checked this similar Stack Overflow link? Is it helpful for you?

  • Hi @DhirajSingh! Thanks for the comment. I already tried that, but it didn't work: drive.google.com/file/d/1aTMFQeF4TW0nD1iNGtIxiHB5_N0Wrolc/…

  • I recommend using Airflow with two Dataflow jobs (see the first sketch below the comments).

  • You could (in addition to the BigQuery output) also publish the data to Pub/Sub and read it from Pub/Sub in the second part of your pipeline (see the second sketch below the comments). Mind, however, that this will most likely destroy the order of the elements.

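To make the Airflow suggestion from the comments concrete, here is a minimal sketch, assuming the two parts are split into two standalone pipeline scripts; the DAG id, GCS paths, bucket and region are all hypothetical placeholders.

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowCreatePythonJobOperator,
)
from airflow.utils.dates import days_ago

with DAG("sequential_dataflow", schedule_interval=None, start_date=days_ago(1)) as dag:
    part_1_job = DataflowCreatePythonJobOperator(
        task_id="part_1_write_to_bq",
        py_file="gs://my-bucket/pipelines/part_1.py",  # hypothetical path
        job_name="part-1-write",
        options={"temp_location": "gs://my-bucket/temp"},  # hypothetical bucket
        location="us-central1",
    )
    part_2_job = DataflowCreatePythonJobOperator(
        task_id="part_2_read_from_bq",
        py_file="gs://my-bucket/pipelines/part_2.py",  # hypothetical path
        job_name="part-2-read",
        options={"temp_location": "gs://my-bucket/temp"},
        location="us-central1",
    )

    # Airflow starts the second Dataflow job only after the first one succeeds
    part_1_job >> part_2_job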
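
And a rough sketch of the Pub/Sub bridge idea from the last comment. Here rows stands for the PCollection produced by the F1 transforms, and bridge_topic / bridge_subscription are hypothetical resources; note that Beam's Pub/Sub IO requires the pipeline to run in streaming mode, and that element order is not preserved.

import json
import apache_beam as beam

# branch 1: the existing BigQuery write, unchanged
_ = rows | "F1: Write into BQ" >> beam.io.WriteToBigQuery(
    output_table, schema=table_schema
)

# branch 2: additionally publish every row to a "bridge" topic
_ = (
    rows
    | "F1: Encode" >> beam.Map(lambda row: json.dumps(row).encode("utf-8"))
    | "F1: Publish bridge" >> beam.io.WriteToPubSub(topic=bridge_topic)
)

# part 2 consumes the bridge subscription instead of re-reading BigQuery
part_2 = (
    p
    | "F2: Read bridge" >> beam.io.ReadFromPubSub(subscription=bridge_subscription)
    | "F2: Decode" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
)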