Using the AWS Glue Catalog to compute the diff between two text (txt) files

I have two text files, say file1 and file2, that contain string values. I want to compute the values that are present in file1 and not in file2, and write them to another text file, say file_output. I am able to write an ETL script that uses data frames to do this computation. My questions are:

  1. Should I use the AWS Glue Catalog for this operation?
  2. If yes, what are the benefits?

This script will be invoked once a day, and file1 and file2 each hold approximately 100,000 entries.

  • The Glue Catalog is only a description of your data's structure: it holds the schema of your source and target. It gives no benefit for the comparison itself or for the ETL operation.
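Because the catalog only stores metadata, the diff itself is an ordinary set difference. Below is a minimal sketch in plain Python, assuming one value per line and UTF-8 text; the function name `diff_lines` is hypothetical. At roughly 100,000 entries per file, a Python set fits easily in memory, so no catalog table is needed for this step. Inside a Glue Spark job, the comparable DataFrame operation would be `df1.subtract(df2)` (SQL `EXCEPT DISTINCT` semantics).

```python
def diff_lines(file1: str, file2: str, file_output: str) -> int:
    """Write the values found in file1 but not in file2 to file_output.

    Assumptions (not from the original question): one value per line,
    UTF-8 text, blank lines ignored. Values keep their first-appearance
    order from file1 and are written once each, like SQL EXCEPT DISTINCT.
    Returns the number of values written.
    """
    # Load every value of file2 into a set for O(1) membership checks.
    with open(file2, encoding="utf-8") as f:
        exclude = {line.rstrip("\n") for line in f}

    seen = set()  # values of file1 already written, to drop duplicates
    written = 0
    with open(file1, encoding="utf-8") as src, \
            open(file_output, "w", encoding="utf-8") as out:
        for line in src:
            value = line.rstrip("\n")
            # Skip blank lines, values present in file2, and repeats.
            if not value or value in exclude or value in seen:
                continue
            seen.add(value)
            out.write(value + "\n")
            written += 1
    return written
```

For example, `diff_lines("file1.txt", "file2.txt", "file_output.txt")` writes the difference and returns how many values it wrote. If duplicates in file1 should be kept, drop the `seen` check (the Spark counterpart would then be `exceptAll`).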
