I have two text files, say file1 and file2, that contain string values. I want to compute the values that are present in file1 but not in file2. The output will be written to another text file, say file_output. I am able to write an ETL script that uses data frames to do this computation. My questions are:
- Should I use the AWS Glue Catalog for this operation?
- If yes, what are the benefits?
This script will be invoked once a day, and file1 and file2 each contain approximately 100,000 entries.
The Glue Catalog is nothing more than a description of the structure of your data: it holds the schema of your source/target. It offers no benefit for the comparison or ETL operation itself.
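To illustrate that the comparison itself does not need a catalog, here is a minimal sketch in plain Python. It assumes one value per line, exact matching after stripping surrounding whitespace, and files that are reachable as local paths; the function names `values_only_in_first` and `write_difference` are made up for this example and are not part of Glue. At roughly 100,000 entries per file, both inputs fit comfortably in memory as sets.

```python
from pathlib import Path


def values_only_in_first(first, second):
    """Return the values from `first` that do not occur in `second`.

    Both arguments are iterables of lines. Order follows `first`;
    blank lines and repeated values are dropped.
    """
    exclude = {line.strip() for line in second}
    seen = set()
    result = []
    for line in first:
        value = line.strip()
        if value and value not in exclude and value not in seen:
            seen.add(value)
            result.append(value)
    return result


def write_difference(file1, file2, file_output):
    """Write the values present in file1 but not in file2 to file_output."""
    with open(file1, encoding="utf-8") as f1, open(file2, encoding="utf-8") as f2:
        result = values_only_in_first(f1, f2)
    # One value per line, matching the assumed input layout.
    Path(file_output).write_text("".join(v + "\n" for v in result), encoding="utf-8")
    return result
```

Calling `write_difference("file1", "file2", "file_output")` once a day would cover the case described in the question. If you keep the data frame version instead, Spark's `DataFrame.subtract` performs the same set difference and likewise works without a catalog table.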