I am trying to use a foreach loop to split a big CSV file into smaller files after doing some minor cleaning. My strategy was to:
1- Use read_csv to read a chunk that fits into RAM
2- Do the cleaning
3- Save the chunk in the new format
4- Repeat 1 to 3 using the skip and n_max arguments, so I can skip what I have already read and read an equal number of rows
I was looping over a sequence that starts at zero and increases in increments of the chunk size, roughly as sketched below.
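Simplified, the sequential version looks like this (the file name and chunk size are placeholders, and clean_chunk() stands in for my cleaning step):

```r
library(readr)

infile     <- "big_file.csv"   # placeholder path
chunk_size <- 100000

# Read the header once so every chunk gets the same column names
header <- names(read_csv(infile, n_max = 0, show_col_types = FALSE))

i <- 0
repeat {
  chunk <- read_csv(
    infile,
    skip      = i * chunk_size + 1,   # +1 skips the header line
    n_max     = chunk_size,
    col_names = header,
    show_col_types = FALSE
  )
  if (nrow(chunk) == 0) break         # nothing left to read

  cleaned <- clean_chunk(chunk)       # placeholder for the cleaning step
  write_csv(cleaned, sprintf("part_%03d.csv", i))
  i <- i + 1
}
```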
This was working with the normal for loop, and I was able to use break if something went wrong, but I could not make it work with the parallel foreach: it does not save the files properly. My assumption was that by using foreach, one core would work on, for instance, the chunk from 0 to 100,000, another core on 100,001 to 200,000, and so on, roughly as in the sketch below.
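The structure I had in mind is roughly the following (simplified, not my exact code; the total row count is assumed known and clean_chunk() is again a placeholder):

```r
library(foreach)
library(doParallel)
library(readr)

infile     <- "big_file.csv"   # placeholder path
chunk_size <- 100000
total_rows <- 1000000          # assumed known row count (excluding header)

header  <- names(read_csv(infile, n_max = 0, show_col_types = FALSE))
offsets <- seq(0, total_rows - 1, by = chunk_size)

cl <- makeCluster(4)
registerDoParallel(cl)

foreach(i = seq_along(offsets), .packages = "readr") %dopar% {
  chunk <- read_csv(
    infile,
    skip      = offsets[i] + 1,      # +1 skips the header line
    n_max     = chunk_size,
    col_names = header,
    show_col_types = FALSE
  )
  cleaned <- clean_chunk(chunk)      # placeholder for the cleaning step
  write_csv(cleaned, sprintf("part_%03d.csv", i))
  NULL                               # nothing needs to be combined
}

stopCluster(cl)
```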
Can you include the code you have tried so far, and a dput() of the first few rows of your data? Thanks. Also: can you describe (if possible) the cleaning process? Is it simple filtering? If so, you could possibly filter before reading, using something like data.table::fread() with a grep-like cmd argument.
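For example (a minimal sketch, assuming a Unix-like shell with grep available; the pattern, header prefix, and file name are placeholders):

```r
library(data.table)

# Keep the header line (starting with "id,") plus any data line matching
# the pattern, letting grep filter before R ever sees the rows
dt <- fread(cmd = "grep -E '^id,|,active,' big_file.csv")
```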