I am parsing a very large file whose content looks like this:
File input_text.txt:
hello abc
0] Framework table
f1
f2
f3
0] number of entries
randomtext1
0] Test table
t1
t2
t3
0] number of entries
randomtext2
1] Test table
1] Same as framework page table
randomtext3
2] Test table
t4
t5
t6
2] number of entries
randomtext4
3] Test table
t4
t5
t6
3] number of entries
4] Test table
4] Same as framework page table
randomtext5
1] Framework table
f4
f5
f6
1] number of entries
randomtext6
foofoobar
From this file, I want to extract the table entries. The expected output is:
Here is your framework table:
f1
f2
f3
f4
f5
f6
Here is your test table:
t1
t2
t3
1] Same as framework page table
t4
t5
t6
t4
t5
t6
4] Same as framework page table
I cannot read the whole file into an array because of its size and the number of entries. I have written the following code using the range operator, but it gives an unexpected result:
$input_log_file = "input_text.txt";
open(LOG_FILE, "$input_log_file") or die("Can't open $input_log_file to read. \n");
while (<LOG_FILE>)
{
    if (/Framework table/ .. /number of entries/)
    {
        next if (/Framework table/ || /number of entries/);
        push @framework, $_;
    }
    if (/Test table/ .. /Same as framework page table/)
    {
        next if (/Test table/);
        push @test, $_;
    }
    # if(/Same as framework page table/)
    # {
    #     next;
    # }
    if (/Test table/ .. /number of entries/)
    {
        next if (/Test table/ || /number of entries/);
        push @test, $_;
    }
}
close(LOG_FILE);
print "\nHere is your framework table:\n";
print @framework;
print "\nHere is your test table:\n";
print @test;
I don't understand how to 'break' out of the range operator so that the file is parsed correctly.
Any help please?
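Don't test for two different ranges with the same starting expression. Merge them into one. Each range operator keeps its own state. In your code, the next if (/Test table/) in the second block also skips the third if, so that range never sees its start line, while the second range stays open until the next "Same as framework page table" line and collects everything in between.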
while (<LOG_FILE>) {
    if (/Framework table/ .. /number of entries/)
    {
        next if /Framework table|number of entries/;
        push @framework, $_;
    }
    if (/Test table/ .. /Same as framework page table|number of entries/)
    {
        next if /Test table|number of entries/;
        push @test, $_;
    }
}
Also, instead of /Regex1/ || /Regex2/, use regex alternation: /Regex1|Regex2/.
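For reference, here is a minimal self-contained sketch of the merged-range approach, reading line by line from a filehandle. The helper name extract_tables, the lexical filehandle, and the shortened in-memory sample are my own assumptions for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads lines from an
# already-open filehandle and returns two array refs (framework, test).
sub extract_tables {
    my ($fh) = @_;
    my (@framework, @test);

    while (my $line = <$fh>) {
        # Framework range: header line through its "number of entries" line.
        if ($line =~ /Framework table/ .. $line =~ /number of entries/) {
            next if $line =~ /Framework table|number of entries/;
            push @framework, $line;
        }
        # One Test range whose end expression covers both possible terminators.
        if ($line =~ /Test table/ .. $line =~ /Same as framework page table|number of entries/) {
            next if $line =~ /Test table|number of entries/;
            push @test, $line;
        }
    }
    return (\@framework, \@test);
}

# Shortened sample in the same shape as input_text.txt (assumed data).
my $sample = <<'END';
hello abc
0] Framework table
f1
0] number of entries
randomtext1
0] Test table
t1
0] number of entries
randomtext2
1] Test table
1] Same as framework page table
randomtext3
END

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my ($framework, $test) = extract_tables($fh);
close $fh;

print "Here is your framework table:\n", @$framework;
print "Here is your test table:\n", @$test;
```

To run this against the real file, open input_text.txt with the three-argument form (open my $fh, '<', 'input_text.txt') and pass that handle instead. Note that a range operator keeps its state even across calls to the subroutine that contains it, so the helper should only be called again once the previous input has ended outside a table.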
Which OS and version? Is the file size in GBs or TBs? Is there a unique identifier for the rows that could be used to split the file into multiple files?