I have a huge file whose lines come in one of the following two formats:
Format1:
*1 <int_1/string_1>:<int/string> <int_2/string_2>:<int/string> <float>
Format2:
*1 <int/string>:<int/string> <float>
So the possible cases for the above formats are:
*1 1:2 3:4 2.3
*1 1:foo 3:bar 2.3
*1 foo:1 bar:4 2.3
*1 foo:foo bar:bar 2.3
*1 foo:foo 2.3
Of these two formats, I only need to handle 'Format1' in my code. While reading the file, lines in 'Format2' should be skipped. Among the cases above, that means the first four are processed and the last one is ignored, since it matches 'Format2'. So the regex should be something like this:
(\d+)(\s+)(\\*\S+:\S+)(\s+)(\\*\S+:\S+)(\s+)(\d+)
where
\d is any digit. \d+ is one or more digits.
\s is any whitespace character. \s+ is one or more whitespace characters.
\S is any non-whitespace character. \S+ is one or more non-whitespace characters.
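To make the intent of the pattern concrete, here is a minimal Python sketch of a regex that accepts only 'Format1' lines. The names FORMAT1 and extract_keys are hypothetical, and the pattern assumes each line starts with a literal '*' followed by an integer and ends with a plain decimal number such as 2.3; adjust it if the real data differs.

```python
import re

# Assumed layout: '*<int>', two 'key:value' fields, then a number like 2.3.
# Only the two keys (the part before each colon) are captured.
FORMAT1 = re.compile(
    r"^\*\d+\s+([^\s:]+):\S+\s+([^\s:]+):\S+\s+\d+(?:\.\d+)?\s*$"
)

def extract_keys(line):
    """Return (key_1, key_2) for a Format1 line, or None for anything else."""
    m = FORMAT1.match(line)
    return m.groups() if m else None

samples = [
    "*1 1:2 3:4 2.3",
    "*1 1:foo 3:bar 2.3",
    "*1 foo:1 bar:4 2.3",
    "*1 foo:foo bar:bar 2.3",
    "*1 foo:foo 2.3",  # Format2, should be rejected
]

for line in samples:
    print(line, "->", extract_keys(line))
```

The anchors ^ and $ are there so that a 'Format2' line cannot match partially; the last sample has only one key:value field and therefore yields None.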
From each 'Format1' line, I then need to extract two values:
int_1/string_1
int_2/string_2
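For a huge file, one possible alternative to a regex is to stream the lines and split on whitespace, since 'Format1' has four fields and 'Format2' has three. The sketch below assumes exactly that field count and that keys never contain a colon; iter_format1_keys is a hypothetical helper name, not an existing API.

```python
from typing import Iterable, Iterator, Tuple

def iter_format1_keys(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield (key_1, key_2) from Format1 lines, skipping all others."""
    for line in lines:
        fields = line.split()
        # Assumption: Format1 has 4 whitespace-separated fields, Format2 has 3.
        if len(fields) != 4:
            continue
        key_1, sep_1, _ = fields[1].partition(":")
        key_2, sep_2, _ = fields[2].partition(":")
        # Both middle fields must actually contain a colon.
        if sep_1 and sep_2:
            yield key_1, key_2

samples = [
    "*1 1:2 3:4 2.3",
    "*1 1:foo 3:bar 2.3",
    "*1 foo:1 bar:4 2.3",
    "*1 foo:foo bar:bar 2.3",
    "*1 foo:foo 2.3",  # Format2, skipped
]

for key_1, key_2 in iter_format1_keys(samples):
    print(key_1, key_2)
```

Because a file object is itself an iterable of lines, passing an open file handle to the same function would process the file one line at a time without loading it all into memory.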
What would be the most efficient way to do this?