I have a simple TSV file with the following structure:
0 - header line
1 - empty line
2 - PIG schema
3 - empty line
4 - 1-st line of DATA
5 - 2-nd line of DATA
I would like to read it, possibly using readr::read_tsv, but here is the problem.
As you can see, the first row contains the headers. Then there are three rows that I do NOT want to read (they contain some super weird data coming from Apache PIG), and the data starts at row 4. In pandas, I would do something like
import pandas as pd
df = pd.read_csv('/localpath/data.tsv', sep='\t', skiprows=[1, 2, 3])
which reads the headers from row 0 AND skips rows 1, 2 and 3.
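For reference, the row selection that `skiprows=[1, 2, 3]` performs can be sketched in plain Python with the standard `csv` module: zero-indexed line 0 is kept as the header, lines 1 to 3 are dropped, and everything after them is parsed as data. This is only an illustration of the behaviour, not how pandas implements it; the function name `read_tsv_skipping` and the sample contents are hypothetical.

```python
import csv

# Hypothetical sample mirroring the layout above: header, empty line,
# schema line (placeholder text), empty line, then the data rows.
SAMPLE = "id\tname\n\nschema placeholder\n\n1\talice\n2\tbob\n"

def read_tsv_skipping(text, skiprows):
    """Take line 0 as the header and drop the zero-indexed lines in
    `skiprows`, similar to passing a list to pandas' skiprows."""
    lines = text.splitlines()
    kept = [line for i, line in enumerate(lines) if i not in skiprows]
    reader = csv.reader(kept, delimiter="\t")
    header = next(reader)  # first surviving line is line 0, the header
    return [dict(zip(header, row)) for row in reader]

rows = read_tsv_skipping(SAMPLE, skiprows={1, 2, 3})
print(rows)  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
```

The key point is that the skipped lines sit *between* the header and the data, so a single "skip the first N lines" count cannot express it.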
I don't see a similar option in readr::read_tsv. The closest I can get is:
library(readr)
df = read_tsv('/localpath/data.tsv', col_names = TRUE, skip = 4)
which skips the header line as well, so the headers are never parsed...
Any ideas?