What is the best approach for converting a PDF containing unstructured data into a JSON object? Which tools would you use to parse this data?
I can convert the PDF to CSV first if that would make things simpler. I only need to process 3 different documents, and I don't mind doing some of the work manually. I prefer Java, but I also know some JavaScript and a little Python.
My ultimate goal is to use the JSON objects to populate a MongoDB database and perhaps an Elasticsearch index.
Any advice would be much appreciated.
The document that I want to analyze is:
http://akccompanioneventresults.com/?moid=351
I would like the resulting JSON object to look like this:
{
    "dogName" : "My Dolly Two Spots",
    "armbandNumber" : 110,
    "handler" : "Nancy Muller",
    "breed" : "Papillon",
    "round" : "Top 20",
    "event" : "2019 AKC National Obedience Championship",
    "date" : "2019-03-17",
    "rings" :
    [
        {
            "ringNumber" : "1",
            "exercises" :
            [
                {
                    "exerciseName" : "DJ",
                    "judge" : "J Stephens",
                    "score" : 0.5
                },
                {
                    "exerciseName" : "DJ",
                    "judge" : "L Hause",
                    "score" : 1.0
                },
                {
                    "exerciseName" : "DR#3",
                    "judge" : "J Stephens",
                    "score" : 0.5
                },
                {
                    "exerciseName" : "DR#3",
                    "judge" : "L Hause",
                    "score" : 1.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "J Stephens",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "L Hause",
                    "score" : 1.0
                }
            ]
        },
        {
            "ringNumber" : "2",
            "exercises" :
            [
                {
                    "exerciseName" : "CD",
                    "judge" : "R Withers",
                    "score" : 1.0
                },
                {
                    "exerciseName" : "CD",
                    "judge" : "V Kinion",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "ROF",
                    "judge" : "R Withers",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "ROF",
                    "judge" : "V Kinion",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "R Withers",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "V Kinion",
                    "score" : 0.0
                }
            ]
        },
        {
            "ringNumber" : "7",
            "exercises" :
            [
                {
                    "exerciseName" : "RHJ",
                    "judge" : "C Wray",
                    "score" : 0.5
                },
                {
                    "exerciseName" : "RHJ",
                    "judge" : "J Nocilly",
                    "score" : 0.5
                },
                {
                    "exerciseName" : "HF-8",
                    "judge" : "C Wray",
                    "score" : 2.0
                },
                {
                    "exerciseName" : "HF-8",
                    "judge" : "J Nocilly",
                    "score" : 1.5
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "C Wray",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "J Nocilly",
                    "score" : 0.0
                }
            ]
        },
        {
            "ringNumber" : "8",
            "exercises" :
            [
                {
                    "exerciseName" : "DR",
                    "judge" : "B Lee",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "DR",
                    "judge" : "J Caputa",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "SE",
                    "judge" : "B Lee",
                    "score" : 2.0
                },
                {
                    "exerciseName" : "SE",
                    "judge" : "J Caputa",
                    "score" : 2.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "B Lee",
                    "score" : 0.0
                },
                {
                    "exerciseName" : "Misc",
                    "judge" : "J Caputa",
                    "score" : 0.0
                }
            ]
        }
    ]
}...
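To make the target shape concrete, here is a minimal Java sketch of the structure above, using only the standard library. The names `ScoreJson`, `Exercise`, `Ring`, `DogResult`, `quote`, and `toJson` are hypothetical and chosen only for illustration. The hand-rolled serializer is an assumption made to keep the example self-contained; in practice a JSON library would probably be a better fit. It does not cover extracting text from the PDF, only how parsed values could be assembled into this JSON.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the desired JSON; all names here are illustrative.
final class ScoreJson {
    record Exercise(String exerciseName, String judge, double score) {}
    record Ring(String ringNumber, List<Exercise> exercises) {}
    record DogResult(String dogName, int armbandNumber, String handler, String breed,
                     String round, String event, String date, List<Ring> rings) {}

    // Escapes only backslashes and double quotes (an assumption that is
    // sufficient for the sample names, not a complete JSON string escaper).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toJson(Exercise e) {
        return "{\"exerciseName\":" + quote(e.exerciseName())
                + ",\"judge\":" + quote(e.judge())
                + ",\"score\":" + e.score() + "}";
    }

    static String toJson(Ring r) {
        // Comma-separate the exercise objects so the array is valid JSON.
        String exercises = r.exercises().stream()
                .map(e -> toJson(e))
                .collect(Collectors.joining(","));
        return "{\"ringNumber\":" + quote(r.ringNumber())
                + ",\"exercises\":[" + exercises + "]}";
    }

    static String toJson(DogResult d) {
        String rings = d.rings().stream()
                .map(r -> toJson(r))
                .collect(Collectors.joining(","));
        return "{\"dogName\":" + quote(d.dogName())
                + ",\"armbandNumber\":" + d.armbandNumber()
                + ",\"handler\":" + quote(d.handler())
                + ",\"breed\":" + quote(d.breed())
                + ",\"round\":" + quote(d.round())
                + ",\"event\":" + quote(d.event())
                + ",\"date\":" + quote(d.date())
                + ",\"rings\":[" + rings + "]}";
    }

    public static void main(String[] args) {
        // Values copied from the first ring of the sample above.
        Ring ring1 = new Ring("1", List.of(
                new Exercise("DJ", "J Stephens", 0.5),
                new Exercise("DJ", "L Hause", 1.0)));
        DogResult dog = new DogResult("My Dolly Two Spots", 110, "Nancy Muller",
                "Papillon", "Top 20", "2019 AKC National Obedience Championship",
                "2019-03-17", List.of(ring1));
        System.out.println(toJson(dog));
    }
}
```

Keeping one record per JSON level means the parsing step only has to produce `Exercise` rows (exercise, judge, score) grouped by ring; the nesting then follows from the types.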