I would like to parse the content of a Wikipedia page, but I am missing something that I do not understand. Can someone help me?

Example: I have this Wikipedia page:

https://it.wikipedia.org/wiki/Anni_690_a.C.

On this page a Chinese politician is mentioned: "Jin Wen Gong"

I am trying to use the following web service to get the content, but the JSON contains no data about "Jin Wen Gong".

https://it.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=1&titles=Anni_690_a.C.&rvprop=content&format=json

How do I parse Wikipedia correctly?

mcfly soft
  • The webpage doesn't contain anything about Jin Wen Gong either, so I'd say it isn't a parsing problem. – Mureinik Dec 23 '17 at 09:15
  • To be fair, it isn't in the _code_, but it is in the page, as it seems to be in a generated part of the page – Nanne Dec 23 '17 at 09:34
  • Parsing the wikitext should always be the last resort; usually there are [better options](https://stackoverflow.com/questions/33862336/how-to-extract-information-from-a-wikipedia-infobox) available. Since you don't say what you are trying to do, it's hard to tell whether that's true in your case. – Tgr Dec 24 '17 at 04:56

1 Answer

The part you are looking for is not directly in the content of that page. You can see this if you open the page for editing: there is no mention of Jin Wen Gong in the source either.

The part where you see it is generated from this piece of wiki-code:

{{Bio decennio a.C.|Morti|69}}

This code is in the JSON.
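To check this yourself, here is a minimal Python sketch that pulls the wikitext out of a response from the query above and lists the template calls in it. It assumes the legacy JSON layout (content under `revisions[0]["*"]`), which is what that URL returned for me; the helper names and the trimmed `sample` response are made up for illustration, and the regex only handles simple, non-nested templates.

```python
import re

# Matches simple, non-nested template calls such as {{Name|arg1|arg2}}.
# Nested templates are deliberately not handled in this sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}]+)\}\}")


def wikitext_from_revisions(response):
    """Return the wikitext of the first page in a legacy-format
    action=query&prop=revisions&rvprop=content response (assumed layout)."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def find_templates(wikitext):
    """Return a (name, [args]) tuple for each non-nested template call."""
    calls = []
    for body in TEMPLATE_RE.findall(wikitext):
        name, *args = [part.strip() for part in body.split("|")]
        calls.append((name, args))
    return calls


# Hypothetical, trimmed-down response used only for illustration.
sample = {
    "query": {
        "pages": {
            "123": {
                "revisions": [{"*": "Intro text\n{{Bio decennio a.C.|Morti|69}}\n"}]
            }
        }
    }
}

print(find_templates(wikitext_from_revisions(sample)))
# prints [('Bio decennio a.C.', ['Morti', '69'])]
```

So the raw wikitext only tells you that a template is called; the names it produces are not there.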

On Wikipedia that is rendered as a list of people (probably people who died in the mentioned year, if I read the Italian correctly).
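If what you want is the generated list rather than the template call, one option is to ask the API for the rendered HTML instead of the wikitext, using `action=parse`. The following is a rough sketch under a few assumptions: `formatversion=2` is used so that the HTML comes back as a plain string under `parse.text`, the function names and the User-Agent string are placeholders of mine, and the tag stripping is deliberately crude. I have not verified that the name still appears in the current rendering of the page.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://it.wikipedia.org/w/api.php"


def parse_url(title):
    """Build an action=parse URL that asks for the rendered HTML of a page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # with version 2 the HTML is a plain string
    }
    return API + "?" + urlencode(params)


def rendered_html(response):
    """Extract the HTML from a formatversion=2 action=parse response."""
    return response["parse"]["text"]


def fetch_rendered_html(title):
    """Download and return the rendered HTML (performs a network request)."""
    # Placeholder User-Agent; replace it with something identifying your tool.
    req = Request(parse_url(title), headers={"User-Agent": "wiki-parse-sketch/0.1"})
    with urlopen(req) as resp:
        return rendered_html(json.load(resp))


def contains_name(html, name):
    """Crude check: strip tags, then do a case-insensitive substring search."""
    text = re.sub(r"<[^>]+>", " ", html)
    return name.lower() in text.lower()
```

You would call it as `contains_name(fetch_rendered_html("Anni_690_a.C."), "Jin Wen Gong")`. Because the server expands the templates before returning the HTML, transcluded lists should show up here even though they are absent from the wikitext.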

Nanne
  • Thanks. Do I understand correctly: the author has not added the text in the correct structure, and so I am unable to parse that text? – mcfly soft Dec 23 '17 at 09:36
  • I could not find any reference on how this `bio decennio` thing works, but if you go to the edit page you can see the same code as in the `json`. The thing I pasted above is some sort of reference to another page or part (basically, it seems to say something about this decade (a.C. 69) and deaths), so it links somewhere. Not sure where, though. – Nanne Dec 23 '17 at 10:15
  • The template includes [Nati nel 697 a.C.](https://it.wikipedia.org/wiki/Nati_nel_697_a.C.) into the page. (Well, the one you quoted includes [Morti nel 690 a.C.](https://it.wikipedia.org/wiki/Morti_nel_690_a.C.), but that's not where Jin Wen Gong comes from.) – Tgr Dec 24 '17 at 04:51