7

I use Python 3 (I also have Python 2 installed) and I want to extract countries or cities from a short text. For example, text = "I live in Spain" or text = "United States (New York), United Kingdom (London)".

The expected output for countries:

  1. Spain
  2. [United States, United Kingdom]

I tried to install geography but I am unable to run pip install geography. I get this error:

Collecting geography
  Could not find a version that satisfies the requirement geography (from versions: )
No matching distribution found for geography

It looks like geography only works with Python 2.

I also have geopandas, but I don't know how to extract the required info from text using geopandas.

joris
Markus
  • @smci The package is called `geograpy`, not `geography`. – MaxiMouse Apr 20 '20 at 17:24
  • @MaxiMouse: ok, then should this be closed as typo? Also, you could add that as answer. – smci Apr 20 '20 at 23:38
  • @smci Yes, it should probably be closed as a typo. I don't think this could be an answer. – MaxiMouse Apr 21 '20 at 07:50
  • @MaxiMouse: on reflection, the question asks the broader *"How to extract countries from a text?"*, isn't strictly tied to any package, and has good answers, so we should let it stand. – smci Apr 21 '20 at 08:20

2 Answers

15

You could use pycountry for this task (it also works with Python 3):

pip install pycountry

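import pycountry

text = "United States (New York), United Kingdom (London)"

# Print every ISO 3166 country whose name appears in the text.
# This is a plain substring test, so "Niger" would also match inside "Nigeria".
for country in pycountry.countries:
    if country.name in text:
        print(country.name)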
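If substring matches are a concern, or you want a list back rather than printed lines, something along these lines might help. It is only a sketch: `extract_countries` and `SAMPLE_NAMES` are names made up for this example, and the short sample list is a stand-in so the snippet runs without pycountry installed.

```python
import re

# Hypothetical stand-in list so the sketch runs on its own; with pycountry
# installed you could pass [c.name for c in pycountry.countries] instead.
SAMPLE_NAMES = ["Spain", "United States", "United Kingdom", "Niger", "Nigeria"]


def extract_countries(text, names):
    """Return the names found in text as whole words, in order of appearance."""
    # Longest names first, so "Nigeria" is tried before "Niger".
    ordered = sorted(names, key=len, reverse=True)
    if not ordered:
        return []
    # Lookarounds keep a name from matching inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in ordered) + r")(?!\w)"
    )
    found = []
    for match in pattern.finditer(text):
        if match.group(0) not in found:  # drop repeats, keep first-seen order
            found.append(match.group(0))
    return found


print(extract_countries("I live in Spain", SAMPLE_NAMES))
# → ['Spain']
print(extract_countries("United States (New York), United Kingdom (London)", SAMPLE_NAMES))
# → ['United States', 'United Kingdom']
```

Matching stays case-sensitive and only finds the exact names in the list you pass in, so informal variants such as "USA" or "UK" would still need to be added by hand.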
TerryA
matyas
3

There is a newer version of this library that supports Python 3, named geograpy3:

pip install geograpy3

It allows you to extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.

Example:

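import geograpy
import nltk

# One-time download of the NLTK data used for tokenizing, tagging and entity chunking
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

# Fetch the page and extract the places mentioned in it
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)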

You can find more details under this link:

Jendoubi Zaid
  • I've seen this exact text many times "Geograpy allows you to extract place names from a URL or text", but all websites / forums / github project examples show only how to use Geograpy with url and I haven't come across an example with a regular string (neither does it work if we just replace the url in the example code with a regular text) – Mihaela May 13 '21 at 20:18