
I'm trying to set up a Nominatim database for address geocoding. The database will be used by komoot's Photon, though that is probably not important here.

The problem is that the OSM XML/PBF files I have contain not just addresses but a whole bunch of other things, such as bars and various offices, which I'm trying to remove.

The idea is to go with something like this until I get the desired result set:

osmosis  --read-xml us-northeast-latest.osm.bz2 \
    --tf reject-nodes landuse=* \
    --tf reject-nodes amenity=* \
    --tf reject-nodes office=*  \
    --tf reject-nodes shop=* \
    --tf reject-nodes place=house  \
    --write-xml output.osm

However, after importing the resulting file, I still get those nodes (which should have been excluded) in the search results:

{
    properties: {
        osm_key: "office",
        osm_value: "ngo",
        extent: [
            -73.9494926,
            40.6998938,
            -73.9482012,
            40.6994192
        ],
        street: "Flushing Avenue",
        name: "Public Lab NYC",
        state: "New York",
        osm_id: 250328718,
        osm_type: "W",
        housenumber: "630",
        postcode: "11206",
        city: "New York City",
        country: "United States of America"
    },
    type: "Feature",
    geometry: {
        type: "Point",
        coordinates: [
            -73.9490215989286,
            40.699639649999995
        ]
    }
}

Note the osm_key and osm_value.

I'm unsure what I'm doing wrong here. Any help would be appreciated.

Igor

1 Answer


I don't think you are familiar enough with OSM's elements and tags yet.

Dropping nodes (or ways or relations) that contain specific tags is definitely not what you want. Instead, you want to either drop specific tags, or keep only specific tags and drop everything else, rather than dropping complete objects.

To understand the difference between the two, you have to know that addresses in OSM are modeled in two different ways: either as a separate address node, or attached to an already existing feature such as a building, a shop or a restaurant. The second way is the important one here, because your approach would drop all of these addresses.

Therefore you want to keep elements even if they are "just" a shop or a restaurant, because they can still contain an address. But you are free to drop all non-address tags from these elements, and to drop all elements that don't contain any address tags at all. This should be possible with osmosis; however, I'm not familiar enough with osmosis to provide the required parameters.
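To make the idea concrete, here is a minimal Python sketch of the filtering logic only; it is not an osmosis invocation. It assumes elements have already been parsed into plain dictionaries with a `tags` mapping, and that address tags are the ones whose key starts with `addr:`. The function name `strip_to_address` and the sample elements are made up for illustration (the way's tags are only inferred from the Photon result in the question).

```python
ADDRESS_PREFIX = "addr:"  # assumption: address tags are the addr:* keys


def strip_to_address(elements, keep_extra=()):
    """Keep only elements with at least one addr:* tag and drop their other tags.

    `elements` is a list of dicts such as {"type": ..., "id": ..., "tags": {...}}.
    `keep_extra` optionally names non-address keys that should survive as well.
    """
    kept = []
    for element in elements:
        tags = element.get("tags", {})
        # Drop elements that carry no address information at all.
        if not any(key.startswith(ADDRESS_PREFIX) for key in tags):
            continue
        # Keep the element itself, but strip every non-address tag from it.
        slim = {
            key: value
            for key, value in tags.items()
            if key.startswith(ADDRESS_PREFIX) or key in keep_extra
        }
        kept.append({**element, "tags": slim})
    return kept


# Illustrative sample data (node ids 1 and 3 are invented).
elements = [
    # A separate address node.
    {"type": "node", "id": 1,
     "tags": {"addr:housenumber": "12", "addr:street": "Main Street"}},
    # An address attached to an existing feature (an office mapped as a way).
    {"type": "way", "id": 250328718,
     "tags": {"office": "ngo", "name": "Public Lab NYC",
              "addr:housenumber": "630", "addr:street": "Flushing Avenue"}},
    # A feature without any address tags.
    {"type": "node", "id": 3, "tags": {"amenity": "bar", "name": "Some Bar"}},
]

result = strip_to_address(elements)
for element in result:
    print(element["type"], element["id"], element["tags"])
```

Note that in a real extract, the nodes referenced by a kept way have to be kept too, otherwise the way loses its geometry; this sketch ignores that.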

However, I'm not sure this is really a good idea, because more than one object can share the same name. Imagine a river, a mountain peak, a small village and a large village all sharing the same name. If you drop all the additional tags that are required for distinguishing a river from a peak and a small village from a large one, then you will run into trouble when trying to decide which entry to choose from the list of search results.

scai
  • So far the best workaround seems to be to import everything, and then add "osm_tag=!office" (and everything else I want removed) to Photon's query URL. – Igor Jun 01 '15 at 08:18
  • It's not removing the data in any way (which is probably how it should be solved in the end), but it does the trick. – Igor Jun 01 '15 at 08:29
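
The query-side workaround from the comments could be sketched like this in Python. The base URL `http://localhost:2322/api` is an assumption about a locally running Photon instance, and `build_photon_url` is a hypothetical helper; only the `osm_tag=!office` form comes from the comment above.

```python
from urllib.parse import urlencode


def build_photon_url(query, excluded_keys, base_url="http://localhost:2322/api"):
    """Build a Photon search URL that excludes results with the given OSM keys.

    base_url is an assumption; adjust it to wherever Photon is running.
    """
    params = [("q", query)]
    # One osm_tag parameter per excluded key, e.g. osm_tag=!office
    params += [("osm_tag", "!" + key) for key in excluded_keys]
    # urlencode percent-encodes "!" as %21; the server decodes it again.
    return base_url + "?" + urlencode(params)


url = build_photon_url("630 Flushing Avenue", ["office", "amenity", "shop"])
print(url)
```

As noted in the comment, this only hides the unwanted results at query time; the data itself remains in the database.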