I want to fill in a web form, submit my query, and download the resulting data. Some fields offer either a drop-down menu or a free-text search box, and any field can be left blank (if all fields are left blank, the entire database is downloaded). Clicking the "Search and Download" button should start the download of a file.

Here is what I have tried (selecting all records for species "Salmo salar") based on this question. I used my browser (Opera) "Developer Tools" to inspect page elements and identify the names of all the possible fields:


library(httr)

url <- "https://nzffdms.niwa.co.nz/search"

fd <- list(
  search_catchment_no_name = "",
  search_river_lake = "",
  search_sampling_locality = "",
  search_fishing_method = "",
  search_start_year = "",
  search_end_year = "",
  search_species  = "Salmo salar", # species of interest
  search_download_format = 1,      # select csv file format
  submit = "Search and Download"
)

POST(url, body = fd, encode = "form")

I had hoped this would download a csv file containing all records for species "Salmo salar", but no file is downloaded. Instead, the call returns this response (a list of 10; only the first part is shown):

Response [https://nzffdms.niwa.co.nz/search]
Date: 2019-10-02 23:35
Status: 200
Content-Type: text/html; charset=utf-8
Size: 19.1 kB
<!DOCTYPE html>  
  <meta http-equiv="Content-Type" content="text/html; c...
    <meta name="title" content="NZ Freshwater Fish Database...
<meta name="description" content="NIWA NZ Freshwater Fish...
<meta name="keywords" content="NIWA, NZ, Freshwater Fish" />
<meta name="language" content="en" />
<meta name="robots" content="index, follow />



I think the issue is with how I am specifying the "Search and Download" button. When I inspect the web page, most fields look like this:

# end year field
<input maxlength="4" class="form-control" type="text" name="search[end_year]" id="search_end_year">

But the "Search and Download" button element has no name or id attribute:

<input type="submit" value="Search and Download" class="btn btn-primary btn-md">

Also, I have just noticed there is a hidden field. Maybe I need to supply this as well?

<input type="hidden" name="search[_csrf_token]" value="d1530f09c1ce8110b5163bd100cb0d67" id="search__csrf_token">

Any advice on how I can get the file downloading would be much appreciated.

First, check the website's robots.txt. As of Oct 3, 2019, its contents are commented out.

Then read the terms and conditions at https://nzffdms.niwa.co.nz/terms and the tips at https://www.niwa.co.nz/freshwater-and-estuaries/nzffd/user-guide/tips, and make sure you comply with them.

It is also important to throttle the requests made below so that you do not overload the server.
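One way to throttle is to wrap the request function so that successive calls are spaced out. The following is a minimal sketch only: the helper name `throttle` and the interval are assumptions for illustration, not something the site prescribes, so check the terms and conditions for an appropriate rate.

```r
# Hypothetical helper (not part of httr): wraps a function so that the start of
# each call is at least `min_interval` seconds after the start of the previous one.
throttle <- function(f, min_interval = 5) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      # Sleep only for the remaining part of the interval
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# Possible usage with httr (not run here):
# polite_post <- throttle(httr::POST, min_interval = 5)
```

Calling `polite_post()` in a loop would then pause between submissions instead of sending them back to back.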

After checking all the terms and conditions, you can use the code below to query for your data:


library(httr)
library(xml2)
gr <- GET("https://nzffdms.niwa.co.nz/search")
doc <- read_html(content(gr, "text"))     #doc <- read_html(gr) #this works as well
# Build a NAME/VALUE lookup table from the <option> elements of a <select> field
getTbl <- function(x) {
    do.call(rbind, lapply(xml_find_all(doc, paste0(".//select[@name='search",x,"']/option")),
        function(n) data.frame(NAME=xml_text(n), VALUE=xml_attr(n, "value"), stringsAsFactors=FALSE)))
}
fishing_method <- getTbl("[fishing_method]")
species <- getTbl("[species][]")
csrf_token <- xml_attr(xml_find_all(doc, ".//input[@name='search[_csrf_token]']"), "value")

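# Field names are assumed to follow the search[...] pattern seen in the form; the species
# lookup assumes an option labelled "Salmo salar" (inspect the `species` table to confirm).
fd <- list(
    "search[catchment_no_name]" = "", "search[river_lake]" = "",
    "search[sampling_locality]" = "", "search[fishing_method]" = "",
    "search[start_year]" = "", "search[end_year]" = "",
    "search[species][]" = species$VALUE[species$NAME == "Salmo salar"],
    "search[download_format]" = 1, "search[_csrf_token]" = csrf_token
)
r <- POST("https://nzffdms.niwa.co.nz/doSearch", body=fd, encode="form")
read.csv(text=content(r, "text", encoding="UTF-8"))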


   card m    y catchname  catch        locality time  org map    east   north altitude penet fishmeth effort pass spcode abund number minl maxl  nzreach
1  3964 1 1981   Waiau R 797.49       Lake Gunn   NA niwa d41 2122400 5581200      477   225      ang     NA   NA salsal    NA     NA   NA   NA 15006671
2  3965 1 1981   Waiau R 797.49     Lake Fergus   NA niwa d41 2123700 5584400      483   229      ang     NA   NA salsal    NA     NA   NA   NA 15006092
3 15975 1 2003   Waiau R 797.40 Excelsior Creek 1330 niwa d44 2095800 5495800      190    94      efp     80    1 salsal    NA      2  102  105 15030686
4 50772 1 1940   Waiau R 797.49 Upukerora River   NA  unk d43 2098500 5519900      210   146      unk     NA   NA salsal    NA     NA   NA   NA 15020897
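
If the form fields or the CSRF token are wrong, the server may answer with status 200 and an HTML page instead of CSV, as in the question. A small guard before parsing makes that failure obvious. This is a sketch under assumptions: `read_csv_response` is a hypothetical helper, and it only checks for an HTML content type rather than assuming what content type the site uses for CSV.

```r
# Hypothetical helper: parse the body as CSV unless the server replied with HTML.
read_csv_response <- function(content_type, body_text) {
  if (grepl("text/html", content_type, fixed = TRUE)) {
    stop("Server returned an HTML page, not CSV; check the field names and CSRF token.")
  }
  read.csv(text = body_text, stringsAsFactors = FALSE)
}

# Possible usage with the httr response `r` from above (not run here):
# dat <- read_csv_response(httr::http_type(r),
#                          httr::content(r, "text", encoding = "UTF-8"))
# write.csv(dat, "salmo_salar.csv", row.names = FALSE)  # file name is illustrative
```

The final `write.csv()` call saves the result to disk, which gives the downloaded file the question asked for.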
