I am trying to get companies and their filing information from EDGAR using the edgarWebR package. In particular, I want to use two functions from the package: filing_information and company_filings.

I have thousands of CIKs in a separate dataset, but neither function accepts a vector of CIKs. Here is an example:

library(edgarWebR)
comp_file <- company_filings(c("1000045"), before = "20201231",
                            type = "10-K",  count = 100,
                            page = 1)

head(comp_file)
  accession_number act file_number filing_date accepted_date
1             <NA>  34   000-26680  2020-06-22    2020-06-22
2             <NA>  34   000-26680  2019-06-28    2019-06-28
3             <NA>  34   000-26680  2018-06-27    2018-06-27
4             <NA>  34   000-26680  2017-06-14    2017-06-14
5             <NA>  34   000-26680  2016-06-14    2016-06-14
6             <NA>  34   000-26680  2015-06-15    2015-06-15
                                                                                               href
1 https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm
2 https://www.sec.gov/Archives/edgar/data/1000045/000156459019023956/0001564590-19-023956-index.htm
3 https://www.sec.gov/Archives/edgar/data/1000045/000119312518205637/0001193125-18-205637-index.htm
4 https://www.sec.gov/Archives/edgar/data/1000045/000119312517203193/0001193125-17-203193-index.htm
5 https://www.sec.gov/Archives/edgar/data/1000045/000119312516620952/0001193125-16-620952-index.htm
6 https://www.sec.gov/Archives/edgar/data/1000045/000119312515223218/0001193125-15-223218-index.htm
  type film_number
1 10-K    20977409
2 10-K    19927449
3 10-K    18921743
4 10-K    17910577
5 10-K   161712394
6 10-K    15931101
                                               form_name
1 Annual report [Section 13 and 15(d), not S-K Item 405]
2 Annual report [Section 13 and 15(d), not S-K Item 405]
3 Annual report [Section 13 and 15(d), not S-K Item 405]
4 Annual report [Section 13 and 15(d), not S-K Item 405]
5 Annual report [Section 13 and 15(d), not S-K Item 405]
6 Annual report [Section 13 and 15(d), not S-K Item 405]
  description  size
1        <NA> 14 MB
2        <NA> 10 MB
3        <NA>  5 MB
4        <NA>  5 MB
5        <NA>  5 MB
6        <NA>  7 MB

I need to pass the href variable to the filing_information function.

I tried it this way:

file_info <- filing_information(comp_file$href) 

but it does not work. I got this message:


Error in parse_url(url) : length(url) == 1 is not TRUE

I can do it by passing each href value individually, like this:

x <- "https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm"

file_info <- filing_information(x)

The same is true for the company_filings function: here I use only one CIK, "1000045", but in another file I have thousands of CIKs, and I want to run company_filings for all of them. Doing this manually is not feasible.

Does anybody have an idea how I can run these two functions on a LARGE vector automatically?

Thanks

Sharif

1 Answer

In general, when a function (whether API-reaching or local) takes only one element as an argument, often the simplest way to "vectorize" it is to use a form of lapply:

companies <- c("1000045", "1000046", "1000047")
comp_file_list <- lapply(
  setNames(nm=companies),
  function(comp) company_filings(comp, before = "20201231",
                                 type = "10-K",  count = 100,
                                 page = 1)
)

Technically, the setNames(nm=.) portion is a safeguard that lets us know which company id was used for each element. If the id is already included in the returned data, you can remove it.
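
The same pattern covers the filing_information half of the question. Here is a minimal sketch using a hypothetical stand-in, `fetch_one()`, which (like the real function, judging by the error message) rejects anything but a single URL, so that the example runs without network access. The URLs are made up; in real use you would presumably substitute `filing_information` and `comp_file$href`.

```r
# Hypothetical stand-in for a function that accepts exactly one URL,
# mimicking the length(url) == 1 check seen in the error message.
fetch_one <- function(url) {
  stopifnot(length(url) == 1)
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Made-up URLs for illustration only.
hrefs <- c("https://example.com/a-index.htm",
           "https://example.com/bb-index.htm")

# Call the function once per URL; setNames(nm = ) keeps each URL as the
# name of its list element, so results can be traced back to their input.
info_list <- lapply(setNames(nm = hrefs), fetch_one)
names(info_list)
```

Because each element is named after its input, a later failure or odd result can be matched to the URL that produced it.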

Assuming that the return value is always a data.frame, then you can either keep them in the list (and deal with them as a list of frames, c.f., https://stackoverflow.com/a/24376207/3358227), or you can combine them into one much-taller frame using one of:

# base R
comp_file_list <- Map(function(x, nm) transform(x, id = nm), comp_file_list, names(comp_file_list))
comp_files <- do.call(rbind, comp_file_list)

# dplyr/tidyverse
comp_files <- dplyr::bind_rows(comp_file_list, .id = "id")

# data.table
comp_files <- data.table::rbindlist(comp_file_list, idcol = "id")
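
To make the base R route concrete, here is a small sketch on toy data. The two frames and their values are invented stand-ins for real company_filings results; only the list-of-frames shape is taken from the code above.

```r
# Toy stand-ins for per-company results (made-up values).
comp_file_list <- list(
  "1000045" = data.frame(type = "10-K", href = c("u1", "u2"),
                         stringsAsFactors = FALSE),
  "1000046" = data.frame(type = "10-K", href = "u3",
                         stringsAsFactors = FALSE)
)

# Tag each frame with its list name, then stack them into one frame.
tagged <- Map(function(x, nm) transform(x, id = nm),
              comp_file_list, names(comp_file_list))
comp_files <- do.call(rbind, tagged)
rownames(comp_files) <- NULL  # drop the "name.n" row names rbind creates

# The href column is now one plain character vector to loop over.
comp_files$href
```

Once everything is in one frame, the `href` column can be fed to a second lapply call in the same way as the company ids.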

FYI, the second argument of lapply is a function whose first argument is filled with each element of X (the first argument of lapply) in turn. Sometimes this can be just the function itself, as in

res <- lapply(companies, company_filings)

This is equivalent to

res <- lapply(companies, function(z) company_filings(z))

If you have a single set of arguments that must be applied to all calls, you can choose one of the following equivalent expressions:

res <- lapply(companies, company_filings, before = "20201231", type = "10-K",  count = 100, page = 1)
res <- lapply(companies, function(z) company_filings(z, before = "20201231", type = "10-K",  count = 100, page = 1))
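
The equivalence can be checked without touching the network. This sketch uses a hypothetical stand-in, `fake_filings()`, in place of company_filings:

```r
# Hypothetical stand-in with one varying and one fixed argument.
fake_filings <- function(comp, type) paste(comp, type)

companies <- c("1000045", "1000046")

# Extra arguments given after the function are forwarded to every call.
res_a <- lapply(companies, fake_filings, type = "10-K")
res_b <- lapply(companies, function(z) fake_filings(z, type = "10-K"))

# Both forms produce the same list.
identical(res_a, res_b)
```

The first form is shorter; the second is easier to extend when the call needs extra logic around it.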

If one (or more) of those arguments varies with each company, however, you need a different form. Let's assume that we have different before= arguments for each company,

befores <- c("20201231", "20201130", "20201031")
res <- Map(function(comp, bef) company_filings(comp, before=bef, type="10-K"),
           companies, befores)
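
As a runnable illustration of the parallel iteration, here is a sketch in which a hypothetical `fake_filings()` stands in for company_filings and simply echoes its inputs:

```r
# Hypothetical stand-in taking one company id and one cutoff date.
fake_filings <- function(comp, before) {
  data.frame(cik = comp, before = before, stringsAsFactors = FALSE)
}

companies <- c("1000045", "1000046", "1000047")
befores   <- c("20201231", "20201130", "20201031")

# Map walks both vectors in parallel: element i of each goes to call i.
# Since `companies` is a character vector, it also supplies the names.
res <- Map(function(comp, bef) fake_filings(comp, before = bef),
           companies, befores)
res[["1000046"]]$before
```

Both vectors should have the same length; otherwise Map recycles the shorter one, which is rarely what you want here.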

Basic error handling if you have ids/refs that fail the query:

# name the input so that names(failures) identifies the failing ids
res <- lapply(setNames(nm=companies), function(cmp) {
  tryCatch(
    company_filings(cmp, before=".."),
    error = function(e) e
  )
})
errors <- sapply(res, inherits, "error")
failures <- res[errors]
successes <- res[!errors]
good_returns <- do.call(rbind, successes)

names(failures)
# indicates which company ids failed, and the text of the error may
# indicate why they failed

Some options for the tryCatch(..., error=) argument:

  • error=identity returns the raw error, sometimes enough information
  • error=function(e) e same thing
  • error=function(e) conditionMessage(e) is a character return, the message portion of the error
  • error=function(e) NULL ignore the error, return NULL (or some constant) instead

You can also treat e conditionally, with patterns such as if (grepl("not found", conditionMessage(e))) {...} else NULL.
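
A minimal sketch of that conditional pattern follows. The stand-in `fake_filings()` and its "not found" message are assumptions made for illustration; the real error text from company_filings may differ, so adjust the pattern to whatever message you actually see.

```r
# Hypothetical stand-in that fails for one id with a "not found" message.
fake_filings <- function(comp) {
  if (comp == "100045") stop("company not found: ", comp)
  data.frame(cik = comp, stringsAsFactors = FALSE)
}

companies <- c("1000045", "100045", "1000047")

res <- lapply(setNames(nm = companies), function(cmp) {
  tryCatch(
    fake_filings(cmp),
    error = function(e) {
      # Swallow "not found" errors; keep anything unexpected for inspection.
      if (grepl("not found", conditionMessage(e))) NULL else e
    }
  )
})

# Ids whose lookup returned nothing, and the frames that did come back.
missing_ids  <- names(res)[vapply(res, is.null, logical(1))]
good_returns <- do.call(rbind, res[!names(res) %in% missing_ids])
```

Matching on `conditionMessage(e)` rather than on `e` itself keeps the test to a single string, which is what `if` expects.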

r2evans
  • Thank you very much. – Sharif Feb 25 '21 at 17:53
  • When I try to run the code with my actual data, I get the following error: `Error in strsplit(URL, "") : non-character argument`. I actually ran the following code - `comp_file_list – Sharif Feb 25 '21 at 21:35
  • That can happen if your `companies` is not `character`. If your `companies` is a vector, then try `lapply(setNames(nm=as.character(companies)),...)`. If `companies` is not a vector, then ... that's not right :-) – r2evans Feb 25 '21 at 21:43
  • Actually, I changed the code following your last comment, but it shows this: `Error in curl::curl_fetch_memory(url, handle = handle) : necessary data rewind wasn't possible`. I actually ran this - `comp_file_list 1 730052 2 1750 3 313368 4 910627 5 702511 6 61478` – Sharif Feb 25 '21 at 21:48
  • The curl error is likely unrelated to this question. A quick search suggests it's either (a) a server hiccup, (b) a space in the filename, or (c) some other malformed URL. Or something else. Sorry, I'm not a curl guru for that. I suggest you wrap `company_filings` with `try(...,silent=TRUE)` to help determine which URL is causing the error. If it's the same URL, then you'll know where to dig deeper. If you get different results on different runs with the same `companies`, that suggests a network or library problem (unlikely to be R). – r2evans Feb 25 '21 at 21:51
  • Can you shed some light on how I can wrap it to find the `cik` values that have issues? I am new to `R`. Thanks for all your effort. I can find a bad `cik` by running the code above, which shows that it does not find a `cik` like `100045`, and then filter it out, but that will take me a huge amount of time. – Sharif Feb 25 '21 at 22:24
  • See my edit. For more info on use of `try` and `tryCatch`, there are many tutorials and walk-throughs on the web for advanced R error handling, including http://adv-r.had.co.nz/Exceptions-Debugging.html. Hope this helps! – r2evans Feb 26 '21 at 00:58