I am trying to retrieve, in R, the names of the data files on NCBI or PubMed that are linked to hundreds of unique DOIs or PMIDs. For example, I have PMID 19122651 and I want to get the names of the three GSEs connected to it: GSE12781, GSE12782, and GSE12783. I have searched various sources and packages to no avail.
I would appreciate your assistance.

Shawn

2 Answers

You can do this using the rentrez package.

The required function is entrez_link.

Example:

library(rentrez)

results <- entrez_link(dbfrom = 'pubmed', id = 19122651, db = 'gds')

results$links$pubmed_gds
[1] "200012783" "200012782" "200012781"

The 3 results are the IDs for the associated GEO Dataset records. You can convert them to GSE accessions using entrez_summary.

Here's a somewhat ugly sapply that may serve as the basis for a function:

sapply(results$links$pubmed_gds, function (id) entrez_summary("gds", id)$accession, 
       USE.NAMES = FALSE)

[1] "GSE12783" "GSE12782" "GSE12781"
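
Since the question mentions hundreds of PMIDs, here is a minimal sketch of how the two steps above could be wrapped into a function and applied to a vector of IDs. The helper name `pmid_to_gse`, the `pmids` vector, and the pause length are my own assumptions, not part of rentrez. The `^GSE` filter is also an assumption: it drops any linked GEO records that are not series. The pause is there because NCBI limits clients without an API key to about three requests per second.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): GSE accessions linked to one PMID.
pmid_to_gse <- function(pmid, pause = 0.4) {
  links <- entrez_link(dbfrom = "pubmed", id = pmid, db = "gds")
  gds_ids <- links$links$pubmed_gds

  # No linked GEO records: return an empty character vector.
  if (is.null(gds_ids)) {
    return(character(0))
  }

  # One summary request per GEO record, with a short pause between requests.
  accessions <- vapply(gds_ids, function(id) {
    Sys.sleep(pause)
    entrez_summary(db = "gds", id = id)$accession
  }, character(1), USE.NAMES = FALSE)

  # Assumption: keep only series (GSE) accessions.
  accessions[grepl("^GSE", accessions)]
}

# Replace with your own PMIDs; the result is a list named by PMID.
pmids <- c("19122651")
gse_by_pmid <- lapply(pmids, pmid_to_gse)
names(gse_by_pmid) <- pmids
```

`gse_by_pmid[["19122651"]]` should then hold the accessions for that paper. For a long list you may want to wrap the call in `tryCatch()` so that one failed request does not stop the whole run.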
neilfws
  • This is terrific! Thank you very much. I have been racking my brain and scouring the internet, and nothing I found was this simple or straightforward! I appreciate your time and assistance very much. – Shawn Mar 28 '19 at 03:01
  • No problem. `rentrez` is a great package, well worth getting to know. Please accept the answer if it solved the issue. – neilfws Mar 28 '19 at 03:05
  • @neilfws: you have a typo in the last line of code with `sapply`? it gives an error with `result$links$pubmed_gds` and probably was meant to be `results$links$pubmed_gds` ? – Oka Mar 28 '19 at 11:12

You can query NCBI via the rentrez package as described here. The function entrez_link() should be able to find the cross-references.
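
If you start from DOIs rather than PMIDs, one possible approach is to search PubMed for each DOI first and then pass the resulting PMID to entrez_link(). This is only a sketch: the helper name `doi_to_pmid` is hypothetical, and it assumes that PubMed's `[doi]` search field matches your DOIs, which may not hold for every record.

```r
library(rentrez)

# Hypothetical helper (not part of rentrez): PMID(s) that PubMed returns for a DOI.
doi_to_pmid <- function(doi) {
  hit <- entrez_search(db = "pubmed", term = paste0(doi, "[doi]"))
  # Character vector of matching PMIDs; empty if PubMed finds nothing.
  hit$ids
}

# Usage, where my_dois stands for your own character vector of DOIs:
# pmids <- unlist(lapply(my_dois, doi_to_pmid))
```

A DOI that returns no match or more than one PMID is worth checking by hand before looking up its GEO links.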

Oka