
This function recursively searches the Google Scholar corpus for articles that contain your Gene ID of interest.

Usage

searchGSC(
  geneIDs,
  year_start = NULL,
  year_end = NULL,
  max_pages = 2,
  sleep_secs = 10,
  verbose = TRUE,
  translate = NULL
)

Arguments

geneIDs

Character vector of Gene IDs. If you want to use gene symbols, add the organism name to the query to avoid articles that contain a similar abbreviation, e.g. use "AP2-P AND Plasmodium".

year_start

Restrict the results to articles published from this year onward.

year_end

Restrict the results to articles published up to this year.

max_pages

Maximum number of result pages to scrape.

verbose

Whether to print warnings.

translate

Translate the paper titles to English or another desired language. Use a two-letter language code, e.g. "fr" for French, "en" for English and "es" for Spanish.

Value

A data frame with five columns: GeneID, Title of the article, Year of Publication, URL and Authors.

Details

Warning: Scraping Google Scholar is against their Terms of Service. We advise using this function to query only a few IDs (no more than 20) per day. Using this function may result in your IP address being temporarily blocked.
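
One way to stay within the suggested limit of 20 IDs per day is to split a longer ID list into batches beforehand. The following is a minimal base R sketch, not part of the package: the IDs in `all_ids` are placeholders and the names `all_ids`, `batch_size` and `batches` are chosen only for illustration.

```r
# Placeholder IDs for illustration only; substitute your own Gene IDs.
all_ids <- sprintf("GENE_%03d", 1:45)

# Suggested daily limit taken from the warning above.
batch_size <- 20

# Assign each ID a batch number (1, 1, ..., 2, 2, ...) and split on it.
batches <- split(all_ids, ceiling(seq_along(all_ids) / batch_size))

# Number of IDs in each batch.
print(lengths(batches))

# Each batch could then be queried on a separate day, for example:
# res_day1 <- searchGSC(geneIDs = batches[[1]])
```

The `searchGSC()` call is left commented out so that running the sketch does not send any requests.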

Examples

if (FALSE) { # \dontrun{
## The third query, "PF3D7_0420300OR", is a fake ID.
res <- searchGSC(
  geneIDs = c(
    "PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 OR PFD0985w",
    "PF3D7_0621000",
    "PF3D7_0420300OR"
  ),
  translate = "en",
  year_start = 2018,
  year_end = 2021
)

test <- searchGSC(geneIDs = c("AP2-P AND Plasmodium", "AP2-I"))
} # }
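
Query strings like the ones in the example above can also be built programmatically. The sketch below uses only base R. The helpers `alias_query()` and `symbol_query()` are hypothetical and not part of the package; they only show how the "OR" and "AND" forms might be assembled before being passed to `geneIDs`.

```r
# Hypothetical helper: join several aliases of one gene into a single
# "A OR B OR C" query string.
alias_query <- function(aliases) {
  paste(aliases, collapse = " OR ")
}

# Hypothetical helper: pair a gene symbol with an organism name to
# reduce hits on unrelated abbreviations.
symbol_query <- function(symbol, organism) {
  paste(symbol, "AND", organism)
}

geneIDs <- c(
  alias_query(c("PF3D7_0420300", "MAL4P1.192", "Q8I1N6", "PFD0985w")),
  symbol_query("AP2-P", "Plasmodium")
)

print(geneIDs)
```

The resulting character vector has one element per gene, so each query is reported under its own GeneID entry.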