
Introduction to plasmoRUtils
Rohit Satyam
King Abdullah University of Science & Technology, Saudi Arabia
Alberto Maillo
King Abdullah University of Science & Technology, Saudi Arabia
David Gomez-Cabrero
King Abdullah University of Science & Technology, Saudi Arabia
Arnab Pain
King Abdullah University of Science & Technology, Saudi Arabia
13 June, 2025
Introduction_to_plasmoRUtils.Rmd
Abstract
The package plasmoRUtils is designed to enable users to access various Plasmodium and Apicomplexan-related databases through single-line R functions. It also provides convenience functions for rapid analysis.
Installation
Before downloading the package, install the following dependencies.
# CRAN packages
cranpkgs <- c('BiocManager','randomcoloR', 'janitor', 'readr', 'rlang', 'dplyr', 'ggsci', 'rvest', 'easyPubMed', 'plyr', 'scales', 'ggplot2', 'glue', 'tidyr', 'tibble', 'data.table', 'plotly', 'purrr', 'stringr', 'echarts4r', 'magrittr', 'bio3d', 'httr', 'jsonlite', 'ggpubr', 'gt', 'mgsub', 'reshape2','pathfindR')
install.packages(setdiff(cranpkgs, rownames(installed.packages())), dependencies = TRUE)
# Bioconductor packages (S4Vectors is distributed through Bioconductor, so it is installed here)
biocpkgs <- c("rmarkdown","pRoloc","knitr","BiocStyle","DESeq2","styler","utils","IRanges","BiocGenerics","rtracklayer","scuttle","txdbmaker","topGO","drawProteins","GenomicFeatures","biomaRt","AnnotationForge","Biostrings","GenomeInfoDb","SingleCellExperiment","SingleR","NOISeq","GenomicRanges","BSgenome","S4Vectors")
BiocManager::install(setdiff(biocpkgs, rownames(installed.packages())), dependencies = TRUE)
The plasmoRUtils package is available on CRAN and can be installed as follows:
install.packages("plasmoRUtils")
# Once installed load the library as
library(plasmoRUtils)
## To re-check if all the dependencies that are required by plasmoRUtils are installed
install_dependencies()
Introduction
Using plasmoRUtils, users can fetch data from VEuPathDB and its 12 component-site databases (VEuPathDBs) and transform it into formats compatible with other R packages. Data tables (both preconfigured and user-configured) can be downloaded from VEuPathDBs directly within R/RStudio, using a set of R functions built on the RESTful API provided by VEuPathDBs.
For databases that lack APIs, we developed database-specific “searchX” functions (where X represents the database) that utilize the rvest package for web crawling to retrieve data, which is then transformed into tables that can be saved and shared. Additionally, we created a function to enable programmatic access to the MPMP database for the first time, allowing users to download and share data tables at their convenience. The package also provides several other data sets that we reanalyzed using the latest annotations from VEuPathDBs that can be used by various functions.
Databases covered include:
- HitPredict
- ApicoTFdb
- Malaria.tools
- Malaria Parasite Metabolic Pathways (MPMP) database
- Malaria Important Interacting Proteins (MIIP)
- PhenoPlasm
- PlasmoBase
- UniProt
- Malaria Cell Atlas

This list is not exhaustive; see the subsections below for all covered databases.
# Load package and some other useful packages by using
suppressPackageStartupMessages(
suppressWarnings({
library(plasmoRUtils)
library(dplyr)
library(plyr)}))
Accessing databases with plasmoRUtils search functions
plasmoRUtils has several search functions to fetch information from databases. The functions are tabulated below:
Function | Database Access |
---|---|
searchApicoTFdb() | ApicoTFdb |
searchGSC() | Google Scholar |
searchHP() | HitPredict |
searchIpDb() | InParanoiDB |
searchKipho() | KiPho Database |
searchMT() | Malaria Tools |
searchMidb() | Minor Intron Database |
searchMiip() | Malaria Important Interacting Proteins |
searchPM() | PubMed |
searchPhPl() | PhenoPlasm |
searchTedConsensus() | The Encyclopedia of Domains |
searchApicoTFdb()
This function helps users fetch all the transcription factors for a particular apicomplexan of interest from ApicoTFdb (Sardar et al. 2019). For ease of use, the organism names are abbreviated as shown in the table below:
Category | Abbreviation | Species |
---|---|---|
Plasmodium Species | pb | Plasmodium berghei |
Plasmodium Species | pv | Plasmodium vivax |
Plasmodium Species | pf | Plasmodium falciparum |
Plasmodium Species | pk | Plasmodium knowlesi |
Plasmodium Species | py | Plasmodium yoelii |
Plasmodium Species | pc | Plasmodium chabaudi |
Other Apicomplexan | tg49 | Toxoplasma gondii ME49 |
Other Apicomplexan | tg89 | Toxoplasma gondii P89 |
Other Apicomplexan | cp | Cryptosporidium parvum |
Other Apicomplexan | em | Eimeria maxima |
Other Apicomplexan | bb | Babesia bovis |
Other Apicomplexan | et | Eimeria tenella |
Other Apicomplexan | nu | Neospora caninum |
Other Apicomplexan | cy | Cyclospora cayetanensis |
Using the function is straightforward:
## Fetch all Plasmodium falciparum TFs
searchApicoTFdb(org="pf") %>% head()
#> # A tibble: 6 × 4
#> `Gene ID` `Protein Length` `Product Description` `TF- Family`
#> <chr> <chr> <chr> <chr>
#> 1 PF3D7_1319600 1633 ACDC domain-containing protein, p… AP2
#> 2 PF3D7_0604100 1979 AP2 domain transcription factor AP2
#> 3 PF3D7_1222400 2558 AP2 domain transcription factor AP2
#> 4 PF3D7_1222600 2432 AP2 domain transcription factor A… AP2
#> 5 PF3D7_1408200 1702 AP2 domain transcription factor A… AP2
#> 6 PF3D7_1007700 1597 AP2 domain transcription factor A… AP2
## Fetch all Toxoplasma gondii ME49 TFs
searchApicoTFdb(org="tg49") %>% head()
#> # A tibble: 6 × 4
#> `Gene ID` `Product Description` `Protein Length` `TF- Family`
#> <chr> <chr> <chr> <chr>
#> 1 TGME49_200385 Myb family DNA-binding domain-con… 2258 Myb/SANT
#> 2 TGME49_201220 zinc finger protein 603 BBOX
#> 3 TGME49_201790 FHA domain-containing protein 556 FHA
#> 4 TGME49_202690 DNA-directed RNA polymerase II RP… 250 General-TF
#> 5 TGME49_202840 FHA domain-containing protein 1044 FHA
#> 6 TGME49_202900 zinc finger (CCCH type) motif-con… 1298 Zn-Finger
searchGSC()
It can be difficult to keep track of the literature on your gene of interest, and you may also want to keep up with competing groups across the globe. The searchGSC() function collects the literature in which your gene ID of interest is mentioned and returns the results as a data frame.
Because Google Scholar searches are not restricted to article abstracts but extend to the supplementary material, this function can capture articles that mention your gene ID of interest and would otherwise be missed by a normal Google search. In addition, since most preprint literature is indexed by Google Scholar, you can also find papers from competing groups that are yet to be peer-reviewed.
Note: This function is experimental and has been seen to get the user's IP address blocked temporarily (for 24 hours) when used more than 20 times. For large sets of genes, we encourage users to use more specialized APIs.
## Search Google Scholar for literature mentioning two gene IDs
searchGSC(c("PF3D7_0420300", "PF3D7_0621000"))
#> # A tibble: 14 × 5
#> GeneID Title Year Url Authors
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PF3D7_0420300 Changes in genome organization of parasite… 2018 http… EM Bun…
#> 2 PF3D7_0420300 Transcriptomics and proteomics reveal two … 2019 http… SE Lin…
#> 3 PF3D7_0420300 The RNA structurome in the asexual blood s… 2021 http… DR Alv…
#> 4 PF3D7_0420300 The Transcription Factor PfAP2-O Influence… 2021 http… EFG Cu…
#> 5 PF3D7_0621000 The roles of plasmepsins IX and X in malar… 2021 http… AS Nas…
#> 6 PF3D7_0420300 Investigation of Plasmodium falciparum mit… 2022 http… S Dass
#> 7 PF3D7_0420300 Plasmodium falciparum MORC protein modulat… 2023 http… MK Sin…
#> 8 PF3D7_0621000 Coordination of apicoplast transcription i… 2023 http… Y Koba…
#> 9 PF3D7_0621000 An Insight to Further Malaria Vaccine Deve… 2023 NA A Berry
#> 10 PF3D7_0420300 Systematic in vitro evolution in Plasmodiu… 2024 http… MR Lut…
#> 11 PF3D7_0420300 A Plasmodium falciparum MORC protein compl… 2024 http… MK Sin…
#> 12 PF3D7_0420300 Transcriptome analysis reveals a de novo D… 2025 http… A Okaf…
#> 13 PF3D7_0420300 Genome-wide gene expression profiles throu… 2025 http… G Zang…
#> 14 PF3D7_0621000 Advancing Functional Genomics in P. falcip… 2025 http… ST Win…
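Because repeated calls can trigger a temporary block, one option is to space out requests and cap how many are sent in a single run. The sketch below is a minimal base R illustration of that idea. `throttled_lapply()`, `delay_sec`, `max_calls`, and `fake_query()` are hypothetical names that are not part of plasmoRUtils, and the limit of 20 simply mirrors the note above rather than any documented quota.

```r
# Hypothetical helper (not part of plasmoRUtils): apply `fun` to each ID with a
# pause between calls, refusing to exceed `max_calls` requests in one run.
throttled_lapply <- function(ids, fun, delay_sec = 5, max_calls = 20) {
  ids <- unique(ids)
  if (length(ids) > max_calls) {
    stop("Refusing to send ", length(ids), " queries; limit is ", max_calls)
  }
  out <- vector("list", length(ids))
  names(out) <- ids
  for (i in seq_along(ids)) {
    out[[i]] <- fun(ids[i])
    if (i < length(ids)) Sys.sleep(delay_sec)  # pause between requests only
  }
  out
}

# Stand-in for a real query function, so the sketch runs offline
fake_query <- function(id) data.frame(GeneID = id, n_chars = nchar(id))

res <- throttled_lapply(c("PF3D7_0420300", "PF3D7_0621000"), fake_query, delay_sec = 0)
do.call(rbind, res)
```

In real use, `fake_query` would be swapped for a function that calls searchGSC() on one gene ID, with a non-zero delay.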
searchHP()
This function lets you search the HitPredict database (López, Nakai, and Patil 2015) and retrieve high-confidence protein-protein interactions (PPIs) for your organism of interest. All it requires is a gene ID and a taxon ID. HitPredict provides PPI data as UniProt IDs, which are not always ideal for apicomplexan biologists. We therefore provide an option to convert these UniProt IDs back to gene IDs by setting uniprotToGID=TRUE. Since the only apicomplexan in HitPredict is Plasmodium falciparum, this ID mapping is limited to Plasmodium. It should be turned off when querying a non-apicomplexan organism, as shown below.
## Single gene query
searchHP("PF3D7_0418300") %>% head()
#> Interactor Interaction Name Experiments Category Method.Score
#> 1 A0A5K1K7X4 46953 A0A5K1K7X4 1 High-throughput 0.35
#> 2 C0H4E0 82953 C0H4E0 1 High-throughput 0.39
#> 3 C0H4U4 83025 C0H4U4 1 High-throughput 0.39
#> 4 C0H586 83100 C0H586 1 High-throughput 0.39
#> 5 C0H5G3 83162 C0H5G3 1 High-throughput 0.39
#> 6 Q8I398 1211124 Q8I398 1 High-throughput 0.49
#> Annotation.Score Interaction.Score Confidence QueryID ensembl_gene_id
#> 1 0.16 0.238 Low PF3D7_0418300 <NA>
#> 2 0.16 0.251 Low PF3D7_0418300 PF3D7_0515400
#> 3 0.16 0.251 Low PF3D7_0418300 PF3D7_0813300
#> 4 0.16 0.251 Low PF3D7_0418300 PF3D7_0933200
#> 5 0.16 0.251 Low PF3D7_0418300 PF3D7_1341300
#> 6 0.16 0.282 High PF3D7_0418300 PF3D7_0905100
## To use it for other organisms, turn off uniprotToGID and provide the taxid of the organism
test <- searchHP("BRCA1",taxid = "9606" , uniprotToGID = FALSE)
## Multiple gene query
res <- lapply(c("PF3D7_0418300","PF3D7_1118500"), function(x){searchHP(x,uniprotToGID = FALSE)})%>% plyr::ldply()
res %>% tail()
#> Interaction Interactor Name Experiments Category Method.Score
#> 15 83162 C0H5G3 C0H5G3 1 High-throughput 0.39
#> 16 46953 A0A5K1K7X4 A0A5K1K7X4 1 High-throughput 0.35
#> 17 1211135 Q9U0N1 Q9U0N1 1 High-throughput 0.35
#> 18 1212791 Q8IJG6 Q8IJG6 1 High-throughput 0.49
#> 19 87015 C6KTD2 SET1 1 High-throughput 0.39
#> 20 1211131 Q8I1Q4 Q8I1Q4 1 High-throughput 0.49
#> Annotation.Score Interaction.Score Confidence QueryID
#> 15 0.16 0.251 Low PF3D7_0418300
#> 16 0.16 0.238 Low PF3D7_0418300
#> 17 0.16 0.238 Low PF3D7_0418300
#> 18 0.50 0.494 High PF3D7_1118500
#> 19 0.50 0.439 High PF3D7_1118500
#> 20 0.16 0.282 High PF3D7_1118500
## You can now use toGeneid function which uses PlasmoDB release 68 annotation to
## map the uniprot IDs back to the gene IDs
toGeneid(res$Interactor,from = "uniprot","ensembl") %>% full_join(., res, by = c("UniProt ID(s)" = "Interactor"))
#> # A tibble: 20 × 13
#> `Gene ID` `Product Description` `Gene Name or Symbol` `UniProt ID(s)`
#> <chr> <chr> <chr> <chr>
#> 1 PF3D7_0113000 glutamic acid-rich prote… GARP Q9U0N1
#> 2 PF3D7_0418300 conserved Plasmodium pro… N/A Q8I1Q4
#> 3 PF3D7_0515400 conserved protein, unkno… N/A C0H4E0
#> 4 PF3D7_0526800 conserved Plasmodium pro… N/A Q8I3J7
#> 5 PF3D7_0532100 early transcribed membra… ETRAMP5 A0A5K1K7X4
#> 6 PF3D7_0629700 SET domain protein, puta… SET1 C6KTD2
#> 7 PF3D7_0802000 glutamate dehydrogenase,… GDH3 Q8IAM0
#> 8 PF3D7_0813300 NPL domain-containing pr… N/A C0H4U4
#> 9 PF3D7_0825500 protein KRI1, putative KRI1 Q8IB88
#> 10 PF3D7_0905100 nucleoporin NUP221, puta… NUP221 Q8I398
#> 11 PF3D7_0933200 calcyclin-binding protei… N/A C0H586
#> 12 PF3D7_1023900 chromodomain-helicase-DN… CHD1 Q8IJG6
#> 13 PF3D7_1023900 chromodomain-helicase-DN… CHD1 Q8IJG6
#> 14 PF3D7_1112100 protein kinase, putative N/A Q8IIP2
#> 15 PF3D7_1118500 nucleolar protein 56, pu… NOP56 Q8III3
#> 16 PF3D7_1228600 merozoite surface protei… MSP9 Q8I5D2
#> 17 PF3D7_1302700 ATP-dependent RNA helica… N/A Q8IET8
#> 18 PF3D7_1309400 HORMA domain protein, pu… N/A Q8IEM0
#> 19 PF3D7_1341300 60S ribosomal protein L1… N/A C0H5G3
#> 20 PF3D7_1468100 MORC family protein MORC Q8IKF6
#> # ℹ 9 more variables: Interaction <int>, Name <chr>, Experiments <int>,
#> # Category <chr>, Method.Score <dbl>, Annotation.Score <dbl>,
#> # Interaction.Score <dbl>, Confidence <chr>, QueryID <chr>
Another scenario where users might want to set uniprotToGID=FALSE is when querying thousands of IDs. Since ID conversion is carried out using biomaRt, converting the same UniProt ID repeatedly is redundant when it appears as an interacting partner of several query genes.
For convenience, we therefore provide another function, toGeneid(), which quickly converts the UniProt IDs back to Ensembl IDs.
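The idea of converting each UniProt ID only once can be sketched in base R as follows. `convert_once()` and the toy `lookup()` function are hypothetical stand-ins (the real conversion would be done by toGeneid()); the point is only the deduplicate-then-map-back pattern.

```r
# Hypothetical helper: run an expensive converter on unique IDs only,
# then expand the result back to the original order with match().
convert_once <- function(ids, converter) {
  uniq <- unique(ids)
  converted <- converter(uniq)   # one conversion per distinct ID
  converted[match(ids, uniq)]
}

# Toy converter that counts how many IDs it is asked to convert
n_converted <- 0
lookup <- function(x) {
  n_converted <<- n_converted + length(x)
  paste0("gene_for_", x)         # placeholder, not a real mapping
}

interactors <- c("Q8IJG6", "C0H5G3", "Q8IJG6", "Q8IJG6")
gene_ids <- convert_once(interactors, lookup)
gene_ids
n_converted  # number of IDs actually passed to the converter
```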
searchIpDb()
This function lets you search the InParanoiDB 9 database (Persson and Sonnhammer 2023) and retrieve high-confidence orthologs for your organism of interest. The input required is a character vector of gene IDs.
gids <- c("PF3D7_0807800", "PF3D7_1023900")
searchIpDb(gids) %>% head()
#> success Q8IAR6
#> success Q8IJG6
#> Group ID Species Protein Gene Name
#> 1 826 Perkinsus marinus C5LD32 Pmar_PMAR029539
#> 2 826 Perkinsus marinus C5L7W9 Pmar_PMAR009653
#> 3 826 Plasmodium falciparum Q8IAR6 PF3D7_0807800
#> 4 1814 Plasmodium falciparum Q8IAR6 PF3D7_0807800
#> 5 1814 Plasmodium vivax A5KAC7 PVX_088150
#> 6 706 Cyclospora cayetanensis A0A1D3D0U4 cyc_02495
#> Bitscore info_outline Inparalog Score info_outline Seed Score info_outline
#> 1 126 1.000 1.0
#> 2 126 0.397 -
#> 3 126 1.000 1.0
#> 4 515 1.000 1
#> 5 515 1.000 1
#> 6 179 1.000 1
#> Description queryid
#> 1 26S Proteasome Non-Atpase Regulatory Subunit, Putative PF3D7_0807800
#> 2 26S Proteasome Non-Atpase Regulatory Subunit, Putative PF3D7_0807800
#> 3 26S Proteasome Regulatory Subunit Rpn10, Putative PF3D7_0807800
#> 4 26S Proteasome Regulatory Subunit Rpn10, Putative PF3D7_0807800
#> 5 26S Proteasome Non-Atpase Regulatory Subunit 4, Putative PF3D7_0807800
#> 6 Ubiquitin Interaction Motif Family Protein PF3D7_0807800
Some UniProt IDs, such as Q2KNU4 and Q2KNU5, may be reported as failing, along with their respective URLs. These UniProt IDs are missing from the InParanoiDB 9 database.
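When querying many IDs, it can help to record which ones fail instead of stopping at the first error. The following is a generic base R sketch of that pattern and says nothing about how searchIpDb() handles failures internally. `query_with_failures()` and `fake_fetch()` are hypothetical names, and the fetcher only simulates IDs that are absent from a database.

```r
# Hypothetical wrapper: run `fetch` per ID, keep successes, record failures.
query_with_failures <- function(ids, fetch) {
  results <- list()
  failed <- character(0)
  for (id in ids) {
    res <- tryCatch(fetch(id), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)
    } else {
      results[[id]] <- res
    }
  }
  list(results = results, failed = failed)
}

# Stand-in fetcher that errors for IDs assumed to be absent from the database
missing_ids <- c("Q2KNU4", "Q2KNU5")
fake_fetch <- function(id) {
  if (id %in% missing_ids) stop("ID not found: ", id)
  data.frame(Protein = id)
}

out <- query_with_failures(c("Q8IAR6", "Q2KNU4", "Q8IJG6", "Q2KNU5"), fake_fetch)
names(out$results)  # IDs that returned a table
out$failed          # IDs to report or retry later
```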
searchKipho()
This function lets you fetch the Malaria Parasite Kinome-Phosphatome Resource (KiPho) database (Pandey, Kumar, and Gupta 2017) without leaving R. The organisms in KiPho are listed below:
Abbreviation | Species |
---|---|
pb | Plasmodium berghei |
pv | Plasmodium vivax |
pf | Plasmodium falciparum |
pc | Plasmodium chabaudi |
Besides the organism, the user needs to specify type="kinase" to fetch the kinome and type="phosphatase" to fetch the phosphatome.
searchKipho(org="pf",type = "kinase")
#> # A tibble: 148 × 7
#> `Gene ID` `Previous ID(s)` `Product Description` `Protein Length`
#> <chr> <chr> <chr> <int>
#> 1 PF3D7_0102600 "PFA0130c MAL1P1.17" serine/threonine protei… 630
#> 2 PF3D7_0103700 "PFA0185w MAL1P1.23" L-seryl-tRNA(Sec) kinas… 535
#> 3 PF3D7_0107600 "PFA0380w\tMAL1P2.04" serine/threonine protei… 1595
#> 4 PF3D7_0110600 "PFA0515w\tMAL1P2.32" phosphatidylinositol-4-… 1710
#> 5 PF3D7_0110900 "PFA0530c\tMAL1P2.35" adenylate kinase-like p… 186
#> 6 PF3D7_0111500 "PFA0555c\tMAL1P2.40" UMP-CMP kinase, putative 371
#> 7 PF3D7_0203100 "PFB0150c\tPF02_0030" protein kinase, putative 2485
#> 8 PF3D7_0211700 "PFB0520w\tPF02_0109" tyrosine kinase-like pr… 1233
#> 9 PF3D7_0213400 "PFB0605w\tPF02_0125" protein kinase 7 (PK7) 343
#> 10 PF3D7_0214600 "PFB0665w\tPF02_0137" serine/threonine protei… 1714
#> # ℹ 138 more rows
#> # ℹ 3 more variables: `Conserved Protein Domain Family(Accession No)` <chr>,
#> # `Conserved Protein Domain Family(Name)` <chr>, `Ortholog Group` <chr>
searchKipho(org="pf",type = "phosphatase")
#> # A tibble: 70 × 7
#> `Gene ID` `Previous ID(s)` `Product Description` `Protein Length`
#> <chr> <chr> <chr> <int>
#> 1 PF3D7_0107200 "PFA0350w\tMAL1P1.64" carbon catabolite repre… 337
#> 2 PF3D7_0107800 "PFA0390w" double-strand break rep… 1233
#> 3 PF3D7_0303200 "PFC0150w" HAD superfamily protein… 1162
#> 4 PF3D7_0305600 "PFC0250c" AP endonuclease (DNA-[a… 617
#> 5 PF3D7_0309000 "PFC0380w" dual specificity protei… 575
#> 6 PF3D7_0310300 "PFC0430w" phosphoglycerate mutase… 1165
#> 7 PF3D7_0314400 "PFC0595c" serine/threonine protei… 308
#> 8 PF3D7_0319200 "PFC0850c" endonuclease/exonucleas… 906
#> 9 PF3D7_0322100 "PFC0980c" RNA triphosphatase (Prt… 591
#> 10 PF3D7_0410300 "PFD0505c\tPFD0510c" protein phosphatase PPM… 906
#> # ℹ 60 more rows
#> # ℹ 3 more variables: `Conserved Protein Domain Family(Accession_No)` <chr>,
#> # `Conserved Protein Domain Family(Name)` <chr>, `Ortholog Group` <chr>
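In the output above, the `Previous ID(s)` column stores several legacy identifiers in a single string, separated by either a space or a tab. A small base R sketch for splitting such entries is shown below. The input vector is copied from the printed rows, and `split_previous_ids()` is a hypothetical helper, not a plasmoRUtils function.

```r
# Values copied from the printed output above; separators are a space or a tab
previous_ids <- c("PFA0130c MAL1P1.17", "PFA0380w\tMAL1P2.04", "PFC0150w")

# Hypothetical helper: split each entry on any run of whitespace
split_previous_ids <- function(x) strsplit(trimws(x), "[[:space:]]+")

id_list <- split_previous_ids(previous_ids)
lengths(id_list)  # number of legacy IDs per gene
```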
searchMT()
This function lets you find the condition-specific and tissue-specific expression of genes of interest in two organisms: Plasmodium falciparum and Plasmodium berghei.
geneID <- c("PBANKA_0100600", "PBANKA_0102900", "PF3D7_0102900")
res <- searchMT(geneID = geneID)
res
# To get overview of stages your genes of interest are highly expressed in. Commented here as the html plot disrupts the HTML vignette rendering.
# res %>% easyPie()
You can also feed the output of searchMT() to the companion function easyPie() to quickly get a sense of the stages in which your genes of interest are highly expressed. Another convenience function for the malaria.tools database is plotAllCondition(). It lets you create publication-ready plots of TPM-normalized expression values across multiple parasite stages using bulk RNA-seq data from malaria.tools. These plots are similar to those shown in the database itself.
# TPM plot (non-interactive)
plotAllCondition(geneID = "PBANKA_0100600")
plotAllCondition(geneID = "PBANKA_0100600",plotify = TRUE) ## interactive
## To get the data used for making above plot use returnData argument
plotAllCondition(geneID = "PBANKA_0100600",returnData = TRUE) %>% head()
#> condition mean min max group
#> 1 Asexual: SRP099925 460.6603 396.042 551.808 Asexual
#> 2 Asexual, PbSR-MG KO: SRP109709 403.1830 355.452 442.661 Asexual
#> 3 10 hpi, ab libitum: SRP059210 224.5177 206.967 236.912 10
#> 4 10 hpi, diet restriction: SRP059210 228.0705 218.048 238.093 10
#> 5 10 hpi, ab libitum, kin KO: SRP059210 155.0550 155.055 155.055 10
#> 6 10 hpi, diet restriction, kin KO: SRP059210 130.9310 130.931 130.931 10
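The table returned with returnData = TRUE can be summarised further with ordinary data frame tools. As a minimal sketch, the base R code below averages the per-condition mean TPM within each group. The data frame is a hand-copied subset of the output above with shortened, hypothetical condition labels, so it only illustrates the pattern.

```r
# Rows copied from the head() output above (condition labels shortened)
expr <- data.frame(
  condition = c("asexual_1", "asexual_2", "hpi10_1", "hpi10_2", "hpi10_3", "hpi10_4"),
  mean      = c(460.6603, 403.1830, 224.5177, 228.0705, 155.0550, 130.9310),
  group     = c("Asexual", "Asexual", "10", "10", "10", "10")
)

# Average the per-condition mean TPM within each group
group_means <- aggregate(mean ~ group, data = expr, FUN = mean)
group_means
```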
Users can also plot stage-specific average TPMs, similar to the plots rendered in malaria.tools, using the plotStageSpecific() function.
plotStageSpecific(geneID = "PBANKA_0100600",plotify = TRUE)
searchMidb()
This function lets you fetch minor-intron information from the MiDB database in bulk. By default, all intron classes are fetched (major-like, major_hybrid, minor-like, minor_hybrid, non-canonical). For more information on minor introns, visit the MiDB database.
## Let's see what organisms are present in MiDB
data("midbSpecies")
df <- searchMidb("Toxoplasma gondii ME49")
df %>% head()
#> # A tibble: 6 × 43
#> gene_symbol ensembl_gene_id transcript_key intron_name intron_start intron_end
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 NULL TGME49_200010 TGME49_200010… Toxoplasma… 2247210 2247553
#> 2 NULL TGME49_200290 TGME49_200290… Toxoplasma… 6776668 6776934
#> 3 NULL TGME49_200295 TGME49_200295… Toxoplasma… 6783452 6783939
#> 4 NULL TGME49_200295 TGME49_200295… Toxoplasma… 6782107 6782402
#> 5 NULL TGME49_200300 TGME49_200300… Toxoplasma… 6786183 6786931
#> 6 NULL TGME49_200320 TGME49_200320… Toxoplasma… 6796988 6797499
#> # ℹ 37 more variables: term_nt <chr>, `5ss_seq` <chr>, `3ss_seq` <chr>,
#> # U2_BPS <chr>, U12_BPS <chr>, `5ss_class` <dbl>, `3ss_class` <chr>,
#> # intron_class <chr>, flanking_aa <chr>, intron_aa_position <chr>,
#> # intron_phase <chr>, intron_rank <dbl>, major_5ss_score <dbl>,
#> # major_5ss_LOD <dbl>, major_5ss_LOD_stdev <dbl>, major_5ss_match <dbl>,
#> # major_5ss_match_stdev <dbl>, minor_5ss_score <dbl>, minor_5ss_LOD <dbl>,
#> # minor_5ss_LOD_stdev <dbl>, minor_5ss_match <dbl>, …
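Since all intron classes are returned by default, a common next step is to keep only the minor-type introns. The sketch below uses a toy data frame with the `ensembl_gene_id` and `intron_class` columns seen in the output above. The gene IDs are placeholders, and it is an assumption that the values in `intron_class` are spelled exactly like the class names listed earlier.

```r
# Toy stand-in for a searchMidb() result; only the columns used here are
# included, and the class strings are assumed to match the names listed above.
introns <- data.frame(
  ensembl_gene_id = c("gene_A", "gene_A", "gene_B", "gene_C"),
  intron_class    = c("major-like", "minor-like", "minor_hybrid", "non-canonical")
)

minor_classes <- c("minor-like", "minor_hybrid")
minor_introns <- introns[introns$intron_class %in% minor_classes, ]

# Number of minor-type introns per gene
table(minor_introns$ensembl_gene_id)
```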
searchMiip()
This function lets you fetch protein-protein interaction pairs of Plasmodium falciparum, together with the stage (sexual or asexual) in which they interact, from the MIIP database.
searchMiip(c("PF3D7_0807800","PF3D7_1023900"))
#> # A tibble: 4 × 5
#> interactorA descriptionA interactorB descriptionB stage
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PF3D7_0807800 26S proteasome regulatory subuni… PF3D7_0710… conserved P… game…
#> 2 PF3D7_1023900 chromodomain-helicase-DNA-bindin… PF3D7_1014… protein KIC8 game…
#> 3 PF3D7_1023900 chromodomain-helicase-DNA-bindin… PF3D7_1138… protein KIC5 ring
#> 4 PF3D7_1335100 merozoite surface protein 7 PF3D7_1023… chromodomai… schi…
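Note that a query gene can appear in either the interactorA or the interactorB column, as in the last row of the output above. The base R sketch below collects the partners of one gene regardless of the column it sits in. The table uses placeholder IDs and stage labels, and `partners_of()` is a hypothetical helper.

```r
# Toy interaction table with the same column layout as above; IDs and stage
# labels are placeholders
ppi <- data.frame(
  interactorA = c("query_1", "query_2", "query_2", "other_1"),
  interactorB = c("partner_1", "partner_2", "partner_3", "query_2"),
  stage       = c("sexual", "sexual", "asexual", "asexual")
)

# Hypothetical helper: return the partner of `gene` in every row where it appears
partners_of <- function(tbl, gene) {
  hits <- tbl[tbl$interactorA == gene | tbl$interactorB == gene, ]
  data.frame(
    query   = rep(gene, nrow(hits)),
    partner = ifelse(hits$interactorA == gene, hits$interactorB, hits$interactorA),
    stage   = hits$stage
  )
}

p <- partners_of(ppi, "query_2")
p
```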
searchPM()
Aside from searchGSC(), you can also use searchPM() to fetch literature in which your gene IDs of interest are mentioned. This will, however, limit the search to the title, abstract, and keywords. In the background, it uses easyPubMed functions such as get_pubmed_ids and articles_to_list, and then transforms the output into a table that is easy to explore.
searchPM(geneID = c("PF3D7_0420300","PF3D7_0621000"))
#> PubMed Query used for PF3D7_0420300 was:
#> "Plasmodium falciparum"[All Fields] AND "PF3D7_0420300"[Title/Abstract:~0] AND 2010/01/01:2025/12/31[Date - Publication]
#> pmid doi
#> 1 39412522 10.7554/eLife.92201
#> 2 30526479 10.1186/s12864-018-5257-x
#> title
#> 1 A Plasmodium falciparum MORC protein complex modulates epigenetic control of gene expression through interaction with heterochromatin.
#> 2 Schizont transcriptome variation among clinical isolates and laboratory-adapted clones of the malaria parasite Plasmodium falciparum.
#> year month day jabbrv journal GeneID
#> 1 2024 10 16 Elife eLife PF3D7_0420300
#> 2 2019 03 18 BMC Genomics BMC genomics PF3D7_0420300
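The query string printed in the output above follows a regular pattern: an organism term, the gene ID restricted to title and abstract, and a publication date range. The sketch below rebuilds that string with sprintf() so it can be logged or reused. `build_pubmed_query()` and its arguments are hypothetical. The helper only reproduces the printed format and is not taken from the searchPM() source.

```r
# Hypothetical helper that reproduces the printed query format
build_pubmed_query <- function(gene_id,
                               organism = "Plasmodium falciparum",
                               from = "2010/01/01",
                               to = "2025/12/31") {
  sprintf('"%s"[All Fields] AND "%s"[Title/Abstract:~0] AND %s:%s[Date - Publication]',
          organism, gene_id, from, to)
}

query <- build_pubmed_query("PF3D7_0420300")
cat(query, "\n")
```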
Gene IDs for which no results are available are reported on the screen. When a query is successful, the function also prints the exact query used, for reproducibility. If you have a lot of gene IDs, this behavior can be turned off using verbose=FALSE.
"Plasmodium falciparum"[All Fields] AND "PF3D7_0420300"[Title/Abstract:~0] AND 2010/01/01:2025/12/31[Date - Publication]
searchPhPl()
This convenience function allows users to fetch the Disruptability and Mutant Phenotypes tables for genes of interest from the PhenoPlasm database. fetch=1 fetches the Disruptability table and fetch=2 fetches the Mutant Phenotypes table.
searchPhPl(geneID = c("PF3D7_0420300","PF3D7_0621000","PF3D7_0523800"), org="pf") %>% head()
#> Species Disruptability Reference
#> 1 P. falciparum 3D7 Refractory USF piggyBac screen (Insert. mut.)
#> 2 P. falciparum 3D7 Refractory USF piggyBac screen (Insert. mut.)
#> 3 P. falciparum 3D7 Refractory 354041168 ko attempts failed
#> Submitter QueryGID
#> 1 USF PiggyBac Screen PF3D7_0621000
#> 2 USF PiggyBac Screen PF3D7_0523800
#> 3 Theo Sanderson, Francis Crick Institute PF3D7_0523800
searchPhPl(geneID = c("PF3D7_0420300","PF3D7_0621000","PF3D7_0523800"), org="pf", fetch=2) %>% head()
#> # A tibble: 1 × 6
#> Species Stage Phenotype Reference Submitter QueryGID
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 P. falciparum 3D7 Asexual Difference from wild-t… "PMID 39… Paul Sig… PF3D7_0…
Often, you would like a summary table like the one shown in PhenoPlasm, which combines both Disruptability and Mutant Phenotype information. Rather than taking a screenshot of the table, you can now download it using the Advanced Search button after submitting the gene IDs of interest, and feed the resulting phenotype.txt file to the easyPhplplottbl() function of plasmoRUtils to render the table directly:
# Read the file
df <- read.csv("phenotype.txt", skip = 2, sep = "\t") %>%
dplyr::select(-3, -4) %>% #remove the empty cols: GeneLocalisation and OrthologLocalisation
dplyr::rename_with(~ gsub("Sprozoite", "Sporozoite", .x)) #Correct the colnames
easyPhplplottbl(df)
## Or you can pass the file path directly
easyPhplplottbl("phenotype.txt")
#Load sample data (subset of genes from phenotype.txt file above)
data(pf3d7PhplTable)
easyPhplplottbl(pf3d7PhplTable)
Gene | Asexual | Gametocyte | Liver | Oocyte | Ookinete | Sporozoite | Viability |
---|---|---|---|---|---|---|---|
PF3D7_0105200 | ❌ | ✔ ❌ | |||||
PF3D7_0105300 | ✅ | ✅ | ❗ | ❗ | ✅ | ❗ | ❌ ✔ |
PF3D7_0105400 | ❗ | ✔ | |||||
PF3D7_0217500 | ⟴ 🟥 ✅ | ❗ | ❗ | ❌ ✔ ❌ | |||
PF3D7_1337800 | ❗ 🟥 ❗ | ❌ ❌ |
Windows users might face issues saving these tables directly as PDF. In that case, the tables can be saved as HTML files, which can then be converted to SVG or PDF using online converters and combined with other plots.
Note: In the PhenoPlasm phenotype taxonomy, the database uses “D” for both Difference from wild-type and Egress defect, which is confusing and difficult to resolve programmatically. An example is PF3D7_1337800, which has “D S D” in the “Gene Asexual” column. While we have requested the database maintainer to fix this, please watch out for borderline cases like these.
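One way to spot such borderline cases is to flag cells in which the code “D” occurs more than once. The base R sketch below does this on a toy vector: "D S D" mirrors the example mentioned above, while the other entries and the gene names are placeholders. `count_code()` is a hypothetical helper.

```r
# Toy phenotype codes per gene; "D S D" mirrors the example mentioned above,
# the other entries are placeholders
asexual_codes <- c(gene_1 = "D S D", gene_2 = "D", gene_3 = "S", gene_4 = "")

# Hypothetical helper: count how many times `code` occurs in each cell
count_code <- function(x, code = "D") {
  vapply(strsplit(x, "[[:space:]]+"),
         function(tokens) sum(tokens == code),
         integer(1))
}

n_d <- count_code(asexual_codes)
ambiguous <- names(asexual_codes)[n_d > 1]
ambiguous  # genes where "D" may stand for two different phenotypes
```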
searchTedConsensus()
This function helps users fetch domain information from The Encyclopedia of Domains (TED) database for a given set of UniProt IDs. These tables usually contain numeric CATH labels, which are difficult to interpret, and the user has to click on them one by one to find the domain name. Setting returnCATHdesc=TRUE converts these CATH labels to descriptions by scraping the description for each CATH label from the CATH database wherever possible.
searchTedConsensus(c("Q7K6A1","Q8IAP8","C0H4D0","C6KT90","Q8IBJ7"), returnCATHdesc=FALSE)
#> ted_id uniprot_acc md5_domain
#> 1 AF-Q7K6A1-F1-model_v4_TED01 Q7K6A1 b99e920f0ded31aa96af0ef9be1338f4
#> 2 AF-C0H4D0-F1-model_v4_TED01 C0H4D0 cd912dcbbb5d070cbb254c0a88278fe4
#> 3 AF-C6KT90-F1-model_v4_TED02 C6KT90 70d20592d9f682bff23dc6188f318244
#> 4 AF-C6KT90-F1-model_v4_TED01 C6KT90 7cc174ebefe723733b6e63508fd23a9e
#> 5 AF-Q8IBJ7-F1-model_v4_TED01 Q8IBJ7 71697d50571d5fe2331a13ff16503478
#> consensus_level chopping nres_domain num_segments plddt
#> 1 high 6-376 371 1 97.1740
#> 2 medium 55-153 99 1 88.9028
#> 3 medium 322-382 61 1 45.3118
#> 4 medium 172-203 32 1 48.8553
#> 5 medium 54-88 35 1 87.3500
#> num_helix_strand_turn num_helix num_strand num_helix_strand num_turn
#> 1 60 16 8 24 35
#> 2 15 5 4 9 6
#> 3 3 3 0 3 0
#> 4 2 1 0 1 1
#> 5 5 0 3 3 2
#> proteome_id cath_label cath_assignment_level cath_assignment_method
#> 1 36329 3.40.800.20 H foldseek
#> 2 36329 3.30.70.2380 H foldseek
#> 3 36329 4.10.860 T foldclass
#> 4 36329 1.20.5 T foldclass
#> 5 36329 - - -
#> packing_density norm_rg tax_common_name tax_scientific_name
#> 1 13.064 0.298 Plasmodium falciparum (isolate 3D7)
#> 2 12.537 0.306 Plasmodium falciparum (isolate 3D7)
#> 3 9.900 0.374 Plasmodium falciparum (isolate 3D7)
#> 4 8.900 0.403 Plasmodium falciparum (isolate 3D7)
#> 5 9.833 0.370 Plasmodium falciparum (isolate 3D7)
#> tax_lineage
#> 1 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 2 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 3 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 4 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 5 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
searchTedConsensus(c("Q7K6A1","Q8IAP8","C0H4D0","C6KT90","Q8IBJ7"), returnCATHdesc=TRUE)
#> ted_id uniprot_acc md5_domain
#> 1 AF-Q7K6A1-F1-model_v4_TED01 Q7K6A1 b99e920f0ded31aa96af0ef9be1338f4
#> 2 AF-C0H4D0-F1-model_v4_TED01 C0H4D0 cd912dcbbb5d070cbb254c0a88278fe4
#> 3 AF-C6KT90-F1-model_v4_TED02 C6KT90 70d20592d9f682bff23dc6188f318244
#> 4 AF-C6KT90-F1-model_v4_TED01 C6KT90 7cc174ebefe723733b6e63508fd23a9e
#> 5 AF-Q8IBJ7-F1-model_v4_TED01 Q8IBJ7 71697d50571d5fe2331a13ff16503478
#> consensus_level chopping nres_domain num_segments plddt
#> 1 high 6-376 371 1 97.1740
#> 2 medium 55-153 99 1 88.9028
#> 3 medium 322-382 61 1 45.3118
#> 4 medium 172-203 32 1 48.8553
#> 5 medium 54-88 35 1 87.3500
#> num_helix_strand_turn num_helix num_strand num_helix_strand num_turn
#> 1 60 16 8 24 35
#> 2 15 5 4 9 6
#> 3 3 3 0 3 0
#> 4 2 1 0 1 1
#> 5 5 0 3 3 2
#> proteome_id cath_label cath_assignment_level cath_assignment_method
#> 1 36329 3.40.800.20 H foldseek
#> 2 36329 3.30.70.2380 H foldseek
#> 3 36329 4.10.860 T foldclass
#> 4 36329 1.20.5 T foldclass
#> 5 36329 - - -
#> packing_density norm_rg tax_common_name tax_scientific_name
#> 1 13.064 0.298 Plasmodium falciparum (isolate 3D7)
#> 2 12.537 0.306 Plasmodium falciparum (isolate 3D7)
#> 3 9.900 0.374 Plasmodium falciparum (isolate 3D7)
#> 4 8.900 0.403 Plasmodium falciparum (isolate 3D7)
#> 5 9.833 0.370 Plasmodium falciparum (isolate 3D7)
#> tax_lineage
#> 1 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 2 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 3 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 4 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 5 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> cath_label_desc
#> 1 Histone deacetylase domain
#> 2
#> 3
#> 4
#> 5 NULL
In the example above, C0H4D0 has the CATH label 3.30.70.2380, but this superfamily does not have a name. In addition, instead of superfamily-level CATH labels, TED sometimes uses the CATH-Gene3D hierarchy. No description is returned in such cases.
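To see in advance which labels are superfamily-level, the depth of a CATH label can be read from the number of dot-separated fields (Class, Architecture, Topology, Homologous superfamily). The sketch below is a hypothetical helper, not part of plasmoRUtils, applied to the labels from the cath_label column above. Its result agrees with the cath_assignment_level column for those rows.

```r
# Hypothetical helper: infer the CATH hierarchy level from the number of
# dot-separated fields (1 = Class, 2 = Architecture, 3 = Topology,
# 4 = Homologous superfamily); anything else, such as "-", gives NA.
cath_level <- function(label) {
  levels <- c("C", "A", "T", "H")
  n_fields <- ifelse(grepl("^[0-9]+(\\.[0-9]+)*$", label),
                     lengths(strsplit(label, ".", fixed = TRUE)),
                     NA_integer_)
  ifelse(n_fields >= 1 & n_fields <= 4, levels[n_fields], NA_character_)
}

# Labels copied from the cath_label column in the output above
lvl <- cath_level(c("3.40.800.20", "3.30.70.2380", "4.10.860", "1.20.5", "-"))
lvl
```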
Session Info
utils::sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8
#> [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_India.utf8
#>
#> time zone: Asia/Riyadh
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] plyr_1.8.9 dplyr_1.1.4 plasmoRUtils_1.0.0
#> [4] rlang_1.1.6 readr_2.1.5 janitor_2.2.1
#> [7] randomcoloR_1.1.0.1 BiocStyle_2.32.1
#>
#> loaded via a namespace (and not attached):
#> [1] IRanges_2.38.1 dichromat_2.0-0.1
#> [3] vroom_1.6.5 progress_1.2.3
#> [5] vsn_3.72.0 nnet_7.3-19
#> [7] Biostrings_2.72.1 vctrs_0.6.5
#> [9] digest_0.6.37 png_0.1-8
#> [11] proxy_0.4-27 MSnbase_2.30.1
#> [13] echarts4r_0.4.5.9000 parallelly_1.44.0
#> [15] MASS_7.3-61 pkgdown_2.1.3
#> [17] reshape2_1.4.4 httpuv_1.6.16
#> [19] foreach_1.5.2 BiocGenerics_0.50.0
#> [21] withr_3.0.2 xfun_0.52
#> [23] ggpubr_0.6.0 survival_3.8-3
#> [25] memoise_2.0.1 hexbin_1.28.5
#> [27] ggsci_3.2.0 mixtools_2.0.0.1
#> [29] systemfonts_1.2.3 ragg_1.4.0
#> [31] gtools_3.9.5 easyPubMed_2.13
#> [33] V8_6.0.3 Formula_1.2-5
#> [35] prettyunits_1.2.0 KEGGREST_1.44.1
#> [37] promises_1.3.3 httr_1.4.7
#> [39] rstatix_0.7.2 restfulr_0.0.15
#> [41] globals_0.18.0 ps_1.9.1
#> [43] rstudioapi_0.17.1 UCSC.utils_1.0.0
#> [45] generics_0.1.4 processx_3.8.6
#> [47] curl_6.2.3 ncdf4_1.24
#> [49] S4Vectors_0.42.1 zlibbioc_1.50.0
#> [51] ScaledMatrix_1.12.0 randomForest_4.7-1.2
#> [53] bio3d_2.4-5 GenomeInfoDbData_1.2.12
#> [55] SparseArray_1.4.8 xtable_1.8-4
#> [57] stringr_1.5.1 desc_1.4.3
#> [59] doParallel_1.0.17 evaluate_1.0.3
#> [61] S4Arrays_1.4.1 BiocFileCache_2.12.0
#> [63] preprocessCore_1.66.0 hms_1.1.3
#> [65] GenomicRanges_1.56.2 bookdown_0.43
#> [67] irlba_2.3.5.1 colorspace_2.1-1
#> [69] filelock_1.0.3 magrittr_2.0.3
#> [71] snakecase_0.11.1 later_1.4.2
#> [73] viridis_0.6.5 lattice_0.22-6
#> [75] MsCoreUtils_1.16.1 future.apply_1.11.3
#> [77] SparseM_1.84-2 XML_3.99-0.18
#> [79] scuttle_1.14.0 matrixStats_1.5.0
#> [81] class_7.3-22 pillar_1.10.2
#> [83] nlme_3.1-166 iterators_1.0.14
#> [85] compiler_4.4.1 beachmat_2.20.0
#> [87] stringi_1.8.7 gower_1.0.2
#> [89] SummarizedExperiment_1.34.0 dendextend_1.19.0
#> [91] lubridate_1.9.4 GenomicAlignments_1.40.0
#> [93] drawProteins_1.24.0 crayon_1.5.3
#> [95] abind_1.4-8 BiocIO_1.14.0
#> [97] bit_4.6.0 chromote_0.5.1
#> [99] pcaMethods_1.96.0 codetools_0.2-20
#> [101] textshaping_1.0.1 recipes_1.3.1
#> [103] BiocSingular_1.20.0 MLInterfaces_1.84.0
#> [105] crosstalk_1.2.1 bslib_0.9.0
#> [107] e1071_1.7-16 plotly_4.10.4
#> [109] LaplacesDemon_16.1.6 mime_0.13
#> [111] MultiAssayExperiment_1.30.3 splines_4.4.1
#> [113] Rcpp_1.0.14 dbplyr_2.5.0
#> [115] sparseMatrixStats_1.16.0 knitr_1.50
#> [117] blob_1.2.4 utf8_1.2.5
#> [119] clue_0.3-66 mzR_2.38.0
#> [121] AnnotationFilter_1.28.0 fs_1.6.6
#> [123] QFeatures_1.14.2 listenv_0.9.1
#> [125] mzID_1.42.0 DelayedMatrixStats_1.26.0
#> [127] ggsignif_0.6.4 tibble_3.2.1
#> [129] Matrix_1.7-1 statmod_1.5.0
#> [131] tzdb_0.5.0 lpSolve_5.6.23
#> [133] pkgconfig_2.0.3 tools_4.4.1
#> [135] cachem_1.1.0 RSQLite_2.4.0
#> [137] viridisLite_0.4.2 rvest_1.0.4
#> [139] DBI_1.2.3 impute_1.78.0
#> [141] fastmap_1.2.0 rmarkdown_2.29
#> [143] scales_1.4.0 grid_4.4.1
#> [145] gt_1.0.0 Rsamtools_2.20.0
#> [147] broom_1.0.8 sass_0.4.10
#> [149] coda_0.19-4.1 FNN_1.1.4.1
#> [151] BiocManager_1.30.25 graph_1.82.0
#> [153] carData_3.0-5 selectr_0.4-2
#> [155] SingleR_2.6.0 rpart_4.1.23
#> [157] farver_2.1.2 yaml_2.3.10
#> [159] AnnotationForge_1.46.0 MatrixGenerics_1.16.0
#> [161] rtracklayer_1.64.0 cli_3.6.5
#> [163] purrr_1.0.4 stats4_4.4.1
#> [165] txdbmaker_1.0.1 lifecycle_1.0.4
#> [167] caret_7.0-1 Biobase_2.64.0
#> [169] mvtnorm_1.3-3 lava_1.8.1
#> [171] kernlab_0.9-33 backports_1.5.0
#> [173] BiocParallel_1.38.0 annotate_1.82.0
#> [175] timechange_0.3.0 gtable_0.3.6
#> [177] rjson_0.2.23 parallel_4.4.1
#> [179] pROC_1.18.5 limma_3.60.6
#> [181] jsonlite_2.0.0 bitops_1.0-9
#> [183] ggplot2_3.5.2 bit64_4.6.0-1
#> [185] Rtsne_0.17 pRoloc_1.44.1
#> [187] jquerylib_0.1.4 segmented_2.1-4
#> [189] timeDate_4041.110 lazyeval_0.2.2
#> [191] shiny_1.10.0 htmltools_0.5.8.1
#> [193] affy_1.82.0 GO.db_3.19.1
#> [195] rappdirs_0.3.3 glue_1.8.0
#> [197] httr2_1.1.2 XVector_0.44.0
#> [199] RCurl_1.98-1.17 MALDIquant_1.22.3
#> [201] mclust_6.1.1 gridExtra_2.3
#> [203] igraph_2.1.4 R6_2.6.1
#> [205] tidyr_1.3.1 SingleCellExperiment_1.26.0
#> [207] labeling_0.4.3 GenomicFeatures_1.56.0
#> [209] cluster_2.1.8 GenomeInfoDb_1.40.1
#> [211] ipred_0.9-15 DelayedArray_0.30.1
#> [213] tidyselect_1.2.1 ProtGenerics_1.36.0
#> [215] sampling_2.10 xml2_1.3.8
#> [217] car_3.1-3 AnnotationDbi_1.66.0
#> [219] future_1.49.0 ModelMetrics_1.2.2.2
#> [221] rsvd_1.0.5 affyio_1.74.0
#> [223] topGO_2.56.0 data.table_1.17.4
#> [225] websocket_1.4.4 mgsub_1.7.3
#> [227] htmlwidgets_1.6.4 RColorBrewer_1.1-3
#> [229] biomaRt_2.60.1 hardhat_1.4.1
#> [231] prodlim_2025.04.28 PSMatch_1.8.0