Abstract

The package plasmoRUtils is designed to enable users to access various Plasmodium and Apicomplexan-related databases through single-line R functions. It also provides convenience functions for rapid analysis.

Installation

Before downloading the package, install the following dependencies.

## Easiest way to install the package and its dependencies

pak::pkg_install("Rohit-Satyam/plasmoRUtils", dependencies = TRUE)

cranpkgs <- c('BiocManager','randomcoloR', 'janitor', 'readr', 'rlang', 'dplyr', 'ggsci', 'rvest', 'easyPubMed', 'plyr', 'scales', 'ggplot2', 'glue', 'tidyr', 'tibble', 'data.table', 'plotly', 'purrr', 'stringr', 'S4Vectors', 'echarts4r', 'magrittr', 'bio3d', 'httr', 'jsonlite', 'ggpubr', 'gt', 'mgsub', 'reshape2','pathfindR')

install.packages(setdiff(cranpkgs, rownames(installed.packages())), dependencies = TRUE)

biocpkgs <- c("rmarkdown","pRoloc","knitr","BiocStyle","DESeq2","styler","utils","IRanges","BiocGenerics","rtracklayer","scuttle","txdbmaker","topGO","drawProteins","GenomicFeatures","biomaRt","AnnotationForge","Biostrings","GenomeInfoDb","SingleCellExperiment","SingleR","NOISeq","GenomicRanges","BSgenome")

BiocManager::install(setdiff(biocpkgs, rownames(installed.packages())), dependencies = TRUE)

The plasmoRUtils package is available on CRAN and can be installed as follows:

install.packages("plasmoRUtils")

# Once installed load the library as
library(plasmoRUtils)

## To re-check if all the dependencies that are required by plasmoRUtils are installed
install_dependencies()
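
A dependency re-check of this kind can also be sketched in base R. The snippet below is only an illustration of the idea and is not how install_dependencies() is implemented; the package vector is a toy example that includes one deliberately non-existent name.

```r
# Toy package list; "aPackageThatDoesNotExist" is a made-up name used
# only to show what a missing dependency looks like.
pkgs <- c("stats", "utils", "aPackageThatDoesNotExist")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!is_installed]
missing_pkgs
```

The same logical vector can be computed for the cranpkgs and biocpkgs vectors defined above to see which entries remain uninstalled.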

Introduction

Using plasmoRUtils, users can fetch data from VEuPathDB and its 12 component site databases (VEuPathDBs) and transform it into formats compatible with other R packages. Data tables (both preconfigured and user-configured) can be downloaded from VEuPathDBs directly within R/RStudio through a set of R functions built on the RESTful API provided by VEuPathDBs.

For databases that lack APIs, we developed database-specific “searchX” functions (where X represents the database) that utilize the rvest package for web crawling to retrieve data, which is then transformed into tables that can be saved and shared. Additionally, we created a function to enable programmatic access to the MPMP database for the first time, allowing users to download and share data tables at their convenience. The package also provides several other data sets that we reanalyzed using the latest annotations from VEuPathDBs that can be used by various functions.

Databases covered include:

  1. HitPredict
  2. ApicoTFdb
  3. Malaria.tools
  4. Malaria Parasite Metabolic Pathways (MPMP) database
  5. Malaria Important Interacting Proteins (MIIP)
  6. PhenoPlasm
  7. Uniprot
  8. Malaria Cell Atlas, etc.

For an exhaustive list, see the subsections below.

# Load the package and some other useful packages
suppressPackageStartupMessages(
  suppressWarnings({
    library(plasmoRUtils)
    library(dplyr)
    library(plyr)}))

Accessing databases with plasmoRUtils search functions

The plasmoRUtils package has several search functions to fetch information from databases. The functions are tabulated below:

Function               Database Access
searchApicoTFdb()      ApicoTFdb
searchGSC()            Google Scholar
searchHP()             HitPredict
searchIpDb()           InParanoiDB
searchKipho()          KiPho Database
searchMidb()           Minor Intron Database
searchMiip()           Malaria Important Interacting Proteins
searchPM()             PubMed
searchPhPl()           PhenoPlasm
searchTedConsensus()   The Encyclopedia of Domains

searchApicoTFdb()

This function fetches all the transcription factors for an apicomplexan of interest from ApicoTFdb (Sardar et al. 2019). For ease of use, organism names are abbreviated as shown in the table below:

Category             Abbreviation   Species
Plasmodium Species   pb             Plasmodium berghei
                     pv             Plasmodium vivax
                     pf             Plasmodium falciparum
                     pk             Plasmodium knowlesi
                     py             Plasmodium yoelii
                     pc             Plasmodium chabaudi
Other Apicomplexan   tg49           Toxoplasma gondii ME49
                     tg89           Toxoplasma gondii P89
                     cp             Cryptosporidium parvum
                     em             Eimeria maxima
                     bb             Babesia bovis
                     et             Eimeria tenella
                     nu             Neospora caninum
                     cy             Cyclospora cayetanensis

The function is straightforward to use:

## Searching all plasmodium TFs
searchApicoTFdb(org="pf") %>% head()
#> # A tibble: 6 × 4
#>   `Gene ID`     `Protein Length` `Product Description`              `TF- Family`
#>   <chr>         <chr>            <chr>                              <chr>       
#> 1 PF3D7_1319600 1633             ACDC domain-containing protein, p… AP2         
#> 2 PF3D7_0604100 1979             AP2 domain transcription factor    AP2         
#> 3 PF3D7_1222400 2558             AP2 domain transcription factor    AP2         
#> 4 PF3D7_1222600 2432             AP2 domain transcription factor A… AP2         
#> 5 PF3D7_1408200 1702             AP2 domain transcription factor A… AP2         
#> 6 PF3D7_1007700 1597             AP2 domain transcription factor A… AP2
## Searching all Toxoplasma gondii ME49 TFs
searchApicoTFdb(org="tg49") %>% head()
#> # A tibble: 6 × 4
#>   `Gene ID`     `Product Description`              `Protein Length` `TF- Family`
#>   <chr>         <chr>                              <chr>            <chr>       
#> 1 TGME49_200385 Myb family DNA-binding domain-con… 2258             Myb/SANT    
#> 2 TGME49_201220 zinc finger protein                603              BBOX        
#> 3 TGME49_201790 FHA domain-containing protein      556              FHA         
#> 4 TGME49_202690 DNA-directed RNA polymerase II RP… 250              General-TF  
#> 5 TGME49_202840 FHA domain-containing protein      1044             FHA         
#> 6 TGME49_202900 zinc finger (CCCH type) motif-con… 1298             Zn-Finger

## Fetch all experimentally validated TFs
searchApicoTFdb(fetch = "exptfs") %>% head()
#> # A tibble: 6 × 7
#>   Gene_ID        Source Author   Year  Product         Pubmed Orthologous_Groups
#>   <chr>          <chr>  <chr>    <chr> <chr>           <chr>  <chr>             
#> 1 cgd2_3490      PBM    DeSilva  2008  AP2/ERF domain… 18541… OG5_147419        
#> 2 NCLIV_058430   PBM    Campbell 2010  unspecified pr… 21060… OG5_241106        
#> 3 NCLIV_059950   PBM    Campbell 2010  unspecified pr… 21060… OG5_241143        
#> 4 PBANKA_0102900 PBM    DeSilva  2008  AP2 domain tra… 18541… OG5_150514        
#> 5 PBANKA_0214400 PBM    Campbell 2010  AP2 domain tra… 21060… OG5_157023        
#> 6 PBANKA_0905900 PBM    Campbell 2010  AP2 domain tra… 21060… OG5_154773

searchGSC()

It can be difficult to keep track of the literature on your gene of interest, or to keep up with competing groups across the globe. The searchGSC() function collects the literature in which your gene ID of interest is mentioned and returns the results as a data frame.

Because Google Scholar searches are not restricted to article abstracts but extend to the supplementary sections, this function can capture articles that mention your gene ID of interest and would otherwise be missed by a normal Google search. In addition, since most preprint literature is indexed by Google Scholar, you can also find papers from competing groups that are yet to be peer-reviewed.

Note: This function is experimental and has been seen to get the user's IP temporarily blocked for 24 hrs if used more than 20 times. For large sets of genes, we encourage users to use more specialized APIs.

## Fetch all the papers between year 2018 and 2021
searchGSC(
  geneIDs=c("PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 OR PFD0985w","PF3D7_0621000",
            "PF3D7_0420300OR"),
  translate = "en",  ## to translate the non-english titles
  year_start = 2018, 
  year_end   = 2021, max_pages = 2)
#> # A tibble: 21 × 6
#>    Query                                   Title Url   Authors  Year translation
#>    <chr>                                   <chr> <chr> <chr>   <int> <chr>      
#>  1 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … The … http… EFG Cu…  2021 The Transc…
#>  2 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … A si… http… E Real…  2021 A single-c…
#>  3 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … The … http… DR Alv…  2021 The RNA st…
#>  4 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … Full… http… M Yang…  2021 Full-Lengt…
#>  5 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … Dete… http… A Bios…  2020 Detection …
#>  6 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … An A… http… E Cubi…  2020 An ApiAP2 …
#>  7 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … Refi… http… L Chap…  2020 Refining t…
#>  8 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … Tran… http… SE Lin…  2019 Transcript…
#>  9 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … Exte… http… SE Lin…  2019 Extensive …
#> 10 PF3D7_0420300 OR MAL4P1.192 OR Q8I1N6 … ApiA… http… MD Jen…  2019 ApiAP2 tra…
#> # ℹ 11 more rows
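
Given the blocking risk noted above, one cautious pattern is to send gene IDs in small batches with a pause between calls. The sketch below is an assumption-laden illustration: throttled_fetch(), fetch_fun, batch_size and pause_sec are hypothetical names that are not part of plasmoRUtils, and the default pause is arbitrary and does not guarantee that an IP will not be blocked.

```r
# Apply `fetch_fun` to gene IDs in small batches, pausing between batches.
# All names here are hypothetical helpers, not plasmoRUtils functions.
throttled_fetch <- function(gene_ids, fetch_fun, batch_size = 5, pause_sec = 60) {
  # Group the IDs into consecutive batches of at most `batch_size`
  batches <- split(gene_ids, ceiling(seq_along(gene_ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fetch_fun(batches[[i]])
    # Wait before the next request; no pause is needed after the last batch
    if (i < length(batches)) Sys.sleep(pause_sec)
  }
  do.call(rbind, results)
}

# Offline demonstration with a dummy fetcher that makes no web requests
dummy_fetch <- function(ids) {
  data.frame(Query = ids, n_chars = nchar(ids), stringsAsFactors = FALSE)
}
out <- throttled_fetch(
  c("PF3D7_0420300", "PF3D7_0621000", "PF3D7_0523800"),
  dummy_fetch, batch_size = 2, pause_sec = 0
)
out
```

In real use, fetch_fun could be a small wrapper such as function(ids) searchGSC(geneIDs = ids), with pause_sec chosen conservatively.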

searchHP()

This function searches the HitPredict (López, Nakai, and Patil 2015) database and retrieves high-confidence protein-protein interactions (PPIs) for your organism of interest. All it requires is a gene ID and a taxon ID. HitPredict provides PPI data as Uniprot IDs, which are not always ideal for apicomplexan biologists. We therefore provide the option to convert these Uniprot IDs back to gene IDs by setting uniprotToGID=TRUE. Since the only apicomplexan in HitPredict is Plasmodium falciparum, this gene ID mapping is limited to Plasmodium. It should be turned off when querying a non-apicomplexan organism, as shown below.

## Single gene query
searchHP("PF3D7_0418300") %>% head()
#>   Interactor Interaction       Name Experiments        Category Method.Score
#> 1 A0A5K1K7X4       47066 A0A5K1K7X4           1 High-throughput         0.35
#> 2     C0H4E0       86712     C0H4E0           1 High-throughput         0.39
#> 3     C0H4U4       86784     C0H4U4           1 High-throughput         0.39
#> 4     C0H586       86859     C0H586           1 High-throughput         0.39
#> 5     C0H5G3       86921     C0H5G3           1 High-throughput         0.39
#> 6     Q8I398     1301310     Q8I398           1 High-throughput         0.49
#>   Annotation.Score Interaction.Score Confidence       QueryID       Gene ID
#> 1             0.16             0.238        Low PF3D7_0418300 PF3D7_0532100
#> 2             0.16             0.251        Low PF3D7_0418300 PF3D7_0515400
#> 3             0.16             0.251        Low PF3D7_0418300 PF3D7_0813300
#> 4             0.16             0.251        Low PF3D7_0418300 PF3D7_0933200
#> 5             0.16             0.251        Low PF3D7_0418300 PF3D7_1341300
#> 6             0.16             0.282       High PF3D7_0418300 PF3D7_0905100

## To use it for other organism, turn off uniprotToGID and provide taxid of the organism
test <- searchHP("BRCA1",taxid = "3702" , uniprotToGID = FALSE)

## Multiple gene query
res <- lapply(c("PF3D7_0418300","PF3D7_1118500"), function(x){searchHP(x,uniprotToGID = FALSE)})%>% plyr::ldply()

res %>% tail()
#>    Interaction Interactor       Name Experiments        Category Method.Score
#> 15       86921     C0H5G3     C0H5G3           1 High-throughput         0.39
#> 16       47066 A0A5K1K7X4 A0A5K1K7X4           1 High-throughput         0.35
#> 17     1301321     Q9U0N1     Q9U0N1           1 High-throughput         0.35
#> 18     1303056     Q8IJG6     Q8IJG6           1 High-throughput         0.49
#> 19       91252     C6KTD2       SET1           1 High-throughput         0.39
#> 20     1301317     Q8I1Q4     Q8I1Q4           1 High-throughput         0.49
#>    Annotation.Score Interaction.Score Confidence       QueryID
#> 15             0.16             0.251        Low PF3D7_0418300
#> 16             0.16             0.238        Low PF3D7_0418300
#> 17             0.16             0.238        Low PF3D7_0418300
#> 18             0.50             0.494       High PF3D7_1118500
#> 19             0.50             0.439       High PF3D7_1118500
#> 20             0.16             0.282       High PF3D7_1118500

## You can now use toGeneid function which uses PlasmoDB release 68 annotation to
## map the uniprot IDs back to the gene IDs
toGeneid(res$Interactor,from = "uniprot","ensembl") %>% full_join(., res, by = c("UniProt ID(s)" = "Interactor"))
#> # A tibble: 20 × 11
#>    `Gene ID` `UniProt ID(s)` Interaction Name  Experiments Category Method.Score
#>    <chr>     <chr>                 <int> <chr>       <int> <chr>           <dbl>
#>  1 PF3D7_01… Q9U0N1              1301321 Q9U0…           1 High-th…         0.35
#>  2 PF3D7_04… Q8I1Q4              1301317 Q8I1…           1 High-th…         0.49
#>  3 PF3D7_05… C0H4E0                86712 C0H4…           1 High-th…         0.39
#>  4 PF3D7_05… Q8I3J7              1301311 Q8I3…           1 High-th…         0.49
#>  5 PF3D7_05… A0A5K1K7X4            47066 A0A5…           1 High-th…         0.35
#>  6 PF3D7_06… C6KTD2                91252 SET1            1 High-th…         0.39
#>  7 PF3D7_08… Q8IAM0              1301313 Q8IA…           1 High-th…         0.49
#>  8 PF3D7_08… C0H4U4                86784 C0H4…           1 High-th…         0.39
#>  9 PF3D7_08… Q8IB88              1301314 Q8IB…           1 High-th…         0.49
#> 10 PF3D7_09… Q8I398              1301310 Q8I3…           1 High-th…         0.49
#> 11 PF3D7_09… C0H586                86859 C0H5…           1 High-th…         0.39
#> 12 PF3D7_10… Q8IJG6              1301319 Q8IJ…           1 High-th…         0.49
#> 13 PF3D7_10… Q8IJG6              1303056 Q8IJ…           1 High-th…         0.49
#> 14 PF3D7_11… Q8IIP2              1301318 Q8II…           1 High-th…         0.49
#> 15 PF3D7_11… Q8III3              1301317 Q8II…           1 High-th…         0.49
#> 16 PF3D7_12… Q8I5D2              1301312 MSP9            1 High-th…         0.49
#> 17 PF3D7_13… Q8IET8              1301316 Q8IE…           1 High-th…         0.49
#> 18 PF3D7_13… Q8IEM0              1301315 Q8IE…           1 High-th…         0.49
#> 19 PF3D7_13… C0H5G3                86921 C0H5…           1 High-th…         0.39
#> 20 PF3D7_14… Q8IKF6              1301320 Q8IK…           1 High-th…         0.49
#> # ℹ 4 more variables: Annotation.Score <dbl>, Interaction.Score <dbl>,
#> #   Confidence <chr>, QueryID <chr>

Users might also want to set uniprotToGID=FALSE when querying thousands of IDs. Since ID conversion is carried out using biomaRt, converting the same Uniprot ID multiple times is redundant when it has multiple interacting partners.

For convenience, we therefore provide another function, toGeneid(), which quickly converts the Uniprot IDs back to Ensembl IDs.
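
The redundancy argument can be illustrated with a small base-R sketch: convert each distinct Uniprot ID once, then expand the result back to the original order. The lookup vector reuses ID pairs from the searchHP() output shown earlier, while convert_ids() is a hypothetical stand-in for a real converter such as toGeneid() and does not reflect its actual interface.

```r
# ID pairs copied from the searchHP() output shown earlier in this vignette
lookup <- c(
  A0A5K1K7X4 = "PF3D7_0532100",
  C0H4E0     = "PF3D7_0515400",
  Q8I398     = "PF3D7_0905100"
)

# Hypothetical stand-in for a (possibly slow) ID conversion service
convert_ids <- function(ids) unname(lookup[ids])

# Interactors as they might appear across many queries, with repeats
interactors <- c("C0H4E0", "Q8I398", "C0H4E0", "A0A5K1K7X4", "Q8I398")

unique_ids <- unique(interactors)       # each ID is converted only once
converted  <- convert_ids(unique_ids)

# Expand the converted values back to the original (repeated) order
gene_ids <- converted[match(interactors, unique_ids)]

data.frame(Interactor = interactors, GeneID = gene_ids)
```

Here five rows require only three conversions; the saving grows with the number of shared interactors.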

searchIpDb()

This function searches the InParanoiDB 9 (Persson and Sonnhammer 2023) database and retrieves high-confidence orthologs for your organism of interest. The input is a character vector of gene IDs or Uniprot IDs. Gene IDs are first converted to Uniprot IDs to comply with the InParanoiDB API query format. Gene ID to Uniprot ID conversion is provided only for organisms covered by VEuPathDB, because searchIpDb() uses our own toGeneid() function to fetch all Uniprot IDs. Users can also convert their gene IDs to Uniprot IDs separately using other R packages such as biomaRt.

## Using Gene IDs
searchIpDb( c("PF3D7_0807800", "PF3D7_1023900")) %>% head()
#> success Q8IAR6
#> success Q8IJG6
#> # A tibble: 6 × 10
#>   `#Unique_group_id` Species       TaxID Protein Gene_name Score Inparalog_score
#>                <dbl> <chr>         <dbl> <chr>   <chr>     <dbl>           <dbl>
#> 1           67442957 Perkinsus m… 423536 C5LD32  Pmar_PMA…   126           1    
#> 2           67442957 Perkinsus m… 423536 C5L7W9  Pmar_PMA…   126           0.397
#> 3           67442957 Plasmodium …  36329 Q8IAR6  PF3D7_08…   126           1    
#> 4           71821781 Plasmodium … 126793 A5KAC7  PVX_0881…   515           1    
#> 5           71821781 Plasmodium …  36329 Q8IAR6  PF3D7_08…   515           1    
#> 6          135850834 Plasmodium …  36329 Q8IAR6  PF3D7_08…   179           1    
#> # ℹ 3 more variables: Seed_score <dbl>, Description <chr>, queryid <chr>

## Using uniprot IDs
searchIpDb( c("C5LD32", "A5KAC7"),idtype = "uniprot") %>% head()
#> success C5LD32
#> success A5KAC7
#> # A tibble: 6 × 10
#>   `#Unique_group_id` Species       TaxID Protein Gene_name Score Inparalog_score
#>                <dbl> <chr>         <dbl> <chr>   <chr>     <dbl>           <dbl>
#> 1            1360218 Perkinsus m… 4.24e5 C5LD32  Pmar_PMA…   171           1    
#> 2            1360218 Perkinsus m… 4.24e5 C5L7W9  Pmar_PMA…   171           0.354
#> 3            1360218 Dentipellis… 1.88e6 A0A5B1… DENSPDRA…   171           1    
#> 4            2468578 Oryzias lat… 8.09e3 H2M0M9  LOC10117…   158           1    
#> 5            2468578 Perkinsus m… 4.24e5 C5LD32  Pmar_PMA…   158           1    
#> 6            2468578 Oryzias lat… 8.09e3 H2L3R1  LOC10115…   158           0.759
#> # ℹ 3 more variables: Seed_score <dbl>, Description <chr>, queryid <chr>

You might see some Uniprot IDs, such as Q2KNU4 and Q2KNU5, reported as failing along with their respective URLs. These Uniprot IDs are absent from InParanoiDB 9, either because they are old and discontinued or because the database does not include them. When converting gene IDs to Uniprot IDs, this function tries querying all the Uniprot IDs provided by VEuPathDB.
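
When looping over many IDs yourself, a general way to keep going past such failures is to wrap each request in tryCatch() and record the IDs that failed. The sketch below is a generic pattern, not the internal logic of searchIpDb(); fetch_with_failures() and fetch_one are hypothetical names, and the dummy fetcher only simulates failure for the two IDs mentioned above.

```r
# Run `fetch_one` on every ID, keeping successes and recording failures.
# `fetch_one` is a hypothetical per-ID function, not part of plasmoRUtils.
fetch_with_failures <- function(ids, fetch_one) {
  results <- list()
  failed  <- character(0)
  for (id in ids) {
    # An error in one request is turned into NULL instead of stopping the loop
    res <- tryCatch(fetch_one(id), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)
    } else {
      results[[id]] <- res
    }
  }
  list(results = results, failed = failed)
}

# Offline demonstration: the dummy fetcher fails for two IDs and succeeds otherwise
dummy_fetch_one <- function(id) {
  if (id %in% c("Q2KNU4", "Q2KNU5")) stop("not found: ", id)
  data.frame(queryid = id, stringsAsFactors = FALSE)
}
res <- fetch_with_failures(c("Q8IAR6", "Q2KNU4", "Q8IJG6", "Q2KNU5"), dummy_fetch_one)
res$failed
```

The failed vector can then be inspected or retried separately.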

searchKipho()

This function lets you fetch data from the Malaria Parasite Kinome-Phosphatome Resource (KiPho) database (Pandey, Kumar, and Gupta 2017) without leaving R. The organisms in KiPho are listed below:

Abbreviation   Species
pb             Plasmodium berghei
pv             Plasmodium vivax
pf             Plasmodium falciparum
pc             Plasmodium chabaudi

Besides the organism, the user needs to specify type="kinase" to fetch the kinome or type="phosphatase" to fetch the phosphatome.

searchKipho(org="pf",type = "kinase")
#> # A tibble: 148 × 7
#>    `Gene ID`     `Previous ID(s)`   `Product Description`       `Protein Length`
#>    <chr>         <chr>              <chr>                                  <int>
#>  1 PF3D7_0102600 PFA0130c;MAL1P1.17 serine/threonine protein k…              630
#>  2 PF3D7_0103700 PFA0185w;MAL1P1.23 L-seryl-tRNA(Sec) kinase, …              535
#>  3 PF3D7_0107600 PFA0380w;MAL1P2.04 serine/threonine protein k…             1595
#>  4 PF3D7_0110600 PFA0515w;MAL1P2.32 phosphatidylinositol-4-pho…             1710
#>  5 PF3D7_0110900 PFA0530c;MAL1P2.35 adenylate kinase-like prot…              186
#>  6 PF3D7_0111500 PFA0555c;MAL1P2.40 UMP-CMP kinase, putative                 371
#>  7 PF3D7_0203100 PFB0150c;PF02_0030 protein kinase, putative                2485
#>  8 PF3D7_0211700 PFB0520w;PF02_0109 tyrosine kinase-like prote…             1233
#>  9 PF3D7_0213400 PFB0605w;PF02_0125 protein kinase 7 (PK7)                   343
#> 10 PF3D7_0214600 PFB0665w;PF02_0137 serine/threonine protein k…             1714
#> # ℹ 138 more rows
#> # ℹ 3 more variables: `Conserved Protein Domain Family(Accession No)` <chr>,
#> #   `Conserved Protein Domain Family(Name)` <chr>, `Ortholog Group` <chr>
searchKipho(org="pf",type = "phosphatase")
#> # A tibble: 70 × 7
#>    `Gene ID`     `Previous ID(s)`   `Product Description`       `Protein Length`
#>    <chr>         <chr>              <chr>                                  <int>
#>  1 PF3D7_0107200 PFA0350w;MAL1P1.64 carbon catabolite represso…              337
#>  2 PF3D7_0107800 PFA0390w           double-strand break repair…             1233
#>  3 PF3D7_0303200 PFC0150w           HAD superfamily protein pu…             1162
#>  4 PF3D7_0305600 PFC0250c           AP endonuclease (DNA-[apur…              617
#>  5 PF3D7_0309000 PFC0380w           dual specificity protein p…              575
#>  6 PF3D7_0310300 PFC0430w           phosphoglycerate mutase pu…             1165
#>  7 PF3D7_0314400 PFC0595c           serine/threonine protein p…              308
#>  8 PF3D7_0319200 PFC0850c           endonuclease/exonuclease/p…              906
#>  9 PF3D7_0322100 PFC0980c           RNA triphosphatase (Prt1)                591
#> 10 PF3D7_0410300 PFD0505c;PFD0510c  protein phosphatase PPM1 p…              906
#> # ℹ 60 more rows
#> # ℹ 3 more variables: `Conserved Protein Domain Family(Accession_No)` <chr>,
#> #   `Conserved Protein Domain Family(Name)` <chr>, `Ortholog Group` <chr>

searchMidb()

This function fetches minor intron information from the MiDB database in bulk. By default, all intron classes are fetched (major-like, major_hybrid, minor-like, minor_hybrid, non-canonical). For more information on minor introns, visit the MiDB database.

## Let's see what organisms are present in MiDB
data("midbSpecies")

df <- searchMidb("Toxoplasma gondii ME49")
df %>% head()

searchMiip()

This function fetches protein-protein interaction pairs of Plasmodium falciparum, along with the stage (sexual and asexual) at which they interact, from the MIIP database.

searchMiip(c("PF3D7_0807800","PF3D7_1023900"))
#> # A tibble: 4 × 5
#>   interactorA   descriptionA                      interactorB descriptionB stage
#>   <chr>         <chr>                             <chr>       <chr>        <chr>
#> 1 PF3D7_0807800 26S proteasome regulatory subuni… PF3D7_0710… conserved P… game…
#> 2 PF3D7_1023900 chromodomain-helicase-DNA-bindin… PF3D7_1014… protein KIC8 game…
#> 3 PF3D7_1023900 chromodomain-helicase-DNA-bindin… PF3D7_1138… protein KIC5 ring 
#> 4 PF3D7_1335100 merozoite surface protein 7       PF3D7_1023… chromodomai… schi…

searchPM()

Aside from searchGSC(), you can also use searchPM() to fetch literature in which your gene IDs of interest are mentioned. This will, however, limit the search to the title, abstract and keywords. In the background, it uses easyPubMed functions such as get_pubmed_ids() and articles_to_list() and then transforms the output into a table that is easy to explore.


searchPM(geneID = c("PF3D7_0420300","PF3D7_0621000"))
#> PubMed Query used for PF3D7_0420300 was: 
#>  "Plasmodium falciparum"[All Fields] AND "PF3D7_0420300"[Title/Abstract:~0] AND 2010/01/01:2025/12/31[Date - Publication]
#>       pmid                       doi
#> 1 39412522       10.7554/eLife.92201
#> 2 30526479 10.1186/s12864-018-5257-x
#>                                                                                                                                      title
#> 1 A  Plasmodium falciparum  MORC protein complex modulates epigenetic control of gene expression through interaction with heterochromatin.
#> 2    Schizont transcriptome variation among clinical isolates and laboratory-adapted clones of the malaria parasite Plasmodium falciparum.
#>   year month day       jabbrv      journal        GeneID
#> 1 2024    10  16        Elife        eLife PF3D7_0420300
#> 2 2019    03  18 BMC Genomics BMC genomics PF3D7_0420300
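
The query printed above follows a simple template: organism, gene ID and a publication date range. A string of the same shape can be assembled as sketched below. This is a reconstruction from the printed output only; build_pubmed_query() and its arguments are hypothetical and are not claimed to match the package's internal code.

```r
# Assemble a PubMed query string of the same shape as the one printed above.
# build_pubmed_query() is a hypothetical helper written for illustration.
build_pubmed_query <- function(gene_id,
                               organism = "Plasmodium falciparum",
                               from = "2010/01/01",
                               to = "2025/12/31") {
  sprintf(
    '"%s"[All Fields] AND "%s"[Title/Abstract:~0] AND %s:%s[Date - Publication]',
    organism, gene_id, from, to
  )
}

# sprintf() is vectorised, so several gene IDs give several queries
queries <- build_pubmed_query(c("PF3D7_0420300", "PF3D7_0621000"))
cat(queries, sep = "\n")
```

Keeping such strings alongside the results makes it easy to rerun the same search later on the PubMed website.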

Gene IDs for which no results are available are shown on the screen. When a query is successful, the function also prints the exact query used, for reproducibility purposes. If you have a lot of gene IDs, this behavior can be turned off using verbose=FALSE.

"Plasmodium falciparum"[All Fields] AND "PF3D7_0420300"[Title/Abstract:~0] AND 2010/01/01:2025/12/31[Date - Publication] 

searchPhPl()

This convenience function allows users to fetch the Disruptability and Mutant Phenotypes tables for genes of interest from the PhenoPlasm database. fetch=1 fetches the Disruptability table and fetch=2 fetches the Mutant Phenotypes table.

searchPhPl(geneID = c("PF3D7_0420300","PF3D7_0621000","PF3D7_0523800"), org="pf",fetch = 1) %>% head()
#>             Species Disruptability                          Reference
#> 1 P. falciparum 3D7     Refractory USF piggyBac screen (Insert. mut.)
#> 2 P. falciparum 3D7     Refractory USF piggyBac screen (Insert. mut.)
#> 3 P. falciparum 3D7     Refractory       354041168 ko attempts failed
#>                                 Submitter      QueryGID
#> 1                     USF PiggyBac Screen PF3D7_0621000
#> 2                     USF PiggyBac Screen PF3D7_0523800
#> 3 Theo Sanderson, Francis Crick Institute PF3D7_0523800
searchPhPl(geneID = c("PF3D7_0420300","PF3D7_0621000","PF3D7_0523800"), org="pf", fetch=2) %>% head()
#> # A tibble: 1 × 6
#>   Species           Stage   Phenotype               Reference Submitter QueryGID
#>   <chr>             <chr>   <chr>                   <chr>     <chr>     <chr>   
#> 1 P. falciparum 3D7 Asexual Difference from wild-t… "PMID 39… Paul Sig… PF3D7_0…

Often, you would like a summary table like the one shown in PhenoPlasm, which combines both Disruptability and Mutant Phenotype information. Rather than taking a screen grab of the table, you can download the table via the Advanced Search button by submitting the gene IDs of interest, and then feed that file to the easyPhplplottbl() function of plasmoRUtils to render such a table directly from the phenotype.txt file:

# Read the file
df <- read.csv("phenotype.txt", skip = 2, sep = "\t") %>%
dplyr::select(-3, -4) %>% #remove the empty cols: GeneLocalisation and OrthologLocalisation
dplyr::rename_with(~ gsub("Sprozoite", "Sporozoite", .x)) #Correct the colnames

easyPhplplottbl(df)

## Or you can pass the file path directly
easyPhplplottbl("phenotype.txt")
#Load sample data (subset of genes from phenotype.txt file above)
data(pf3d7PhplTable)
easyPhplplottbl(pf3d7PhplTable)
Gene Asexual Gametocyte Liver Oocyte Ookinete Sporozoite Viability
PF3D7_0105200
PF3D7_0105300
PF3D7_0105400
PF3D7_0217500 🟥
PF3D7_1337800 🟥

Windows users might face issues saving these plots as PDF directly. In that case, the tables can be saved as HTML files, which can then be converted to SVG or PDF using various online converters and combined with other plots.

Note: In the phenotype taxonomy of PhenoPlasm, the database uses “D” for both Difference from wild-type and Egress defect, which is confusing and difficult to resolve programmatically. An example of this is PF3D7_1337800, which has “D S D” in the “Gene Asexual” column. While we have requested the database maintainer to fix this, please watch out for borderline cases like these.
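
One way to spot such borderline cases before plotting is to flag entries in which the code “D” occurs more than once. The sketch below uses a toy vector: only the value “D S D” comes from the example above, the other two values are made up for illustration, and count_code() is a hypothetical helper.

```r
# Count how often a phenotype code occurs in each entry.
# count_code() is a hypothetical helper, not a plasmoRUtils function.
count_code <- function(x, code = "D") {
  lengths(regmatches(x, gregexpr(code, x, fixed = TRUE)))
}

# Toy phenotype strings: "D S D" is taken from the PF3D7_1337800 example,
# the remaining values are invented for illustration only.
asexual <- c("D S D", "S", "")

n_d <- count_code(asexual)
needs_manual_check <- n_d > 1   # repeated "D" may mix two phenotypes
needs_manual_check
```

Flagged entries can then be checked by hand against the PhenoPlasm page for the gene.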

searchTedConsensus()

This function fetches domain information from The Encyclopedia of Domains (TED) database for a given set of Uniprot IDs. These tables usually contain numeric CATH labels, which are difficult to interpret, and the user has to click on them one by one to find the domain name. Setting returnCATHdesc=TRUE converts these CATH labels to descriptions by scraping the description of each label from the CATH database wherever possible.

searchTedConsensus(c("Q7K6A1","Q8IAP8","C0H4D0","C6KT90","Q8IBJ7"), returnCATHdesc=FALSE)
#>                        ted_id uniprot_acc                       md5_domain
#> 1 AF-Q7K6A1-F1-model_v4_TED01      Q7K6A1 b99e920f0ded31aa96af0ef9be1338f4
#> 2 AF-C0H4D0-F1-model_v4_TED01      C0H4D0 cd912dcbbb5d070cbb254c0a88278fe4
#> 3 AF-C6KT90-F1-model_v4_TED02      C6KT90 70d20592d9f682bff23dc6188f318244
#> 4 AF-C6KT90-F1-model_v4_TED01      C6KT90 7cc174ebefe723733b6e63508fd23a9e
#> 5 AF-Q8IBJ7-F1-model_v4_TED01      Q8IBJ7 71697d50571d5fe2331a13ff16503478
#>   consensus_level chopping nres_domain num_segments   plddt
#> 1            high    6-376         371            1 97.1740
#> 2          medium   55-153          99            1 88.9028
#> 3          medium  322-382          61            1 45.3118
#> 4          medium  172-203          32            1 48.8553
#> 5          medium    54-88          35            1 87.3500
#>   num_helix_strand_turn num_helix num_strand num_helix_strand num_turn
#> 1                    60        16          8               24       35
#> 2                    15         5          4                9        6
#> 3                     3         3          0                3        0
#> 4                     2         1          0                1        1
#> 5                     5         0          3                3        2
#>   proteome_id   cath_label cath_assignment_level cath_assignment_method
#> 1       36329  3.40.800.20                     H               foldseek
#> 2       36329 3.30.70.2380                     H               foldseek
#> 3       36329     4.10.860                     T              foldclass
#> 4       36329       1.20.5                     T              foldclass
#> 5       36329            -                     -                      -
#>   packing_density norm_rg tax_common_name                 tax_scientific_name
#> 1          13.064   0.298                 Plasmodium falciparum (isolate 3D7)
#> 2          12.537   0.306                 Plasmodium falciparum (isolate 3D7)
#> 3           9.900   0.374                 Plasmodium falciparum (isolate 3D7)
#> 4           8.900   0.403                 Plasmodium falciparum (isolate 3D7)
#> 5           9.833   0.370                 Plasmodium falciparum (isolate 3D7)
#>                                                                                                                                                       tax_lineage
#> 1 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 2 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 3 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 4 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 5 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum

searchTedConsensus(c("Q7K6A1","Q8IAP8","C0H4D0","C6KT90","Q8IBJ7"), returnCATHdesc=TRUE)
#>                        ted_id uniprot_acc                       md5_domain
#> 1 AF-Q7K6A1-F1-model_v4_TED01      Q7K6A1 b99e920f0ded31aa96af0ef9be1338f4
#> 2 AF-C0H4D0-F1-model_v4_TED01      C0H4D0 cd912dcbbb5d070cbb254c0a88278fe4
#> 3 AF-C6KT90-F1-model_v4_TED02      C6KT90 70d20592d9f682bff23dc6188f318244
#> 4 AF-C6KT90-F1-model_v4_TED01      C6KT90 7cc174ebefe723733b6e63508fd23a9e
#> 5 AF-Q8IBJ7-F1-model_v4_TED01      Q8IBJ7 71697d50571d5fe2331a13ff16503478
#>   consensus_level chopping nres_domain num_segments   plddt
#> 1            high    6-376         371            1 97.1740
#> 2          medium   55-153          99            1 88.9028
#> 3          medium  322-382          61            1 45.3118
#> 4          medium  172-203          32            1 48.8553
#> 5          medium    54-88          35            1 87.3500
#>   num_helix_strand_turn num_helix num_strand num_helix_strand num_turn
#> 1                    60        16          8               24       35
#> 2                    15         5          4                9        6
#> 3                     3         3          0                3        0
#> 4                     2         1          0                1        1
#> 5                     5         0          3                3        2
#>   proteome_id   cath_label cath_assignment_level cath_assignment_method
#> 1       36329  3.40.800.20                     H               foldseek
#> 2       36329 3.30.70.2380                     H               foldseek
#> 3       36329     4.10.860                     T              foldclass
#> 4       36329       1.20.5                     T              foldclass
#> 5       36329            -                     -                      -
#>   packing_density norm_rg tax_common_name                 tax_scientific_name
#> 1          13.064   0.298                 Plasmodium falciparum (isolate 3D7)
#> 2          12.537   0.306                 Plasmodium falciparum (isolate 3D7)
#> 3           9.900   0.374                 Plasmodium falciparum (isolate 3D7)
#> 4           8.900   0.403                 Plasmodium falciparum (isolate 3D7)
#> 5           9.833   0.370                 Plasmodium falciparum (isolate 3D7)
#>                                                                                                                                                       tax_lineage
#> 1 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 2 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 3 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 4 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#> 5 cellular organisms, Eukaryota, Sar, Alveolata, Apicomplexa, Aconoidasida, Haemosporida, Plasmodiidae, Plasmodium, Plasmodium (Laverania), Plasmodium falciparum
#>              cath_label_desc
#> 1 Histone deacetylase domain
#> 2                           
#> 3                           
#> 4                           
#> 5                       NULL

In the example above, C0H4D0 has the CATH label 3.30.70.2380, but this superfamily has no name. In addition, TED sometimes assigns a label from the CATH-Gene3D hierarchy instead of a superfamily-level CATH label. No description is returned in either case.
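To see which rows lack a description and at what depth each label sits, the table can be post-processed with base R. The sketch below is illustrative only: it rebuilds three columns from the output above as a toy data frame instead of calling searchTedConsensus(), and it assumes cath_label_desc is a character column in which a missing description appears as an empty string or the literal text "NULL". The column names label_depth and has_desc are hypothetical helpers, not part of the package.

```r
# Toy data copied from the output above (only the columns needed here).
ted <- data.frame(
  uniprot_acc     = c("Q7K6A1", "C0H4D0", "C6KT90", "C6KT90", "Q8IBJ7"),
  cath_label      = c("3.40.800.20", "3.30.70.2380", "4.10.860", "1.20.5", "-"),
  cath_label_desc = c("Histone deacetylase domain", "", "", "", "NULL"),
  stringsAsFactors = FALSE
)

# Count the dot-separated fields of each CATH label: four fields correspond to
# a superfamily (level H), three to a topology (level T); "-" means unassigned.
n_fields <- lengths(strsplit(ted$cath_label, ".", fixed = TRUE))
ted$label_depth <- ifelse(ted$cath_label == "-", 0L, n_fields)

# Assumption: empty strings and the literal "NULL" both mean "no description".
ted$has_desc <- !(ted$cath_label_desc %in% c("", "NULL"))

# Rows that carry a label but no description.
ted[ted$label_depth > 0 & !ted$has_desc, c("uniprot_acc", "cath_label")]
```

Filtering on label_depth first separates unassigned domains from labelled domains whose superfamily or topology simply has no name.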

Accessing malaria.tools database.

Several visualization functions produce publication-ready plots similar to those rendered by the malaria.tools database. Users can plot condition-specific and stage-specific expression of a gene of interest in two organisms: Plasmodium falciparum and Plasmodium berghei.

  • plotAllCondition(): This function creates publication-ready plots of TPM-normalized expression values across multiple parasite stages, using bulk RNA-seq data from malaria.tools.
# TPM plot (non-interactive)
plotAllCondition(geneID = "PBANKA_0100600")

# Interactive version of the same plot
plotAllCondition(geneID = "PBANKA_0100600", plotify = TRUE)

# To get the data used for the plot above, set the returnData argument
plotAllCondition(geneID = "PBANKA_0100600", returnData = TRUE) %>% head()
#>                                     condition     mean     min     max   group
#> 1                          Asexual: SRP099925 460.6603 396.042 551.808 Asexual
#> 2              Asexual, PbSR-MG KO: SRP109709 403.1830 355.452 442.661 Asexual
#> 3               10 hpi, ab libitum: SRP059210 224.5177 206.967 236.912      10
#> 4         10 hpi, diet restriction: SRP059210 228.0705 218.048 238.093      10
#> 5       10 hpi, ab libitum, kin KO: SRP059210 155.0550 155.055 155.055      10
#> 6 10 hpi, diet restriction, kin KO: SRP059210 130.9310 130.931 130.931      10
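The returned table can be summarized further with base R. The sketch below is only an illustration: it hard-codes the six rows printed above as a toy data frame (a real call returns more rows) and averages the per-condition means within each group. The object names expr and group_means are hypothetical.

```r
# Toy data copied from the head() output above.
expr <- data.frame(
  condition = c(
    "Asexual: SRP099925",
    "Asexual, PbSR-MG KO: SRP109709",
    "10 hpi, ab libitum: SRP059210",
    "10 hpi, diet restriction: SRP059210",
    "10 hpi, ab libitum, kin KO: SRP059210",
    "10 hpi, diet restriction, kin KO: SRP059210"
  ),
  mean  = c(460.6603, 403.1830, 224.5177, 228.0705, 155.0550, 130.9310),
  group = c("Asexual", "Asexual", "10", "10", "10", "10"),
  stringsAsFactors = FALSE
)

# Average the per-condition mean TPM within each group.
group_means <- aggregate(mean ~ group, data = expr, FUN = mean)

# Order groups from highest to lowest average expression.
group_means <- group_means[order(group_means$mean, decreasing = TRUE), ]
group_means
```

With real data, replacing the toy data frame with the result of plotAllCondition(..., returnData = TRUE) should give the same kind of per-group summary, provided the mean and group columns are present as shown above.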

Users can also plot stage-specific average TPMs, similar to the plots rendered by malaria.tools, using the plotStageSpecific() function.

plotStageSpecific(geneID = "PBANKA_0100600",plotify = TRUE)

Note: The searchMT() function available in the previous version has been deprecated because it failed repeatedly owing to database latency. easyPie has therefore also been removed.

Session Info

utils::sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
#> [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=English_India.utf8    
#> 
#> time zone: Asia/Riyadh
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] polyglotr_1.7.0    rvest_1.0.4        plyr_1.8.9         dplyr_1.1.4       
#> [5] plasmoRUtils_1.1.0 rlang_1.1.6        readr_2.1.5        janitor_2.2.1     
#> [9] BiocStyle_2.32.1  
#> 
#> loaded via a namespace (and not attached):
#>   [1] IRanges_2.38.1              dichromat_2.0-0.1          
#>   [3] vroom_1.6.5                 progress_1.2.3             
#>   [5] vsn_3.72.0                  nnet_7.3-19                
#>   [7] Biostrings_2.72.1           vctrs_0.6.5                
#>   [9] digest_0.6.37               png_0.1-8                  
#>  [11] proxy_0.4-27                MSnbase_2.30.1             
#>  [13] parallelly_1.45.1           MASS_7.3-61                
#>  [15] pkgdown_2.1.3               reshape2_1.4.4             
#>  [17] foreach_1.5.2               BiocGenerics_0.50.0        
#>  [19] withr_3.0.2                 xfun_0.52                  
#>  [21] ggpubr_0.6.1                survival_3.8-3             
#>  [23] memoise_2.0.1               hexbin_1.28.5              
#>  [25] ggsci_3.2.0                 mixtools_2.0.0.1           
#>  [27] systemfonts_1.2.3           ragg_1.4.0                 
#>  [29] gtools_3.9.5                easyPubMed_2.13            
#>  [31] Formula_1.2-5               prettyunits_1.2.0          
#>  [33] KEGGREST_1.44.1             promises_1.3.3             
#>  [35] httr_1.4.7                  rstatix_0.7.2              
#>  [37] restfulr_0.0.16             globals_0.18.0             
#>  [39] ps_1.9.1                    rstudioapi_0.17.1          
#>  [41] UCSC.utils_1.0.0            generics_0.1.4             
#>  [43] processx_3.8.6              curl_6.4.0                 
#>  [45] ncdf4_1.24                  S4Vectors_0.42.1           
#>  [47] zlibbioc_1.50.0             ScaledMatrix_1.12.0        
#>  [49] randomForest_4.7-1.2        bio3d_2.4-5                
#>  [51] GenomeInfoDbData_1.2.12     SparseArray_1.4.8          
#>  [53] xtable_1.8-4                stringr_1.5.1              
#>  [55] desc_1.4.3                  doParallel_1.0.17          
#>  [57] evaluate_1.0.4              S4Arrays_1.4.1             
#>  [59] BiocFileCache_2.12.0        preprocessCore_1.66.0      
#>  [61] hms_1.1.3                   GenomicRanges_1.56.2       
#>  [63] bookdown_0.43               irlba_2.3.5.1              
#>  [65] colorspace_2.1-1            filelock_1.0.3             
#>  [67] magrittr_2.0.3              snakecase_0.11.1           
#>  [69] later_1.4.2                 viridis_0.6.5              
#>  [71] lattice_0.22-6              MsCoreUtils_1.16.1         
#>  [73] future.apply_1.20.0         SparseM_1.84-2             
#>  [75] XML_3.99-0.18               scuttle_1.14.0             
#>  [77] triebeard_0.4.1             matrixStats_1.5.0          
#>  [79] class_7.3-22                pillar_1.11.0              
#>  [81] nlme_3.1-166                iterators_1.0.14           
#>  [83] compiler_4.4.1              beachmat_2.20.0            
#>  [85] stringi_1.8.7               gower_1.0.2                
#>  [87] SummarizedExperiment_1.34.0 dendextend_1.19.1          
#>  [89] lubridate_1.9.4             GenomicAlignments_1.40.0   
#>  [91] drawProteins_1.24.0         crayon_1.5.3               
#>  [93] abind_1.4-8                 BiocIO_1.14.0              
#>  [95] bit_4.6.0                   chromote_0.5.1             
#>  [97] pcaMethods_1.96.0           codetools_0.2-20           
#>  [99] textshaping_1.0.1           recipes_1.3.1              
#> [101] BiocSingular_1.20.0         MLInterfaces_1.84.0        
#> [103] crosstalk_1.2.1             bslib_0.9.0                
#> [105] e1071_1.7-16                plotly_4.11.0              
#> [107] LaplacesDemon_16.1.6        MultiAssayExperiment_1.30.3
#> [109] splines_4.4.1               Rcpp_1.1.0                 
#> [111] dbplyr_2.5.0                sparseMatrixStats_1.16.0   
#> [113] knitr_1.50                  blob_1.2.4                 
#> [115] utf8_1.2.6                  clue_0.3-66                
#> [117] mzR_2.38.0                  AnnotationFilter_1.28.0    
#> [119] fs_1.6.6                    QFeatures_1.14.2           
#> [121] listenv_0.9.1               mzID_1.42.0                
#> [123] DelayedMatrixStats_1.26.0   ggsignif_0.6.4             
#> [125] tibble_3.3.0                Matrix_1.7-1               
#> [127] statmod_1.5.0               tzdb_0.5.0                 
#> [129] lpSolve_5.6.23              pkgconfig_2.0.3            
#> [131] tools_4.4.1                 cachem_1.1.0               
#> [133] RSQLite_2.4.1               viridisLite_0.4.2          
#> [135] DBI_1.2.3                   impute_1.78.0              
#> [137] fastmap_1.2.0               rmarkdown_2.29             
#> [139] scales_1.4.0                grid_4.4.1                 
#> [141] gt_1.0.0                    Rsamtools_2.20.0           
#> [143] broom_1.0.8                 sass_0.4.10                
#> [145] coda_0.19-4.1               FNN_1.1.4.1                
#> [147] BiocManager_1.30.26         graph_1.82.0               
#> [149] carData_3.0-5               selectr_0.4-2              
#> [151] SingleR_2.6.0               rpart_4.1.23               
#> [153] farver_2.1.2                yaml_2.3.10                
#> [155] AnnotationForge_1.46.0      MatrixGenerics_1.16.0      
#> [157] rtracklayer_1.64.0          cli_3.6.5                  
#> [159] purrr_1.1.0                 stats4_4.4.1               
#> [161] txdbmaker_1.0.1             lifecycle_1.0.4            
#> [163] caret_7.0-1                 Biobase_2.64.0             
#> [165] mvtnorm_1.3-3               lava_1.8.1                 
#> [167] kernlab_0.9-33              backports_1.5.0            
#> [169] BiocParallel_1.38.0         annotate_1.82.0            
#> [171] timechange_0.3.0            gtable_0.3.6               
#> [173] rjson_0.2.23                parallel_4.4.1             
#> [175] pROC_1.18.5                 limma_3.60.6               
#> [177] jsonlite_2.0.0              bitops_1.0-9               
#> [179] ggplot2_3.5.2               bit64_4.6.0-1              
#> [181] pRoloc_1.44.1               urltools_1.7.3.1           
#> [183] jquerylib_0.1.4             segmented_2.1-4            
#> [185] timeDate_4041.110           lazyeval_0.2.2             
#> [187] htmltools_0.5.8.1           affy_1.82.0                
#> [189] GO.db_3.19.1                rappdirs_0.3.3             
#> [191] glue_1.8.0                  httr2_1.2.0                
#> [193] XVector_0.44.0              RCurl_1.98-1.17            
#> [195] MALDIquant_1.22.3           mclust_6.1.1               
#> [197] gridExtra_2.3               igraph_2.1.4               
#> [199] R6_2.6.1                    tidyr_1.3.1                
#> [201] SingleCellExperiment_1.26.0 labeling_0.4.3             
#> [203] GenomicFeatures_1.56.0      cluster_2.1.8              
#> [205] GenomeInfoDb_1.40.1         ipred_0.9-15               
#> [207] DelayedArray_0.30.1         tidyselect_1.2.1           
#> [209] ProtGenerics_1.36.0         sampling_2.11              
#> [211] xml2_1.3.8                  car_3.1-3                  
#> [213] AnnotationDbi_1.66.0        future_1.67.0              
#> [215] ModelMetrics_1.2.2.2        rsvd_1.0.5                 
#> [217] affyio_1.74.0               topGO_2.56.0               
#> [219] data.table_1.17.8           websocket_1.4.4            
#> [221] mgsub_1.7.3                 htmlwidgets_1.6.4          
#> [223] RColorBrewer_1.1-3          biomaRt_2.60.1             
#> [225] hardhat_1.4.1               prodlim_2025.04.28         
#> [227] PSMatch_1.8.0

References

López, Yosvany, Kenta Nakai, and Ashwini Patil. 2015. “HitPredict Version 4: Comprehensive Reliability Scoring of Physical Protein-Protein Interactions from More Than 100 Species.” Database 2015: bav117. https://doi.org/10.1093/database/bav117.
Pandey, Rajan, Pawan Kumar, and Dinesh Gupta. 2017. “KiPho: Malaria Parasite Kinome and Phosphatome Portal.” Database 2017 (January). https://doi.org/10.1093/database/bax063.
Persson, Emma, and Erik L. L. Sonnhammer. 2023. “InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins.” Journal of Molecular Biology 435 (14): 168001. https://doi.org/10.1016/j.jmb.2023.168001.
Sardar, Rahila, Abhinav Kaushik, Rajan Pandey, Asif Mohmmed, Shakir Ali, and Dinesh Gupta. 2019. “ApicoTFdb: The Comprehensive Web Repository of Apicomplexan Transcription Factors and Transcription-Associated Co-Factors.” Database 2019 (January). https://doi.org/10.1093/database/baz094.