MPRAVarDB is an online database and web server for exploring the regulatory effects of genetic variants. MPRAVarDB hosts 18 MPRA experiments designed to assess the regulatory effects of genetic variants in GWAS loci, eQTLs and various genomic features, for a total of 242,818 variants tested across 32 cell lines and 37 diseases. MPRAVarDB supports querying MPRA variants by genomic region, disease, cell line or any combination of these terms. Notably, MPRAVarDB offers a suite of pretrained machine learning models tailored to specific diseases and cell lines, enabling genome-wide prediction of causal regulatory variants. MPRAVarDB is easy to use: users need only a few clicks to obtain query and prediction results.
Authors: Weijia Jin (Maintainer), Javlon Nizomov
Email: w.jin@ufl.edu
Department of Biostatistics, College of Public Health & Health Professions
University of Florida, Gainesville, FL 32603
Last updated date: Aug 28, 2024
New data submissions to MPRAVarDB are welcome. To submit new data, please send a message to w.jin@ufl.edu. Thank you for your contribution!
For the Single Query option, the MPRA study, disease and cell line can be selected from the dropdown lists. The query region is entered into three text boxes named "Chromosome", "Start Position" and "End Position".
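As a rough illustration of what a single query does, the sketch below filters an in-memory list of variants by region, disease and cell line. The `Variant` record, the `single_query` function, the inclusive coordinate convention and the use of `None` for an unselected dropdown are all assumptions made for this example; MPRAVarDB's own implementation is not described here and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Variant:
    """Hypothetical record for one MPRA-tested variant (illustrative fields only)."""
    chrom: str
    pos: int
    disease: str
    cell_line: str


def single_query(
    variants: Iterable[Variant],
    chrom: str,
    start: int,
    end: int,
    disease: Optional[str] = None,
    cell_line: Optional[str] = None,
) -> List[Variant]:
    """Return variants on `chrom` with start <= pos <= end (inclusive, assumed).

    `disease` and `cell_line` of None mean "no restriction", standing in for
    an unselected dropdown.
    """
    if start > end:
        raise ValueError("Start Position must not exceed End Position")
    hits = []
    for v in variants:
        if v.chrom != chrom or not start <= v.pos <= end:
            continue  # outside the query region
        if disease is not None and v.disease != disease:
            continue  # different disease
        if cell_line is not None and v.cell_line != cell_line:
            continue  # different cell line
        hits.append(v)
    return hits


if __name__ == "__main__":
    demo = [
        Variant("chr1", 2500000, "Disease1", "Type1"),
        Variant("chr1", 3000000, "Disease1", "Type1"),
        Variant("chr2", 2500000, "Disease2", "Type2"),
    ]
    # Only the first record lies in chr1:2440958-2640958.
    print(single_query(demo, "chr1", 2440958, 2640958))
```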
For the Batch Query option, the input is a BED file with multiple rows, each corresponding to one genomic region. Each region consists of three required columns ("chromosome", "start" and "end") and two optional columns ("disease" and "cell line").
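A batch query file in this layout can also be generated from a script. This minimal sketch, built around a hypothetical `format_batch_bed` helper, writes tab-separated rows under the header used in the example table at the end of this page. It assumes that the server accepts a header row and that an optional column may be left empty when no disease or cell line is given; neither point is confirmed by MPRAVarDB.

```python
from typing import Optional, Sequence, Tuple

# One batch-query row: (chromosome, start, end, disease, cell line);
# disease and cell line are optional and may be None.
Region = Tuple[str, int, int, Optional[str], Optional[str]]

# Column names taken from the example table on this page.
HEADER = ("chr", "start", "end", "disease", "cellline")


def format_batch_bed(regions: Sequence[Region]) -> str:
    """Render regions as tab-separated text: a header row, then one row per region.

    Assumed (not confirmed by MPRAVarDB): a header row is accepted and an
    empty optional column means "no restriction".
    """
    lines = ["\t".join(HEADER)]
    for chrom, start, end, disease, cell_line in regions:
        if start >= end:
            raise ValueError(f"start must be less than end for {chrom}:{start}-{end}")
        fields = [chrom, str(start), str(end), disease or "", cell_line or ""]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_batch_bed([
        ("chr1", 2440958, 2640958, "Disease1", "Type1"),
        ("chr2", 55734467, 58734467, None, None),
    ])
    print(text, end="")
```

Save the returned text to a `.bed` file on the local computer before uploading it.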
For the Batch Query option, the BED file is uploaded using the Browse and Load File buttons. A preview option is provided to check that the file format is valid.
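Before uploading, a file can be checked locally in a similar spirit to the preview. The hypothetical `check_batch_bed` function below is a sketch only, not the server's validation logic. It assumes tab-separated columns, an optional header row whose second column is `start`, UCSC-style chromosome names such as `chr1` (as in the example table), and start < end.

```python
from typing import List


def check_batch_bed(text: str) -> List[str]:
    """Return a list of problems in a batch query file (empty list = looks valid).

    Hypothetical local pre-check; the MPRAVarDB preview may apply other rules.
    """
    problems: List[str] = []
    first_row = True
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split("\t")]
        # Treat the first non-blank row as a header if its 2nd column is "start".
        is_header = first_row and len(fields) >= 2 and fields[1].lower() == "start"
        first_row = False
        if is_header:
            continue
        if not 3 <= len(fields) <= 5:
            problems.append(f"line {lineno}: expected 3 to 5 columns, found {len(fields)}")
            continue
        chrom, start, end = fields[:3]
        # Assumed naming convention, following the example table (chr1, chr2, ...).
        if not chrom.startswith("chr"):
            problems.append(f"line {lineno}: chromosome {chrom!r} does not start with 'chr'")
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {lineno}: start and end must be non-negative integers")
        elif int(start) >= int(end):
            problems.append(f"line {lineno}: start ({start}) must be less than end ({end})")
    return problems


if __name__ == "__main__":
    sample = "chr\tstart\tend\nchr1\t2640958\t2440958\n"
    for problem in check_batch_bed(sample):
        print(problem)
```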
After making the selections or uploading the file, click the Run Query button; the results will be displayed in a table below. A Download button will then appear, allowing users to download the query results.
Instructions for single query
Instructions for batch query
The pretrained model is selected based on the reference genome, MPRA study, disease, cell line and type of machine learning model, each of which can be chosen from the dropdown lists.
Two formats of query data are supported: (1) a VCF file containing multiple variants, in which each row gives the genomic coordinate of one query variant through two required fields, "chromosome" and "position"; (2) a FASTA file containing multiple DNA sequences, each a 1000 bp sequence surrounding one query variant. Clicking the Browse button uploads the query data from the local computer to the online server, and clicking the Load File button shows a preview.
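For the FASTA option, each 1000 bp sequence has to be cut from the reference genome around its variant. The sketch below assumes a 1-based VCF-style position and places 500 bp upstream of the variant, so that the variant is base 501 of the window; it labels each record `>chrom:pos`. The window placement, the header format and the helper names `variant_window` and `to_fasta` are assumptions for illustration, not MPRAVarDB requirements.

```python
from typing import Dict, List, Tuple

WINDOW = 1000            # sequence length given in the instructions
UPSTREAM = WINDOW // 2   # assumed: 500 bp before the variant, so it is base 501


def variant_window(chrom_seq: str, pos: int) -> str:
    """Return the WINDOW-bp slice of `chrom_seq` around 1-based position `pos`."""
    start = pos - 1 - UPSTREAM   # 0-based inclusive start
    end = start + WINDOW         # 0-based exclusive end
    if start < 0 or end > len(chrom_seq):
        raise ValueError(f"{WINDOW} bp window around position {pos} runs off the sequence")
    return chrom_seq[start:end]


def to_fasta(genome: Dict[str, str], variants: List[Tuple[str, int]], width: int = 60) -> str:
    """Build FASTA text with one record per (chromosome, position) pair."""
    out: List[str] = []
    for chrom, pos in variants:
        seq = variant_window(genome[chrom], pos)
        out.append(f">{chrom}:{pos}")  # hypothetical header format
        # Wrap the sequence into lines of at most `width` bases.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    toy_genome = {"chr1": "ACGT" * 1250}  # 5,000 bp toy sequence, not real data
    print(to_fasta(toy_genome, [("chr1", 2000)]), end="")
```

The reference sequences must come from the same genome build selected in the dropdown; loading them from a genome FASTA (for example with Biopython's `SeqIO` or `pysam.FastaFile`) is left out to keep the sketch self-contained.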
Clicking "Generate Prediction" runs the selected pretrained model to predict the regulatory effects of the query variants and generates a table of prediction scores. A Download button is also provided for downloading the table.
Instructions for the Analysis module
Example batch query file ("chr", "start" and "end" are required; "disease" and "cellline" are optional):

chr | start | end | disease | cellline
---|---|---|---|---
chr1 | 2440958 | 2640958 | Disease1 | Type1
chr2 | 55734467 | 58734467 | Disease2 | Type2
chr3 | 107880151 | 109880151 | Disease3 | Type3