Background

MPRAVarDB is an online database and web server for exploring the regulatory effects of genetic variants. It hosts 18 massively parallel reporter assay (MPRA) experiments designed to assess the regulatory effects of genetic variants in GWAS loci, eQTLs and various genomic features, covering a total of 242,818 variants tested across 32 cell lines and 37 diseases. Variants can be queried by genomic region, disease, cell line, or any combination of these terms. MPRAVarDB also offers a suite of pretrained machine learning models tailored to specific diseases and cell lines, which supports genome-wide prediction of causal regulatory variants. The interface is easy to use: a few clicks are enough to obtain query and prediction results.

Contact

Authors: Weijia Jin (Maintainer), Javlon Nizomov

Email: w.jin@ufl.edu

Department of Biostatistics, College of Public Health & Health Professions

University of Florida, Gainesville, FL 32603

Summary of Studies


Submit

Last updated: Aug 28, 2024

New data submissions to MPRAVarDB are welcome. To submit new data, please send a message to w.jin@ufl.edu. Thank you for your contribution!

Workflow for Database module

Data Preparation

For the Single Query option, MPRA study, disease and cell line can be selected from the dropdown lists. The query region can be entered into three text boxes named "Chromosome", "Start Position" and "End Position".

For the Batch Query option, prepare a BED file that contains multiple rows, with each row corresponding to one genomic region. Each genomic region consists of three required fields, "chromosome", "start" and "end", and two optional fields, "disease" and "cell line".
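As a rough illustration of this layout, the following Python sketch writes a small batch query file. The column names (chr, start, end, disease, cellline) follow the example table under File Instructions; the tab delimiter, the header row, and leaving unused optional fields blank are assumptions, and format_batch_query is a hypothetical helper that is not part of MPRAVarDB.

```python
import csv
import io

REQUIRED = ["chr", "start", "end"]
OPTIONAL = ["disease", "cellline"]  # names taken from the File Instructions table


def format_batch_query(regions):
    """Render regions as tab-separated text with a header row (assumed layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(REQUIRED + OPTIONAL)
    for region in regions:
        start, end = int(region["start"]), int(region["end"])
        if start >= end:
            raise ValueError(f"start must be smaller than end: {region}")
        row = [region["chr"], start, end]
        # Optional fields are left blank when a region does not provide them.
        row += [region.get(name, "") for name in OPTIONAL]
        writer.writerow(row)
    return buf.getvalue()


regions = [
    {"chr": "chr1", "start": 2440958, "end": 2640958,
     "disease": "Disease1", "cellline": "Type1"},
    {"chr": "chr2", "start": 55734467, "end": 58734467},
]
batch_text = format_batch_query(regions)
print(batch_text)
```

The resulting text can be saved to a file and checked with the preview option before running the query.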

Data Upload

For the Batch Query option, the BED file can be uploaded using the Browse and Load File buttons. A preview option is provided to check the validity of the file format.

Results display and download

After the selections are made or the file is uploaded, click the Run Query button; the results are displayed in a table below. A Download button then appears so that users can download the query results.

Instruction for single query

Instruction for batch query


Workflow for Analysis module

Model preparation

The pretrained model is selected based on the choice of reference genome, MPRA study, disease, cell line and type of machine learning model, each of which can be selected from a dropdown list.

Data Preparation

Two formats of query data are supported: (1) a VCF file containing multiple variants, where each row gives the genomic coordinate of one query variant and consists of two required fields, "chromosome" and "position"; (2) a FASTA file containing multiple DNA sequences, where each sequence is a 1000 bp window surrounding one query variant. Clicking the Browse button uploads the query data from the local computer to the online server. Clicking the Load File button provides a preview.
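For the FASTA option, the sketch below shows one possible way to cut a 1000 bp window around a variant and format it as a FASTA record. The text does not say how the window is positioned, so placing the variant at offset 500 and shifting the window inward near chromosome ends are assumptions. The helpers window_bounds and fasta_record, the record name "chrToy:1200", and the toy sequence are all hypothetical; in practice the sequence would come from the chosen reference genome.

```python
WINDOW = 1000  # sequence length stated in the text


def window_bounds(position, chrom_length, window=WINDOW):
    """Return 0-based, half-open (start, end) for a window around a 1-based position."""
    if not 1 <= position <= chrom_length:
        raise ValueError("position lies outside the chromosome")
    if chrom_length < window:
        raise ValueError("chromosome is shorter than the window")
    start = position - 1 - window // 2
    # Shift the window inward when the variant is close to either end (assumption).
    start = max(0, min(start, chrom_length - window))
    return start, start + window


def fasta_record(name, sequence, width=60):
    """Format one FASTA record with the sequence wrapped at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


chrom_name = "chrToy"          # hypothetical chromosome name
chrom_seq = "ACGT" * 500       # toy 2000 bp sequence standing in for a reference
variant_pos = 1200             # 1-based position of the query variant

start, end = window_bounds(variant_pos, len(chrom_seq))
record = fasta_record(f"{chrom_name}:{variant_pos}", chrom_seq[start:end])
print(record.splitlines()[0])
```

Concatenating one such record per variant gives a multi-sequence FASTA file of the kind described above.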

Prediction, results display and download

Clicking "Generate Prediction" calls the selected pretrained model to predict the regulatory effects of the query variants and generates a table of prediction scores. A Download button is also provided to download the table.

Instruction for Analysis module



Download All MPRA Data

File Instructions
There are three required columns and two optional columns.

Required                         Optional
chr    start        end          disease    cellline
chr1   2440958      2640958      Disease1   Type1
chr2   55734467     58734467     Disease2   Type2
chr3   107880151    109880151    Disease3   Type3
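A file in this layout can be checked locally before upload. The sketch below is a minimal, assumed validator and not the server's own check: parse_batch_query is a hypothetical helper, and it assumes whitespace-separated columns with a header row whose first three names are chr, start and end.

```python
REQUIRED = ["chr", "start", "end"]


def parse_batch_query(text):
    """Parse a batch query table into a list of dicts, validating each region."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0][:3] != REQUIRED:
        raise ValueError(f"header must start with {REQUIRED}")
    header, records = rows[0], []
    for lineno, fields in enumerate(rows[1:], start=2):
        # Optional columns may be absent, but the three required ones may not.
        if not 3 <= len(fields) <= len(header):
            raise ValueError(f"line {lineno}: unexpected number of fields")
        record = dict(zip(header, fields))
        record["start"], record["end"] = int(record["start"]), int(record["end"])
        if record["start"] >= record["end"]:
            raise ValueError(f"line {lineno}: start must be smaller than end")
        records.append(record)
    return records


example = """\
chr start end disease cellline
chr1 2440958 2640958 Disease1 Type1
chr2 55734467 58734467 Disease2 Type2
chr3 107880151 109880151 Disease3 Type3
"""
records = parse_batch_query(example)
print(len(records), records[0])
```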

Need to see an example file?
Download example .txt
