Publication: Efficient online comparison and visualization of high throughput genomic variant lists
Publisher
ITU Graduate School
Abstract
In recent years, the proliferation of high-throughput sequencing has led to the generation of large amounts of genetic data. One of the most significant types of this data is variant data. Comparing and visualizing variant data are commonly performed operations; however, there are no tools addressing this need. At present, each operation must be performed via specialized scripts. Hence, a graphical interface facilitating these operations is highly valuable, especially for users who are not comfortable working with code and the command line. This thesis presents a user-friendly web application for comparing and visualizing genetic variants. The application provides functionality absent from the literature and allows users to gain insights into their data. Because obtaining this data is a complex process, it is valuable to compare results produced via differing methods of raw data generation and processing. The presented tool provides capabilities for comparing numerous files individually to one another as well as comparing them collectively. Benchmarking capabilities are also provided, based on user-provided ground truth files. Because merging files of differing origin can be beneficial, file grouping based on user-defined metadata is also provided. Analyses are often limited to specific regions of interest in a genome. As such, filtering functionality is provided based on genomic regions and chromosomes. An efficient genomic interval-based filtering algorithm is presented and described. The application was developed in Python 3 and uses the Plotly Dash library for web development, which combines Flask and React to produce efficient data analysis web applications. It is deployed on a server provided by Istanbul Technical University and is freely accessible at https://bioinformatics.itu.edu.tr/vcf-observer.
Case studies investigating results from quality control and reproducibility studies are provided in detail, along with relevant visualizations produced using the application. Various filtering and grouping parameters are investigated, and the performance of different data production methodologies is described using results obtained from the application. During the first four months of 2025, the application received over 90 unique users uploading data from over 20 different countries. It provides novel functionality through a user-friendly interface, making variant data exploration accessible to researchers and clinicians.
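The region-based filtering and ground-truth benchmarking mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm presented in the thesis, whose details are not given here; it is one common approach (merge the regions per chromosome, then binary-search each variant position). The function names, the tuple layout `(chrom, pos, ref, alt)`, and the assumption that regions are half-open `[start, end)` intervals in the same coordinate system as the variant positions are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def merge_intervals(regions):
    """Merge overlapping or touching (chrom, start, end) regions per chromosome.

    Assumption: intervals are half-open [start, end) and share the
    coordinate system of the variant positions.
    """
    merged = {}
    for chrom, start, end in sorted(regions):
        spans = merged.setdefault(chrom, [])
        if spans and start <= spans[-1][1]:
            # Extend the previous interval instead of adding a new one.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return merged


def in_regions(merged, chrom, pos):
    """Return True if pos falls inside any merged interval on chrom."""
    spans = merged.get(chrom, [])
    # Index of the last interval whose start is <= pos (or -1 if none).
    i = bisect_right(spans, (pos, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= pos < spans[i][1]


def filter_variants(variants, merged):
    """Keep only variants (chrom, pos, ref, alt) located in the regions."""
    return [v for v in variants if in_regions(merged, v[0], v[1])]


def benchmark(called, truth):
    """Precision and recall of a call set against a ground-truth set,
    using exact matching on (chrom, pos, ref, alt)."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

After the one-time sort and merge, each lookup costs a single binary search over that chromosome's intervals. Exact tuple matching is a simplification: real variant comparison typically also needs normalization of equivalent representations, which this sketch leaves out.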
Description
Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025
Subject
bioinformatics, genetic data, genetic engineering, genes, data visualization