Scientific Research
Title:

Theoretical analysis and benchmark of graph-based and alignment-based hybrid error correction methods for error-prone long reads

Speaker:

Dr. Kin Fai Au (The Ohio State University)

Time:

Venue:

Lecture Hall, 2nd Floor, School of Mathematics

Abstract:

Third-generation sequencing (TGS) technologies have advanced biological research by generating reads that are substantially longer than those produced by second-generation sequencing (SGS) technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their applications. Error-prone TGS long reads can be corrected using high-quality SGS short reads, a strategy referred to as hybrid error correction. A handful of hybrid error correction methods for these error-prone long reads have been developed to date. Here we investigate, through mathematical modelling and analyses of both simulated and real data, how the principal algorithmic factors of the two major types of hybrid error correction methods (graph-based and alignment-based) influence their performance. Our study reveals the distributions of accuracy gain with respect to these algorithmic factors and to different data scenarios, such as the original error rate of the long reads. We also present a comparative performance assessment of ten state-of-the-art error correction methods. We establish a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and haplotype sequence resolution. Taking all of these metrics into account, we provide suggested guidelines for choosing a method based on available data size, computing resources, and individual research goals.
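To make the idea of alignment-based hybrid correction and "accuracy gain" concrete, here is a toy sketch. It is not the speaker's model or any of the benchmarked tools. It assumes short reads have already been aligned to the long read without gaps (substitution errors only, whereas real TGS errors are dominated by insertions and deletions), and it corrects each long-read base by a simple majority vote of the covering short-read bases. The function names `correct_by_majority` and `accuracy` and the example sequences are hypothetical.

```python
from collections import Counter


def correct_by_majority(long_read: str, alignments: list[tuple[int, str]]) -> str:
    """Toy alignment-based correction (hypothetical helper).

    Each alignment is (start, short_read_sequence), assumed gap-free.
    Every long-read position is replaced by the most frequent base among
    the short-read bases covering it; uncovered positions are kept as-is.
    Ties go to the base counted first (Counter preserves insertion order).
    """
    votes = [Counter() for _ in long_read]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1
    corrected = [
        counter.most_common(1)[0][0] if counter else original
        for original, counter in zip(long_read, votes)
    ]
    return "".join(corrected)


def accuracy(read: str, reference: str) -> float:
    """Fraction of positions matching the reference (equal lengths assumed)."""
    matches = sum(a == b for a, b in zip(read, reference))
    return matches / len(reference)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    long_read = "ACGAACGTTC"  # two substitution errors, at positions 3 and 8
    # Short reads; (4, "ACGTTC") carries its own error at position 8,
    # which is outvoted by the two other reads covering that position.
    alignments = [(0, "ACGTAC"), (2, "GTACGT"), (4, "ACGTAC"), (4, "ACGTTC"), (6, "GTAC")]
    corrected = correct_by_majority(long_read, alignments)
    before, after = accuracy(long_read, reference), accuracy(corrected, reference)
    print(corrected, before, after, f"gain = {after - before:.2f}")
```

In this toy setting the accuracy gain depends on short-read coverage and short-read error rate at each position, which hints at why the abstract treats algorithmic factors and data scenarios (such as the original long-read error rate) as the variables of interest.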