Establishing a fundamental understanding of a species' molecular biology begins with reconstructing its genome (DNA) and transcriptome (RNA). Assembling the large volume of data generated by sequencing technologies into coherent sequences has been a primary focus of the bioinformatics field. However, interpreting the assembled data to link sequence information to biological phenomena remains a significant challenge.
Annotating the genes of a species plays a critical role in that interpretation. For instance, annotations may describe the processes underlying the environmental services a species provides, indicate the wood quality of forestry products, or reveal the infection patterns of pathogens in the food chain. Comprehensive and accurate annotations depend on the completeness and correctness of the underlying assembled sequences.
This project developed 18 bioinformatics and computational biology tools and methods: nine that improve the quality of assembled genomes and transcriptomes, six that annotate genes and estimate their functions, and three visualization tools for assessing the quality of assemblies and their annotations.
The technologies developed through this research are being used by researchers in Canada and around the world. Numerous research projects are already applying these tools to analyze cancer genomes and transcriptomes, identifying disease markers that could be exploited for diagnostics or as actionable targets; to gain insight into tandem repeat expansions, a hallmark of more than 40 neurological disorders; and to study rare developmental diseases and COVID-19. These advances in bioinformatics will be critical in helping researchers better understand the world around us.