Hello! I am quite new to the field of genomics, so I apologize if my question is too basic. I am a molecular biologist and have recently started as a research student, analyzing bacterial genomes. The genome of the bacterium I work with has been sequenced and is available in GenBank as the reference genome. However, this sequencing was done more than 10 years ago using Sanger sequencing, and there are numerous gaps in the data.
Before I joined, my lab's team re-sequenced the genome using both long reads (PacBio) and short reads (Illumina). We have a new genome assembly based on the long reads, while the Illumina data is available only in raw format. Additionally, we sequenced the genome of a mutant of the same bacterium. Upon comparing the mutant's genome with the new assembly, we found a few mismatches (putative mutations). Interestingly, at these same positions the mutant's sequence matches the GenBank reference genome. Consequently, we are unable to determine whether the mismatches arise from real mutations in the mutant or from errors in the new assembly.
My question is: how can I verify whether the mismatches are mutations or just errors in the newly assembled genome? Can I use the raw short-read data to check, and if so, how?
Thank you so much!
Thank you, GenoMax! Could you please suggest the best tools to use to align the Illumina reads to the newly assembled genome? Thanks again!
I think bwa-mem2 or bowtie2 should do the trick :)
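In case it helps, here is a rough sketch of how the bwa-mem2 route could be scripted from Python. It assumes paired-end Illumina reads and that `bwa-mem2` and `samtools` are installed and on your PATH. The file names and the helper functions `build_alignment_commands` and `run_alignment` are made up for illustration, so swap in your own.

```python
import shlex
import subprocess


def build_alignment_commands(reference, reads_1, reads_2, out_bam, threads=4):
    """Return the four command lines for a bwa-mem2 + samtools run.

    All paths are placeholders supplied by the caller (hypothetical names).
    """
    # Build the bwa-mem2 index for the new assembly (only needed once).
    index_cmd = ["bwa-mem2", "index", reference]
    # Align the paired-end reads; bwa-mem2 writes SAM to standard output.
    align_cmd = ["bwa-mem2", "mem", "-t", str(threads), reference, reads_1, reads_2]
    # Sort the alignments by coordinate, reading SAM from standard input ("-").
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    # Index the sorted BAM so a genome browser can jump to a position.
    bam_index_cmd = ["samtools", "index", out_bam]
    return index_cmd, align_cmd, sort_cmd, bam_index_cmd


def run_alignment(reference, reads_1, reads_2, out_bam, threads=4):
    """Run the commands, piping the aligner output straight into samtools sort."""
    index_cmd, align_cmd, sort_cmd, bam_index_cmd = build_alignment_commands(
        reference, reads_1, reads_2, out_bam, threads
    )
    subprocess.run(index_cmd, check=True)
    aligner = subprocess.Popen(align_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort_cmd, stdin=aligner.stdout)
    aligner.stdout.close()  # let the aligner see a closed pipe if sorting stops
    sort_status = sorter.wait()
    align_status = aligner.wait()
    if sort_status != 0 or align_status != 0:
        raise RuntimeError("alignment or sorting failed")
    subprocess.run(bam_index_cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; file names are hypothetical.
    for cmd in build_alignment_commands(
        "new_assembly.fasta",
        "illumina_R1.fastq.gz",
        "illumina_R2.fastq.gz",
        "illumina_vs_assembly.bam",
    ):
        print(shlex.join(cmd))
```

Once you have the sorted, indexed BAM, you can load it together with the assembly FASTA in a genome browser such as IGV and look at the mismatch positions. If nearly all Illumina reads disagree with the assembly at a position, that points towards an assembly error there rather than a real mutation in the mutant.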
I will do that! Thanks a lot!