Question

Bacterial genome analysis - how to check if a newly sequenced genome has errors

10 months ago

Veselina • 0

Hello! I am quite new to the field of genomics, so I apologize if my question is inadequate. I am a molecular biologist and have recently started as a research student, analyzing bacterial genomes. The genome of the bacterium I work with has been sequenced and is available in GenBank as the reference genome. However, it was sequenced more than 10 years ago using Sanger sequencing, and there are numerous gaps in the data.

Before I joined, my lab re-sequenced the genome using long reads (PacBio) and short reads (Illumina). We have a new genome assembly based on the long reads, while the Illumina data is available only as raw reads. We also sequenced the genome of a mutant of the same bacterium. When we compared the mutant's genome with the newly assembled genome, we found a few mismatches (putative mutations). Interestingly, at these positions the mutant's sequence matches the GenBank reference genome. Consequently, we cannot tell whether the mismatches arise from mutations in the mutant or from errors in the newly assembled genome.

My question is: how can I verify if the mismatches are mutations or just errors in the newly assembled genome? Can I use the raw short-read data to check, and if so, how?

Thank you so much!

genome next-gen alignment • 683 views

score 1 · Answer 1 · 2024-01-17

10 months ago

GenoMax 147k

how can I verify if the mismatches are mutations or just errors in the newly assembled genome?

You may want (or need) to use an independent method for absolute verification; that is what is currently done with patient samples. For example, you can PCR-amplify the regions containing the putative mutations and confirm them by Sanger sequencing.

You could also align the Illumina reads to the assembled genome and check for potential errors. Any assembly errors should be readily apparent: the per-base error rate of Illumina reads is low, so at a correct position you should see many reads carrying the same base piling up and providing depth, whereas at an assembly error the reads will consistently disagree with the assembled base.
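As a rough illustration of that check, the sketch below assumes you have already extracted the Illumina base calls covering one suspect position (for example from a pileup of the alignment). The function name `classify_position` and the thresholds `min_depth` and `min_fraction` are hypothetical choices for illustration, not established cutoffs.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def classify_position(assembly_base, read_bases, min_depth=10, min_fraction=0.9):
    """Compare the assembled base with Illumina base calls at one position.

    assembly_base and read_bases are hypothetical inputs, e.g. from a pileup.
    min_depth and min_fraction are illustrative thresholds only.
    """
    # Keep only unambiguous base calls (drops N, gaps, and similar symbols).
    calls = [b.upper() for b in read_bases if b.upper() in VALID_BASES]
    if len(calls) < min_depth:
        return "low_coverage"
    # Find the base carried by the largest number of reads.
    top_base, top_count = Counter(calls).most_common(1)[0]
    if top_count / len(calls) < min_fraction:
        return "ambiguous"
    if top_base == assembly_base.upper():
        return "assembly_supported"
    return "likely_assembly_error"

# Example: 30 reads all agree with the assembled base.
print(classify_position("A", ["A"] * 30))
```

If the Illumina reads from the original strain support the assembled base, the difference seen in the mutant is more likely a real mutation; if they consistently contradict it, the assembly is the more likely source of the mismatch.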

Thank you, GenoMax! Could you please suggest the best tools to use to align the Illumina reads to the newly assembled genome? Thanks again!

Reply • 10 months ago by Veselina • 0

I think bwa-mem2 or bowtie2 should do the trick :)
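For orientation, here is a minimal sketch of how such an alignment might be set up with bwa-mem2. It only builds and prints the command lines rather than running them. Several things here are assumptions not stated in the thread: that the Illumina data is paired-end, that samtools is used for sorting and indexing, and all file names and the helper name `build_alignment_commands`.

```python
import shlex

def build_alignment_commands(assembly_fasta, reads_r1, reads_r2, out_bam, threads=4):
    """Return shell command strings: index the assembly, align, sort, index the BAM.

    All paths are hypothetical placeholders; paired-end reads are assumed.
    """
    # Quote each path so spaces or special characters are safe in a shell.
    ref, r1, r2, bam = (
        shlex.quote(p) for p in (assembly_fasta, reads_r1, reads_r2, out_bam)
    )
    return [
        # Build the bwa-mem2 index for the newly assembled genome.
        f"bwa-mem2 index {ref}",
        # Align the paired reads and pipe the output into a coordinate sort.
        f"bwa-mem2 mem -t {threads} {ref} {r1} {r2} | samtools sort -o {bam} -",
        # Index the sorted BAM so individual positions can be inspected.
        f"samtools index {bam}",
    ]

for cmd in build_alignment_commands(
    "assembly.fasta", "reads_R1.fastq.gz", "reads_R2.fastq.gz", "illumina_vs_assembly.bam"
):
    print(cmd)
```

The resulting sorted, indexed BAM can then be opened in a genome browser such as IGV to look at the read pileup at each suspect position.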

Reply • 10 months ago by biofalconch ★ 1.3k

I will do that! Thank you a lot!

Reply • 10 months ago by Veselina • 0