Question

How to know which reads were not aligned to a reference genome?

7.9 years ago

Charles Yin ▴ 180

Paired-end FASTQ reads from a bacterial strain can be aligned to a reference genome with BWA. With bedtools, the coverage of the reads on the reference genome can then be computed using the 'coverageBed' command. As I understand it, coverage means how many reads were aligned to each position of the reference genome. I have a new question about the coverage analysis: how do we know which reads were not aligned to the reference genome? Because the sequenced genome may be larger than the reference, or may have rearrangements or duplicated regions, some reads may not find corresponding regions in the reference genome. Are there any tools that can find unaligned reads? Thanks!

sequence SNP alignment • 2.2k views

updated 7.9 years ago by bk11 ★ 3.1k • written 7.9 years ago by Charles Yin ▴ 180

score 1 · Answer 1 · 2017-08-28

7.9 years ago

bk11 ★ 3.1k

You can use samtools to find unaligned reads.

https://davetang.org/wiki/tiki-index.php?page=SAMTools

samtools view -b -f 4 file.bam > unmapped.bam

samtools view -b -F 4 file.bam > mapped.bam
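
In the two commands above, -f 4 keeps records whose FLAG has the "read unmapped" bit (0x4) set, and -F 4 keeps records where it is not set. Since your data are paired-end, you may also want to separate pairs where both mates failed to align from pairs where only one did. Below is a minimal sketch, assuming samtools is on your PATH; the function names and the file names in the usage note are hypothetical and only for illustration.

```shell
# Hypothetical helper: succeeds if a SAM FLAG value has the 0x4 bit
# ("read unmapped") set. Pure shell arithmetic, no samtools needed.
flag_is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Hypothetical helper: count the reads flagged as unmapped in a BAM file.
# -c prints the number of matching records instead of the records.
count_unmapped() {
    samtools view -c -f 4 "$1"
}

# Hypothetical helper: keep only pairs in which BOTH mates are unmapped.
# 4 = read unmapped, 8 = mate unmapped, so -f 12 requires both bits;
# -F 256 drops secondary alignments.
extract_unmapped_pairs() {
    samtools view -b -f 12 -F 256 "$1" > "$2"
}
```

For example, count_unmapped file.bam would print the number of unaligned reads, and extract_unmapped_pairs file.bam both_unmapped.bam would write the fully unaligned pairs to a new BAM file. If you need the reads themselves, that BAM can then be converted back to FASTQ with samtools fastq.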

Great, that is it! Thank you!

7.9 years ago by Charles Yin ▴ 180