Question

Different FastQC results after name-sorting BAM file, sequence duplication increases

3.0 years ago

tea.vuki ▴ 20

Okay, so what I did might have been stupid, but I was determined to examine a lot of things on my own and experiment a bit with the tools. At one point I decided to do this: I had a BAM file from paired-end RNA-seq, and I converted it to FASTQ with samtools, both with and without first sorting by read name. The FastQC reports on the two sets of FASTQ files were pretty much the same, with one difference: the FASTQ files that came from the name-sorted BAM file had a huge amount of duplicated sequences, much higher than those that came from the unsorted BAM file. Now I am wondering, why did this happen? How is it possible that the amount of duplicated sequences is 50% higher when I name-sort the BAM file?

fastq bam fastqc • 1.2k views

updated 24 months ago by Ram 44k • written 3.0 years ago by tea.vuki ▴ 20

score 1 · Answer 1 · 2021-12-14

3.0 years ago

GenoMax 147k

When converting a BAM file to fastq you need to eliminate secondary alignments. These represent multi-mapping locations for reads. Not sure if you did that.
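
To make the filtering step concrete, here is a minimal sketch of how secondary alignments are identified from the SAM FLAG field. The bit values 0x100 (secondary) and 0x800 (supplementary) come from the SAM specification; the helper names and the toy `(name, flag, seq)` tuples are made up for illustration. With samtools itself, the `-F` option of `samtools fastq` takes such a bit mask, and recent versions document 0x900 as its default, which is worth checking for your version.

```python
# SAM FLAG bits as defined in the SAM specification.
SECONDARY = 0x100       # secondary alignment (multi-mapping location)
SUPPLEMENTARY = 0x800   # supplementary (chimeric/split) alignment


def is_primary(flag: int) -> bool:
    """Return True if neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def primary_only(records):
    """Keep only primary records.

    `records` is a hypothetical toy format: an iterable of
    (read_name, flag, sequence) tuples, not a real BAM reader.
    """
    return [rec for rec in records if is_primary(rec[1])]


# Toy example: the same read reported at two locations.
toy = [
    ("read1", 99, "ACGT"),         # primary, paired, first in pair
    ("read1", 99 + 0x100, "ACGT"), # secondary copy of the same read
]
kept = primary_only(toy)
```

If secondary records are not excluded, the same read sequence can be written to the FASTQ more than once, which inflates duplication regardless of read order.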

Also, the FastQC documentation for the Sequence Duplication Levels module says: "To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed" (LINK)

So if your reads are in a different order, then you may get a different duplication plot.
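
The order dependence can be illustrated with a small toy model. This is only a sketch of the idea in the quoted sentence, not FastQC's actual implementation (which also truncates long reads and rescales its counts, among other things); the function name `estimate_duplication` and the `window` parameter are invented for this example.

```python
from collections import Counter


def estimate_duplication(reads, window=100_000):
    """Toy duplication estimate that depends on read order.

    Only sequences that first appear within the first `window` reads are
    tracked, but every later occurrence of a tracked sequence is counted.
    Returns the fraction of tracked reads that are duplicates.
    """
    counts = Counter()
    for i, seq in enumerate(reads):
        if seq in counts:
            counts[seq] += 1      # already tracked: count it anywhere in the file
        elif i < window:
            counts[seq] = 1       # new sequence inside the window: start tracking
        # new sequence outside the window: ignored
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - len(counts) / total


# The same eight reads in two different orders, with a window of 4.
order_a = ["A", "B", "C", "D", "X", "X", "X", "X"]
order_b = ["X", "A", "X", "B", "X", "C", "X", "D"]

dup_a = estimate_duplication(order_a, window=4)  # X never enters the window
dup_b = estimate_duplication(order_b, window=4)  # X is tracked from the start
```

In this toy case the first ordering reports no duplication (0.0) and the second reports 0.5, even though both contain exactly the same reads. Sorting a BAM file changes which reads land in the first 100,000, so a shift in the duplication plot is plausible for the same reason.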


I went a bit crazy and tried literally every variation: I had a BAM file that was sorted with secondary alignments removed, and I also had a file that wasn't sorted but had secondary alignments removed. The file with only secondary alignments removed had exactly the same duplication plot as the one that wasn't filtered, and the file that was both sorted and filtered had exactly the same duplication plot as the file that was only sorted. I know this is confusing, but I did so many stupid comparisons that I find it hard to explain. But this explanation makes a lot of sense, so thank you!

Reply • 3.0 years ago by tea.vuki ▴ 20