Different FastQC results after name-sorting BAM file, sequence duplication increases
3.0 years ago
tea.vuki ▴ 20

Okay, so what I did might have been stupid, but I was determined to examine a lot of things on my own and experiment a bit with the tools. At one point I decided to do this: I had a BAM file from paired-end RNA-seq, and I converted it to fastq with samtools, both with and without sorting by read name. The FastQC reports on both fastq files were pretty much the same, with one difference: the fastq files that came from the name-sorted BAM file had a huge amount of duplicated sequences, much higher than those that came from the unsorted BAM file. Now I am wondering, why did this happen? How is it possible that the amount of duplicated sequences is 50% higher when I name-sort the BAM file?

fastq bam fastqc • 1.2k views
3.0 years ago
GenoMax 147k

When converting a BAM file to fastq you need to eliminate secondary alignments. These represent multi-mapping locations for reads. Not sure if you did that.
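For reference, whether a record is a secondary alignment is stored in the SAM FLAG field: bit 0x100 marks secondary alignments and bit 0x800 marks supplementary ones. Below is a minimal sketch of that check in plain Python. The helper name `is_primary` is made up for illustration and is not part of samtools or any library; whether you also want to drop supplementary records depends on your data.

```python
# SAM FLAG bits defined by the SAM specification.
SECONDARY = 0x100       # secondary alignment (e.g. an extra multi-mapping location)
SUPPLEMENTARY = 0x800   # supplementary alignment (e.g. part of a chimeric read)


def is_primary(flag: int) -> bool:
    """Return True if the FLAG value has neither the secondary nor the supplementary bit set.

    `is_primary` is a hypothetical helper for illustration only.
    """
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


if __name__ == "__main__":
    # 99 is a typical properly paired, first-in-pair primary record;
    # adding 0x100 or 0x800 turns it into a secondary / supplementary one.
    for flag in (99, 99 | SECONDARY, 99 | SUPPLEMENTARY):
        print(flag, is_primary(flag))
```

If each read should appear only once in the fastq, records that fail this check are the ones to filter out before conversion.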

Also, the FastQC documentation for the duplication module says: "To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed" (LINK)

So if your reads are in a different order, a different set of sequences gets tracked, and you may get a different duplication plot.
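To make the ordering effect concrete, here is a toy sketch in Python. It is a simplified model of the rule quoted above, not FastQC's actual implementation (the real module does more than this). The function name `tracked_duplication`, the `window` parameter, and the tiny read lists are all made up for illustration. Only sequences first seen within the first `window` reads are counted; anything that first appears later is ignored.

```python
def tracked_duplication(seqs, window=100_000):
    """Percent of tracked reads that are duplicates.

    Simplified model (an assumption, not FastQC's code): a sequence is
    tracked only if its first occurrence falls within the first `window`
    reads; later occurrences of tracked sequences are still counted.
    """
    counts = {}
    for i, seq in enumerate(seqs):
        if seq in counts:
            counts[seq] += 1      # another copy of a tracked sequence
        elif i < window:
            counts[seq] = 1       # first seen inside the window: start tracking
        # sequences first seen after the window are ignored entirely
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


if __name__ == "__main__":
    # The same eight reads in two different orders (toy data).
    order_a = ["A", "A", "B", "C", "D", "D", "D", "E"]
    order_b = ["D", "B", "A", "C", "D", "A", "D", "E"]
    print(f"{tracked_duplication(order_a, window=3):.1f}")  # 33.3
    print(f"{tracked_duplication(order_b, window=3):.1f}")  # 50.0
```

Both lists contain exactly the same reads, but the estimate changes because different sequences fall inside the tracking window. Sorting a BAM file changes which reads come first, so the same thing can happen at the 100,000-read scale.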


I went a bit crazy and tried literally all the variations: I had a BAM file that was sorted with secondary alignments removed, and I also had a file that wasn't sorted but had secondary alignments removed. The file with only secondary alignments removed had exactly the same duplication plot as the unfiltered one, and the file that was both sorted and filtered had exactly the same duplication plot as the file that was only sorted. I know this is confusing, but I did so many stupid comparisons that I find it hard to explain. But this explanation makes a lot of sense, so thank you!
