convert bam file to raw read count file for DESeq or EdgeR
1
5
Entering edit mode
10.4 years ago
illinois.ks ▴ 210

I was working on bam files generated from Tophat in order to convert into the raw count file.

Firstly, I just converted directly bam file into the raw count file using the following command.

htseq-count -f bam KDR_pre_thout/accepted_hits.bam genes.gtf > KDR_pre_raw_count.txt

which directly converts the bam file into the raw count file.

But I found the following paper, which suggest that converting the bam file into the sam file and then, convert sam file into the raw count file.

I thought it would be same.. but NOT.

samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
htseq-count -s no -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

I have used these two ways in order to convert the bam file into the count file.

And I found that these results are different. Which I have to follow?? Second as paper mentioned(maybe sort is the important difference) Then, why??? Anybody knows why sort is necessary??

Please help me with this.!

Thanks.

RNA-Seq • 17k views
ADD COMMENT
3
Entering edit mode
10.4 years ago

Until recently, htseq-count didn't support coordinate-sorted files, so if you used an older version then that'd be the cause of the difference. Otherwise, the results should be identical and any difference would be due to a bug. Presuming this is reproducible, you should notify Simon Anders so he can fix the bug.

BTW, the results from the name-sorted file will be more reliable, solely because that's been debugged more thoroughly (there's no a priori reason for there to be a difference).

ADD COMMENT
0
Entering edit mode

Hello Devon

Thank you for your comments!. so you are mentioning that it should be same whether I have followed the first trial (directly convert from bam to count file without sort) or the second trail ( convert bam to sam and then sam to count file with sorting option).

Since result difference is relatively big ( as you mentioned if it is a bug) , htseq-count should be fixed. ( I hope Simon Anders could see this posting!! )

BTW, I just realized that my one of option for htseq-count was different, which was -s no option...

THe default was "yes". I didn't set up this option for the first trial, which implicitly means that "yes" but my second trial, it was set as "no".

I suspect that it would also cause the difference. so changed it.. as third trial, but all my three trial return the different count file, which is weired.!! hm...

option 1)

htseq-count -f bam KDR_pre_thout/accepted_hits.bam genes.gtf > KDR_pre_raw_count.txt

option 2)

samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
htseq-count -s no -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

option 3)

samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
htseq-count -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

All the results from three trial looks different.

Could you please let me know what is "strand-specific assay" means???

ADD REPLY
1
Entering edit mode

Given that you switched between the strand-specific and unstranded counting as well as changing the minimum quality score for option 1, it's completely expected that you'd get different results. You need to do an apples-to-apples comparison for the results to be meaningful.

Regarding strand-specific libraries, I and others have explained this many many times. Here is one example. Just google around for more explanations.

ADD REPLY
0
Entering edit mode

Hello Devon

Thank you so much for your explanation!! It really helped!!! I will try to search and google more carefully!!! :)

BTW, as I reported before, given the exact same option (strand specific option, quality option), but still it returns the large difference in result when I try to do the sort the bam file or not.. I hope it can be fixed soon!! :)

ADD REPLY
0
Entering edit mode

did this issue fixed? or that you finally figured out why it is like that....

ADD REPLY

Login before adding your answer.

Traffic: 2742 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6