Hey there!
Could anyone give some explainations for the htseq-count result here:
DATA
SAM file produced by mapping RNA-seq data (fr-secondstrand) to the genome.
- GTF filE
2 STEPS
use samtools to sort the BAM file by name, and converse it to SAM file
use htseq-count and sorted BAM file to count reads for each of the genes in GTF. Meanwhile, I want to see the three "-s" options results using my strand specific RNA-seq data.
Command like this: "htseq-count -i gene_id -s no accepted_hits_sorted_name.sam $gtf > both_strand.txt"
The other two change the "-s no" to "-s yes" and "-s reverse".
RESULTS (I merged the result as below:)
ID s=no s=yes s=reverse
EPlOSAG00000000069 4 0 4
EPlOSAG00000000035 10 10 0
EPlOSAG00000000022 0 18 0
EPlOSAG00000000053 58 69 1
EPlOSAG00000000065 239 2 243
EPlOSAG00000000072 39 80 0
Question:
The "-s no" means count the reads number regarless of the strand. Besides, "yes" and "reverse" means count the strand specific reads number according to the GTF annotation. Then the third row and fourth row should be added together equal to the second row (I can understand the number is not simply equal here because of the complexed conting mode. But it will not exceed the "no" options too much I think). However many of the results are out of this rules. For "EPlOSAG00000000072", the "yes" option counts much more reads comparing to "no" option. who can tell me the reasons?
Thank you very much!
OK! Thank you for your comments! It's helpful!