Hi all,
I seem to be have problems with Tophat2 or HT-seq counts. I am visualizing the tophat BAM file using Tablet. The trouble is, there seems to be reads aligning to this specific gene (spi-c, zebrafish, reverse strand) however HT-seq counts (using stranded = reverse) is counting 0 reads for this gene. If i use stranded = yes, HT-seq counts is giving me a count of 35. I am confused because this is a dUTP library so stranded=reverse should be used. Does this mean that HT-seq counts cannot count genes found on the compliment strand? Here is a picture of the accepted hits for this gene http://imgur.com/a/i6jIe
Thanks in advance,
Could you add the full command you used for htseq count?
I am using the galaxy version of HT-seq Count. Here is the Job Command Line
samtools sort -n "/galaxylab/production/new/galaxy/database/files/000/dataset_392.dat" "name_sorted_alignment" && htseq-count --mode=union --stranded=reverse --minaqual=10 --type="exon" --idattr="gene_id" --order=name --format=bam name_sorted_alignment.bam "/galaxylab/production/new/galaxy/database/files/000/dataset_110.dat" | awk '{if ($1 ~ "no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique") print $0 | "cat 1>&2"; else print $0}' > /galaxylab/production/new/galaxy/database/files/008/dataset_8356.dat 2>/galaxylab/production/new/galaxy/database/files/008/dataset_8357.dat
Aren't there also many intronic reads in your screenshot? Since you are counting reads overlapping with exons....
Yeah, I notice that also. But I also see reads aligning to exons, therefore stranded = reverse should be giving some counts correct? Unless all these reads are being labelled as ambiguous due to overlapping another gene on the other strand?
Looks like you are using
mode=union
for the HTSeq-count. Different modes are described on the htseq-count help page here. As you suspect when a read overlaps two genes it is considered ambiguous inunion
mode.Looking through, I know that Union is the mode I need. However how can I visualize if these reads are are overlapping 2 genes?
Did you run tophat with
--fr-firststrand
option since you have a dUTP library?I already tried that, but its doesn't make a difference. I think I read somewhere that the --fr firststrand in tophat only edits XS flags?
If you have a stranded library then reads that are mapping to the correct strand should be ones that are counted. If you want to count every read that is mapped then you could say your library is non-stranded.
In tablet, how can I tell which strand a read pair is mapping to? The odd thing is that for other genes on the reverse strand, HT-seq count seems to be counting them.