Dear all,
I have paired-end (150bp) data for the chicken Ileum from Illumina TruSeq. I was told that it is first stranded. I removed the adapters through Trimmomatic tool. I then aligned it to the Chicken Genome (Ensembl) galGal4 using STAR aligner as well as Tophat2.
I wanted to quantify the reads using htseq-count, so I sorted the bam file by name.
I used the following parameters for htseq-count :
htseq-count --format='bam' --order='name' --stranded='yes' --minaqual='10' --type='exon' --idattr='gene_id' --samout='S1-RCI001-top-out.sam' S1-RCI001-tophat.sorted.bam genes.gtf > S1-RCI001-tophat-htseq-res.txt
Total number of SAM alignments : 30059627 (when htseq finishes running) On using grep "XF:" on S1-RCI001-top-out.sam I get 56541081 (Not sure which one to use for comparison) Now, the problem is, that I am getting an exceeding amount of "no_features" and maximum of the gene counts are zero
__no_feature 26618544
__ambiguous 1751
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 3199738
Please help me identify the problem. As far as I know, I used the correct GTF file, sorted the bam file by name, and I am sure that its first stranded so used stranded="yes".
Thanks for your help!
Candida Vaz
Can you post the output of the following commands?
samtools view -H S1-RCI001-tophat.sorted.bam
cut -f 1 genes.gtf | sort | uniq
My guess is that you have a mismatch in chromosome names. If the outputs are long, then just post 10 or so lines of each.
Dear Devon,
Here is the output from samtools view:
And here is the output from the genes.gtf file 1st column (just top ten lines)
OK, then michael.ante's suggestion is likely the correct one. Have a look at the sorted BAM file in IGV. That should give you a better feel for the sort of numbers that htseq-count should be producing.