Hi,
I know this question has been posted before, but I could not find answers on how to increase my counts for RNA-seq reads. I ran RNA-seq on an Illumina NextSeq and prepared the libraries with the NEBNext Ultra II Directional library kit. I used bowtie2 to map my reads to the reference genome, which is a bacterial genome, and I get alignment rates above 95% in my samples. I converted the SAM files to BAM and name-sorted them. Then I used htseq-count to get the RNA-seq counts. Here is my command line:
-o eo8_FP1_2_counts.txt htseq-count -f bam -t gene -i ID --stranded=reverse eo8_FP1_2_trimmed_sorted.bam ../Lplantarum.gff3
__no_feature 3587262
__ambiguous 39161
__too_low_aQual 15356
__not_aligned 35528
__alignment_not_unique 0
This is for a sample in which 5128080 (100.00%) reads were paired, with a 99.24% overall alignment rate. It seems like I am losing half of my reads.
What should I do to improve my htseq-count results?
Thanks.
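For reference, the share of fragments in each special counter can be worked out directly from the numbers in the question. This is a minimal Python sketch, assuming htseq-count counts each read pair as one fragment and taking the 5128080 pairs from the bowtie2 report as the denominator. The helper name `special_counter_fractions` is made up for illustration.

```python
def special_counter_fractions(summary_text, total_fragments):
    """Return {counter_name: fraction of total_fragments} for htseq-count's
    special counters (the lines whose name starts with a double underscore)."""
    fractions = {}
    for line in summary_text.splitlines():
        fields = line.split()  # works for tab- or space-separated output
        if len(fields) == 2 and fields[0].startswith("__"):
            fractions[fields[0]] = int(fields[1]) / total_fragments
    return fractions


# Special counters as posted in the question.
SUMMARY = """\
__no_feature 3587262
__ambiguous 39161
__too_low_aQual 15356
__not_aligned 35528
__alignment_not_unique 0
"""

# Assumption: the bowtie2 pair count is the right denominator.
TOTAL_PAIRS = 5128080

fractions = special_counter_fractions(SUMMARY, TOTAL_PAIRS)
for name, frac in fractions.items():
    print(f"{name}\t{frac:.2%}")
```

Under that assumption, __no_feature comes to roughly 70% of the pairs, so the loss is somewhat larger than half.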
And the annotation file matches your fasta genome?
I downloaded them both from NCBI, for the same genome.
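Even when both files come from the same source, it can be worth confirming that the sequence names in the FASTA headers are exactly the names used in column 1 of the GFF3, since counting relies on those names matching. A minimal Python sketch of that check follows. The function names and the toy records are invented for illustration and are not taken from the poster's files.

```python
def fasta_seq_names(lines):
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gff3_seq_names(lines):
    """Sequence names from column 1 of GFF3 feature lines."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):  # embedded sequence section starts here
            break
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t")[0])
    return names


# Toy data (hypothetical names) with one deliberate mismatch.
fasta_lines = [">seqA.1 toy chromosome", "ACGTACGT", ">plasmidB.1 toy plasmid", "GGCC"]
gff_lines = [
    "##gff-version 3",
    "seqA.1\ttoy\tgene\t1\t8\t.\t+\t.\tID=gene0",
    "plasmid_B\ttoy\tgene\t1\t4\t.\t-\t.\tID=gene1",
]

only_in_fasta = fasta_seq_names(fasta_lines) - gff3_seq_names(gff_lines)
only_in_gff = gff3_seq_names(gff_lines) - fasta_seq_names(fasta_lines)
print("only in FASTA:", sorted(only_in_fasta))
print("only in GFF3:", sorted(only_in_gff))
```

On real files, the two functions can be fed open file handles instead of lists. The names in the BAM header (visible with `samtools view -H`) should match the same set.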
A bacterial genome should be gene dense (and you would not need to worry about splicing). Have you reviewed the alignment in a genome browser?
I have not. Which genome browser would you recommend, and what should I check when I review it?
Use the Integrative Genomics Viewer (IGV, from the Broad Institute). You will need to create a custom genome for your bacterium, for which you need the genome sequence in FASTA format and the matching GFF file. Once you open the alignment, make sure reads are mapping under the genes.
Did you confirm with the people who made the library that the prep was stranded?
Quote from original question:
I made the libraries and it is stranded.
Your command line is broken (incomplete), but apparently you didn't specify the stranded option for htseq-count.
The stranded option is reverse, and I did specify it; it is there. What else is incomplete? I didn't understand.
[Update] My sorted BAM files were name-sorted, and htseq-count somehow did not like that kind of sort in my samples. So I reran htseq-count with coordinate-sorted BAM files; however, that did not improve the no-feature counts. I made some additions to my command line, as below:
Are there any comments on how to reduce the high no-feature count?
Have you looked at the alignments in a genome browser as I had suggested above?
Does this mean they are aligning to the genome?
[Image: IGV snapshot]
I am not sure if you can see the image. But it is aligning to the genes.
Please use this guide to post images properly: How to add images to a Biostars post
Looking at the images, it does look like you have reads aligning to all regions, so I am not sure why they are not being counted. Can you use the "unstranded" counting option to see if the read counts go up?
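One way to make the strandedness comparison systematic is to run the same count with each of the three `-s` settings and compare the __no_feature lines. The Python sketch below only builds and prints the commands; it does not run them. The helper `htseq_count_command` and the output file names are hypothetical, `-r pos` assumes the coordinate-sorted BAM mentioned in the update, and the input file names are the ones from the original post.

```python
import shlex


def htseq_count_command(bam, gff, stranded, order="pos",
                        feature_type="gene", id_attribute="ID"):
    """Build an htseq-count argument list for one strandedness setting."""
    return [
        "htseq-count",
        "-f", "bam",          # input format
        "-r", order,          # sort order of the BAM: 'pos' or 'name'
        "-s", stranded,       # 'yes', 'no' or 'reverse'
        "-t", feature_type,   # GFF column 3 value to count over
        "-i", id_attribute,   # attribute used as the feature ID
        bam,
        gff,
    ]


for stranded in ("reverse", "yes", "no"):
    cmd = htseq_count_command(
        "eo8_FP1_2_trimmed_sorted.bam", "../Lplantarum.gff3", stranded
    )
    # htseq-count writes counts to stdout, so redirect to a file per setting.
    print(shlex.join(cmd), ">", f"counts_{stranded}.txt")
```

If none of the three settings moves the __no_feature number much, strandedness is probably not the cause and the annotation itself becomes the more likely suspect.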
I tried that; it does not change or improve my counts.
The only thing I can think of is that the annotation file you downloaded is either not in the correct format or has some other errors in it.
To see whether the annotation file was the problem, I tried another annotation file and FASTA file, this time from Ensembl (the previous ones were from NCBI). I got lower counts and a higher no-feature count.
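A quick way to look for format problems is to tabulate which feature types the GFF3 contains and whether each line carries the attribute passed to `-i`, since `-t gene -i ID` only works if `gene` rows exist and have an `ID`. A minimal Python sketch of that check follows. The function name `summarize_gff3` and the toy records are invented for illustration.

```python
from collections import Counter


def summarize_gff3(lines, id_attribute="ID"):
    """Count feature types (column 3) and, per type, the lines lacking id_attribute."""
    type_counts = Counter()
    missing_id = Counter()
    for line in lines:
        if line.startswith("##FASTA"):  # embedded sequence section starts here
            break
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:  # GFF3 feature lines have exactly nine columns
            type_counts["<malformed>"] += 1
            continue
        feature_type = cols[2]
        type_counts[feature_type] += 1
        keys = {item.split("=", 1)[0].strip()
                for item in cols[8].split(";") if item.strip()}
        if id_attribute not in keys:
            missing_id[feature_type] += 1
    return type_counts, missing_id


# Toy records (hypothetical), including one gene without an ID and one broken line.
gff_lines = [
    "##gff-version 3",
    "seqA.1\ttoy\tgene\t1\t90\t.\t+\t.\tID=gene0;Name=abc",
    "seqA.1\ttoy\tCDS\t1\t90\t.\t+\t0\tID=cds0;Parent=gene0",
    "seqA.1\ttoy\tgene\t200\t290\t.\t-\t.\tName=def",
    "seqA.1 toy gene 300 390 . + . ID=gene2",
]

type_counts, missing_id = summarize_gff3(gff_lines)
print("feature types:", dict(type_counts))
print("lines missing ID:", dict(missing_id))
```

If the `gene` count is far below what is expected for the genome, or many rows are malformed or lack the chosen attribute, that would point to the annotation rather than the alignment.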
I think this is a point where access to your data/analysis may be needed to figure out what is going on. A forum is not the best place to do that.
Perhaps someone else may be along with suggestions on other things to try.