Low counts after running featureCounts
18 days ago

I used HISAT2 to align my reads to the mouse genome, and the alignment rates for most samples were above 90%. I then converted the same files to BAM, sorted them, and indexed them. But when I used featureCounts to generate counts, only around 35-65% of the reads were assigned to features. Kindly help me with this if possible. The counts are:

Status sorted_CC1.bam

Assigned        81909834
Unassigned_Unmapped     7465379
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 32127770
Unassigned_Secondary    0
Unassigned_NonSplit     0
Unassigned_NoFeatures   101236787
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    0

How can I improve this?
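
For a quick sanity check, the summary above can be turned into percentages of all reads that featureCounts processed. This is a minimal Python sketch that simply hard-codes the posted numbers (the categories that were 0 are left out, and the names `summary` and `percentages` are illustrative):

```python
# Counts copied from the featureCounts summary posted above (sorted_CC1.bam);
# categories that were 0 are omitted because they do not change the total.
summary = {
    "Assigned": 81_909_834,
    "Unassigned_Unmapped": 7_465_379,
    "Unassigned_MultiMapping": 32_127_770,
    "Unassigned_NoFeatures": 101_236_787,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each status category as a percentage of the total counted."""
    total = sum(counts.values())
    return {status: 100 * n / total for status, n in counts.items()}


if __name__ == "__main__":
    # Print categories from largest to smallest.
    for status, pct in sorted(percentages(summary).items(), key=lambda kv: -kv[1]):
        print(f"{status:<25} {pct:5.1f}%")
```

With these numbers, Unassigned_NoFeatures (about 45.5%) is the largest category, ahead of Assigned (about 36.8%) and Unassigned_MultiMapping (about 14.4%); featureCounts does not count multi-mapping reads unless -M is given.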


It seems that the largest share of your reads (Unassigned_NoFeatures) falls in genomic regions that have no annotation. Have you tried exploring them in a genome browser (e.g. IGV), or extracting the reads that do not overlap with exons?
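
A toy Python sketch of what "reads that do not overlap with exons" means for a single chromosome: each read is treated as one contiguous interval (spliced alignments are ignored), and it is checked against a sorted list of merged exon intervals using binary search. The exon and read coordinates are hypothetical and assumed to be 0-based, half-open (BED style); on real data you would do this with dedicated tools on the BAM rather than in Python.

```python
from bisect import bisect_right

# Hypothetical merged, sorted, non-overlapping exon intervals on one chromosome,
# in 0-based half-open (BED-style) coordinates.
EXONS = [(100, 200), (500, 650), (900, 1000)]
EXON_STARTS = [start for start, _ in EXONS]


def overlaps_exon(read_start: int, read_end: int) -> bool:
    """True if the read interval [read_start, read_end) overlaps any exon."""
    # Index of the last exon whose start is < read_end. Because the exons are
    # non-overlapping and sorted, it also has the largest end among candidates.
    i = bisect_right(EXON_STARTS, read_end - 1) - 1
    return i >= 0 and EXONS[i][1] > read_start


def split_reads(reads):
    """Split (name, start, end) read records into exonic and non-exonic names."""
    exonic, non_exonic = [], []
    for name, start, end in reads:
        (exonic if overlaps_exon(start, end) else non_exonic).append(name)
    return exonic, non_exonic


if __name__ == "__main__":
    reads = [("r1", 150, 250), ("r2", 300, 400), ("r3", 640, 700), ("r4", 1000, 1100)]
    exonic, non_exonic = split_reads(reads)
    print("exonic:", exonic)
    print("non-exonic:", non_exonic)
```

Here r2 (between exons) and r4 (starting exactly where an exon ends) end up in the non-exonic list.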


I am very new to handling this kind of data. Could you please guide me on how to do that, and how it will help me?


I haven't done this myself, but I can point you to the tools; you will have to do the heavy lifting yourself. I recommend checking out bedops to convert your annotation file to BED and to get a complement annotation (everything that is not an exon). Then look at samtools view to extract all the alignments within those regions before making any assumptions. Plenty of existing answers already cover this, so search the forum and you will find them.
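
The "complement annotation" idea (everything that is not an exon) can be sketched in plain Python to show the interval logic: take the exon lines of a GTF, convert them to 0-based, half-open intervals, merge overlapping exons, and report the gaps up to each chromosome's length. This is an illustration only, not how bedops implements it; the toy GTF lines and the chromosome lengths in `chrom_sizes` are assumptions (for a real genome they could come from the FASTA index or the BAM header).

```python
from collections import defaultdict


def exon_intervals(gtf_lines):
    """Collect exon intervals per chromosome as 0-based, half-open (BED-style) tuples."""
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[2] != "exon":
            continue
        # GTF is 1-based inclusive: BED start = GTF start - 1, end stays the same.
        by_chrom[fields[0]].append((int(fields[3]) - 1, int(fields[4])))
    return by_chrom


def merge(intervals):
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def complement(by_chrom, chrom_sizes):
    """Return (chrom, start, end) gaps not covered by exons, for each chromosome."""
    gaps = []
    for chrom, size in chrom_sizes.items():
        pos = 0
        for start, end in merge(by_chrom.get(chrom, [])):
            if start > pos:
                gaps.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < size:
            gaps.append((chrom, pos, size))
    return gaps


if __name__ == "__main__":
    toy_gtf = [
        "##description: toy example",
        'chr1\tHAVANA\tgene\t101\t1000\t.\t+\t.\tgene_id "g1";',
        'chr1\tHAVANA\texon\t101\t200\t.\t+\t.\tgene_id "g1";',
        'chr1\tHAVANA\texon\t151\t300\t.\t+\t.\tgene_id "g1";',
        'chr1\tHAVANA\texon\t901\t1000\t.\t+\t.\tgene_id "g1";',
    ]
    for chrom, start, end in complement(exon_intervals(toy_gtf), {"chr1": 1200, "chr2": 500}):
        print(f"{chrom}\t{start}\t{end}")
```

A chromosome with no annotated exons (chr2 in the toy example) comes back as one gap covering its whole length.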


As noted below, use the Integrative Genomics Viewer (IGV). Quick start guide: https://igv.org/doc/desktop/#QuickStart/

There is a more detailed user guide linked in the left pane.

featureCounts -p -T 8 -a /data/sata_data/home/rajdeep/GRCm39/gencode.vM36.chr_patch_hapl_scaff.annotation.gtf -s 2  -O  -o counts_CC2.txt sorted_39aligned_CC1.bam

This is the command I used. Are there any modifications I can make to get better counts?


With newer versions of featureCounts, you also need to add the following option when you have paired-end reads and use -p:

--countReadPairs    If specified, fragments (or templates) will be counted
                      instead of reads.
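
For reference, here is the command posted above with --countReadPairs added, assembled in Python so that each flag can carry a comment. It only builds and prints the command line (it does not run featureCounts); the paths and file names are the ones from the post, and the helper name `featurecounts_cmd` is hypothetical.

```python
import shlex

GTF = "/data/sata_data/home/rajdeep/GRCm39/gencode.vM36.chr_patch_hapl_scaff.annotation.gtf"


def featurecounts_cmd(bam: str, out: str, gtf: str = GTF, threads: int = 8) -> list[str]:
    """Build the featureCounts argument list discussed in this thread."""
    return [
        "featureCounts",
        "-p",                # input reads are paired-end
        "--countReadPairs",  # count fragments (read pairs) instead of individual reads
        "-T", str(threads),  # number of threads
        "-a", gtf,           # annotation file (GTF format by default)
        "-s", "2",           # strandedness: 2 = reversely stranded
        "-O",                # assign reads to all overlapping meta-features (genes)
        "-o", out,           # output count table
        bam,                 # input BAM file
    ]


if __name__ == "__main__":
    print(shlex.join(featurecounts_cmd("sorted_39aligned_CC1.bam", "counts_CC2.txt")))
```

Passing the same list to subprocess.run(cmd, check=True) would execute it, assuming featureCounts is on your PATH.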

Even after adding that, there was no real improvement in the percentage assigned.


Since we can't access or see your data, you will need to diagnose the issue yourself or ask for local help. You could also try salmon (https://salmon.readthedocs.io/en/latest/) with the latest mouse transcriptome to see if you get better results. Assuming all of your samples have similar assignment rates, you could move forward with the analysis and see what you get. If there is a real problem with the data (e.g. bad libraries, DNA contamination), no bioinformatics magic will fix that.
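
If all samples were counted in one featureCounts run, the .summary file has one column per BAM, which makes it easy to check whether the samples have similar assignment rates. A minimal Python sketch, assuming that tab-separated layout (Status column first); the two-sample table in the example is made-up toy data, not results from this thread:

```python
import csv
import io


def assigned_rates(summary_text: str) -> dict[str, float]:
    """Percentage of counted reads (or pairs) in 'Assigned' for each sample column."""
    rows = [row for row in csv.reader(io.StringIO(summary_text), delimiter="\t") if row]
    samples = rows[0][1:]  # header: "Status", then one column per BAM
    counts = {row[0]: [int(x) for x in row[1:]] for row in rows[1:]}
    # Sum every status category per sample to get that sample's total.
    totals = [sum(column) for column in zip(*counts.values())]
    return {s: 100 * a / t for s, a, t in zip(samples, counts["Assigned"], totals)}


def spread(rates: dict[str, float]) -> float:
    """Gap in percentage points between the highest and lowest sample."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy two-sample summary (made-up numbers, not data from this thread).
    toy = (
        "Status\tsampleA.bam\tsampleB.bam\n"
        "Assigned\t40\t60\n"
        "Unassigned_NoFeatures\t50\t30\n"
        "Unassigned_MultiMapping\t10\t10\n"
    )
    rates = assigned_rates(toy)
    for sample, pct in rates.items():
        print(f"{sample}\t{pct:.1f}%")
    print(f"spread: {spread(rates):.1f} percentage points")
```

featureCounts writes this file next to the -o output with a .summary suffix (for example counts_CC2.txt.summary), so on real data you would pass its contents instead of the toy string.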

18 days ago

Visualize your data and your GTF or GFF file in IGV, and look at the overlaps and counts.

With a GTF annotation, featureCounts by default counts the reads that overlap exons. Evaluate your data visually.

That will tell you a lot about whether the correct genome was used, whether the annotation matches it, etc.
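
To illustrate how "counting reads over exons" maps onto the summary categories, here is a deliberately simplified Python model of how a single read is classified: no overlapping exon gives Unassigned_NoFeatures, exons of exactly one gene give Assigned, and exons of several genes give Unassigned_Ambiguity unless multi-overlap (-O) is allowed. It ignores strandedness, multi-mapping, minimum-overlap settings and everything else featureCounts actually checks, and the gene names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    gene: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def classify(read_start: int, read_end: int, exons: list[Exon],
             allow_multi_overlap: bool = False) -> tuple[str, list[str]]:
    """Simplified status for one contiguous read: (summary category, genes credited)."""
    genes = sorted({e.gene for e in exons if e.start < read_end and e.end > read_start})
    if not genes:
        return "Unassigned_NoFeatures", []
    if len(genes) > 1 and not allow_multi_overlap:
        # Overlaps exons of several genes; without -O the read is not counted.
        return "Unassigned_Ambiguity", []
    return "Assigned", genes


if __name__ == "__main__":
    exons = [Exon("GeneA", 100, 200), Exon("GeneA", 300, 400), Exon("GeneB", 350, 450)]
    for start, end in [(150, 250), (500, 600), (360, 380)]:
        print((start, end), classify(start, end, exons), classify(start, end, exons, True))
```

In this model a read lying entirely outside annotated exons, as many of the reads above apparently do, can only end up as Unassigned_NoFeatures, whatever options are used.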

17 days ago

No one can troubleshoot from just this. Have you checked the obvious sources of error? Did you get the GTF and the genome from the same site? Are the chromosome names the same in your genome and the GTF?
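
A small Python sketch of the chromosome-name check: collect the sequence names from the genome FASTA headers (text after '>' up to the first whitespace) and from column 1 of the GTF, then compare the two sets. The inline example lines are hypothetical; for real files you would pass open file handles instead of lists. Names used in the GTF but missing from the genome are the ones that matter, because features on them can never receive reads.

```python
def fasta_names(lines) -> set[str]:
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def gtf_names(lines) -> set[str]:
    """Sequence names (column 1) used in a GTF, skipping comments and blank lines."""
    return {line.split("\t", 1)[0] for line in lines
            if line.strip() and not line.startswith("#")}


def compare(genome: set[str], annotation: set[str]) -> dict[str, list[str]]:
    """Report names present in only one of the two files."""
    return {
        # Features on these sequences can never receive reads.
        "only_in_gtf": sorted(annotation - genome),
        # Often harmless, e.g. scaffolds without annotation.
        "only_in_genome": sorted(genome - annotation),
    }


if __name__ == "__main__":
    # Hypothetical inline lines; for real files use open("genome.fa") etc.
    genome = fasta_names([">chr1 some description", "ACGT", ">chr2", "ACGT"])
    annotation = gtf_names([
        "##provider: toy",
        'chr1\tHAVANA\texon\t1\t10\t.\t+\t.\tgene_id "g1";',
        '1\tENSEMBL\texon\t1\t10\t.\t+\t.\tgene_id "g2";',
    ])
    print(compare(genome, annotation))
```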


Yes, I downloaded both from GENCODE, and the chromosome names are the same.
