Question

Low counts after running featureCounts

18 days ago

rajdeepboral00 • 0

I used hisat2 to align my samples to the mouse genome, and most of them had alignment rates above 90%. I then converted the same files to BAM, sorted them, and indexed them. However, when I used featureCounts to generate counts, only around 35-65% of the reads were assigned to features. Could you please help me with this? The featureCounts summary for one sample is:

Status                         sorted_CC1.bam
Assigned                       81909834
Unassigned_Unmapped            7465379
Unassigned_Read_Type           0
Unassigned_Singleton           0
Unassigned_MappingQuality      0
Unassigned_Chimera             0
Unassigned_FragmentLength      0
Unassigned_Duplicate           0
Unassigned_MultiMapping        32127770
Unassigned_Secondary           0
Unassigned_NonSplit            0
Unassigned_NoFeatures          101236787
Unassigned_Overlapping_Length  0
Unassigned_Ambiguity           0

How can I improve this?
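To see where the reads went, it can help to express each category of the summary as a share of the total. featureCounts writes this summary to a tab-separated `.summary` file next to the counts file (for example `counts.txt.summary` when run with `-o counts.txt`). The helper below, `summary_pct`, is a hypothetical sketch, not part of featureCounts; it assumes a summary file with a single sample column.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of featureCounts): print every non-zero
# status category with its count and its percentage of all reads.
# Assumes a single-sample *.summary file: a "Status<TAB>file.bam" header
# line followed by "Category<TAB>count" lines.
summary_pct() {
  awk -F'\t' '
    NR == 1 { next }                               # skip the "Status" header
    NF >= 2 { name[++n] = $1; cnt[n] = $2 + 0; total += $2 }
    END {
      for (i = 1; i <= n; i++)
        if (cnt[i] > 0)                            # hide empty categories
          printf "%s\t%.0f\t%.1f%%\n", name[i], cnt[i], 100 * cnt[i] / total
    }' "$1"
}
```

For the summary above, this reports roughly 36.8% Assigned, 45.5% Unassigned_NoFeatures, 14.4% Unassigned_MultiMapping and 3.4% Unassigned_Unmapped, so reads that align but overlap no annotated feature are the largest group.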

hisat2 featureCounts lowcounts • 714 views

updated 16 days ago by GenoMax 148k • written 18 days ago by rajdeepboral00 • 0

It seems that a large share of your reads falls in genomic regions that have no annotation (Unassigned_NoFeatures is your largest category). Have you tried exploring the data in a genome browser (e.g. IGV), or extracting the reads that do not overlap exons?

18 days ago by biofalconch ★ 1.3k

I am very new to this kind of data analysis. Could you please guide me on how to do that, and explain how it will help?

17 days ago by rajdeepboral00 • 0

I haven't done this myself, but I can point you to the tools; you will have to do the heavy lifting yourself. I recommend BEDOPS to convert your annotation file to BED format and to compute a complement annotation (everything that is not an exon). Then look at samtools view to extract all the alignments within those regions before drawing any conclusions. Plenty of existing answers on the forum already cover this, so a search should turn them up.
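As a rough illustration of what the BEDOPS steps above compute, here is a sketch using plain awk and sort instead of BEDOPS itself. The function names `gtf_exons_to_bed` and `complement_bed` are hypothetical, and the chromosome sizes file is an assumption (for example the first two columns of a samtools `.fai` index, `cut -f1,2 genome.fa.fai`). The exon BED must be sorted by chromosome and then start, which the first function does.

```shell
#!/usr/bin/env bash
# Hypothetical awk stand-ins for the BEDOPS steps (GTF -> BED, complement);
# these are NOT the BEDOPS tools themselves.

# Write one BED interval per GTF "exon" line, sorted by chromosome and start.
# GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive,
# so only the start coordinate shifts by one.
gtf_exons_to_bed() {
  awk -F'\t' -v OFS='\t' '!/^#/ && $3 == "exon" { print $1, $4 - 1, $5 }' "$1" |
    sort -k1,1 -k2,2n
}

# Print the regions of each chromosome NOT covered by any exon.
# $1: sorted exon BED (e.g. output of gtf_exons_to_bed)
# $2: chromosome sizes, "name<TAB>length"
complement_bed() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { len[$1] = $2 + 0; order[++nchr] = $1; next }  # sizes file
    $1 != chr {                                     # a new chromosome starts
      if (chr != "" && cov_end < len[chr]) print chr, cov_end, len[chr]
      chr = $1; cov_end = 0; seen[chr] = 1
    }
    {
      if ($2 + 0 > cov_end) print chr, cov_end, $2   # gap before this exon
      if ($3 + 0 > cov_end) cov_end = $3 + 0         # extend covered region
    }
    END {
      if (chr != "" && cov_end < len[chr]) print chr, cov_end, len[chr]
      for (i = 1; i <= nchr; i++)                    # chromosomes with no exons
        if (!(order[i] in seen)) print order[i], 0, len[order[i]]
    }' "$2" "$1"
}
```

The resulting BED could then be passed to something like `samtools view -b -L noexon.bed sorted_CC1.bam > noexon.bam` to keep the alignments that overlap non-exonic regions, and that BAM can be loaded into IGV next to the annotation.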

17 days ago by biofalconch ★ 1.3k

As noted below, use the Integrative Genomics Viewer (IGV). Quick start guide: https://igv.org/doc/desktop/#QuickStart/

A more detailed user guide is linked in the left pane.

17 days ago by GenoMax 148k

featureCounts -p -T 8 -a /data/sata_data/home/rajdeep/GRCm39/gencode.vM36.chr_patch_hapl_scaff.annotation.gtf -s 2  -O  -o counts_CC2.txt sorted_39aligned_CC1.bam

This is the command I used. Are there any modifications I can make to get better counts?

updated 16 days ago by GenoMax 148k • written 17 days ago by rajdeepboral00 • 0

With newer versions of featureCounts, you also need to add the following option when you have paired-end reads and use -p:

--countReadPairs    If specified, fragments (or templates) will be counted
                      instead of reads.

16 days ago by GenoMax 148k

Even after adding that option, there was no improvement in the percentage of assigned reads.

16 days ago by rajdeepboral00 • 0

Since we can't access or see your data, you will need to diagnose the issue yourself or ask for local help. You could also use salmon ( https://salmon.readthedocs.io/en/latest/ ) with the latest mouse transcriptome to see if you get better results. Assuming all of your samples have similar assignment rates, you could move forward with the analysis and see what you get. If there is a real problem with the data (e.g. bad libraries, DNA contamination, etc.), no bioinformatics magic will fix that.

16 days ago by GenoMax 148k

score 1 · Answer 1 · 2024-12-04

17 days ago

Istvan Albert 102k

Visualize your data and your GTF/GFF file in IGV, and look at the overlaps and counts.

The featureCounts program counts the number of reads overlapping exons. Visually evaluate your data.

That will tell you a lot about whether the genome was correct, whether the annotation matched, and so on.


score 0 · Answer 2 · 2024-12-04

17 days ago

swbarnes2 14k

No one can troubleshoot from this alone. Have you checked the obvious sources of error? Did you get the GTF and the genome from the same site? Are the chromosome names the same in your genome and the GTF?
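One way to answer the chromosome-name question is to compare the sequence names in the GTF's first column with the `@SQ SN:` names in the BAM header (which `samtools view -H` prints). The `chrom_name_diff` function below is a hypothetical sketch; it assumes the header has been saved to a text file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list sequence names found in the GTF but not in the
# BAM header, and vice versa.
# $1: GTF annotation file
# $2: BAM header as text, e.g. saved with: samtools view -H sorted.bam > header.txt
chrom_name_diff() {
  awk -F'\t' '
    NR == FNR { if ($0 !~ /^#/ && NF >= 9) gtf[$1] = 1; next }  # GTF seqnames
    /^@SQ/ {                                                    # BAM @SQ lines
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) bam[substr($i, 4)] = 1
    }
    END {
      for (c in gtf) if (!(c in bam)) print "only_in_gtf\t" c
      for (c in bam) if (!(c in gtf)) print "only_in_bam\t" c
    }' "$1" "$2" | LC_ALL=C sort
}
```

Empty output means every name matches. GENCODE annotations use "chr"-prefixed names (chr1, chr2, ...) while Ensembl uses 1, 2, ..., so mixing sources typically shows up here as two long lists of non-matching names.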