Question

Mouse genome featureCounts

0

Entering edit mode

4 months ago

rajdeepboral00 ▴ 60

This is currently the featureCounts command I am using to get the counts, but only 39.8% is getting successfully assigned. Can anyone please suggest that how can I improve my counts?

featureCounts -p -T 8 \
  --countReadPairs \
  -a /path/to/GRCm39/gencode.vM36.chr_patch_hapl_scaff.annotation.gtf \
  -s2 \
  -t exon \
  -o counts_CT1.txt sorted_CT1.bam

        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
          v2.0.8

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           sorted_CT1.bam                                   ||
||                                                                            ||
||             Output file : counts_CT1.txt                                   ||
||                 Summary : counts_CT1.txt.summary                           ||
||              Paired-end : yes                                              ||
||        Count read pairs : yes                                              ||
||              Annotation : gencode.vM36.chr_patch_hapl_scaff.annotation ... ||
||      Dir for temp files : ./                                               ||
||                                                                            ||
||                 Threads : 8                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file gencode.vM36.chr_patch_hapl_scaff.annotation.gtf ...  ||
||    Features : 1821019                                                      ||
||    Meta-features : 78298                                                   ||
||    Chromosomes/contigs : 38                                                ||
||                                                                            ||
|| Process BAM file sorted_CT1.bam...                                         ||
||    Strand specific : reversely stranded                                    ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 84936948                                             ||
||    Successfully assigned alignments : 33798384 (39.8%)                     ||
||    Running time : 0.72 minutes

GRCm39 RNA-seq featureCounts • 1.5k views

ADD COMMENT • link 4 months ago by rajdeepboral00 ▴ 60

0

Entering edit mode

There is no way to "improve" counts, if that is what you data has (and you have done the analysis properly). You seem to be counting at the exon level (and not summarizing at gene level), is that being done on purpose?

ADD REPLY • link 4 months ago by GenoMax 151k

0

Entering edit mode

No, i want to get the gene counts. I though this is equivalent to gene counts only. Can you please tell me how to get the gene count? what to include in the code? Actually i am very new to this data analysis and i dont know much.

ADD REPLY • link 4 months ago by rajdeepboral00 ▴ 60

0

Entering edit mode

If I remember correctly you will need to add the -g parameter (eg. -g gene_id) where the value of that parameter is what you want to summarize on.

In my example it will summarize all matches of reads to exons onto the level denoted by gene_id (might be something else in the gff you are using though, but gene_id is the most likely one) In essence you will need to use the tag that groups all exons to a unique feature (gene_id, transcript_id ... )

ADD REPLY • link 4 months ago by lieven.sterck 15k

1

Entering edit mode

The command looks fine to me. The -g parameter should be gene_id by default. The -t exon parameter is also default. These two options work together to specify, for each gene id, calculate the number of reads over their exons.

Adding the -f flag would make it count each exon seperately. As is (without -f), it counts reads over each exon, then adds the counts from exons with the same gene_id.

ADD REPLY • link 4 months ago by rfran010 ★ 1.5k

0

Entering edit mode

The percentage of mapped reads assigned to features in your annotation file will depend on what annotations you are mapping them to and the nature of your data.

For example, for ribo-depleted RNA-seq this is roughly the rate I see when mapping only to gene exons, especially when excluding multi-mappers and overlapping reads. This is because, for my data, many RNA molecules come from repetitive (many copies in the genome) regions, and I also detect a significant amount of intronic signal, which can add up considering how large introns are.

ADD REPLY • link 4 months ago by rfran010 ★ 1.5k

0

Entering edit mode

I have added the -g gene_id and -t exon options too but still the %assigned in featureCounts is same only...so is that normal to have such low percentage? or there maybe chance that i am doing some kind of error? The protocol used ribo depletion.Kindly answer.

ADD REPLY • link 4 months ago by rajdeepboral00 ▴ 60

1

Entering edit mode

Also inspect your alignments visually in a browser like IGV to see where the reads are piling up. If they are aligning outside gene models then that is a characteristic of your data.

ADD REPLY • link 4 months ago by GenoMax 151k

0

Entering edit mode

Can you confirm that you are using the correct GTF file linked to the genome you used for doing the mapping (species, version, ...) ?

One way to check why there is a low read count is to also count the multimappings. (you might get some pointers from this post for inspiration: featureCounts --primary tag counting uniquely mapped reads and excludes multi-mapped reads? ). If the "missing percentages" are in that part you at least know what is going on.

ADD REPLY • link 4 months ago by lieven.sterck 15k

0

Entering edit mode

N_unmapped  5008777
N_multimapping 3988726
N_noFeature 49636676
N_ambiguous 630120

This is the summary of my alignment for one of my samples and i had used the fasta and gtf files from https://www.gencodegenes.org/mouse/ ...For fasta i have taken the "whole genome" and for the gtf i have used the "Comprehensive gene annotation ALL It contains the comprehensive gene annotation on the reference chromosomes, scaffolds, assembly patches and alternate loci (haplotypes)" of GRCm39 Should i take into account the ambiguous and multimapping reads?

ADD REPLY • link updated 4 months ago by GenoMax 151k • written 4 months ago by rajdeepboral00 ▴ 60

1

Entering edit mode

N_noFeature 49636676

Looks like a large fraction of your reads do not map to known features (or could be mapping to rRNA/introns, which you are likely not counting).

ADD REPLY • link 4 months ago by GenoMax 151k

0

Entering edit mode

Can there be a chance that i am making any mistakes? or this is what the data is.

ADD REPLY • link 4 months ago by rajdeepboral00 ▴ 60

0

Entering edit mode

There's no obvious mistake I see, but I agree with GenoMax that you should check your RNA signal on IGV to see how it aligns to genes.

I tend to use the "primary" GTFs instead of the "All" gtfs, but it seems that it shouldn't matter that much.

ADD REPLY • link 4 months ago by rfran010 ★ 1.5k

1

Entering edit mode

i have tried using the primary one too but the results were kind of similar...yes i will take a look of my data with IGV, thanks alot.

ADD REPLY • link 4 months ago by rajdeepboral00 ▴ 60