Question

R featureCounts very low assigned read rates

4.2 years ago

1111111 • 0

Hello,

I am trying to run featureCounts (from the R package Rsubread) on SAM files generated by aligning reads to the mouse genome, and I get the output shown below.

My command:

> fc <- featureCounts(filenames, annot.ext ="./GCF_000001635.26_GRCm38.p6_genomic.gtf", isGTFAnnotationFile = TRUE)

Output: 

        ==========     _____ _    _ ____  _____  ______          _____  
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 1.32.4

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 SAM file                                       ||
||                           S aged12h2002.fastq_1_40.trimmed_genome_-l_4 ... ||
||                                                                            ||
||              Annotation : GCF_000001635.26_GRCm38.p6_genomic.gtf (GTF)     ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : no                                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file GCF_000001635.26_GRCm38.p6_genomic.gtf ...            ||
||    Features : 1389889                                                      ||
||    Meta-features : 42454                                                   ||
||    Chromosomes/contigs : 209                                               ||
||                                                                            ||
|| Process SAM file aged12h2002.fastq_1_40.trimmed_genome_-l_40.sam-genes...  ||
||    Single-end reads are included.                                          ||
||    Assign alignments to features...                                        ||
||    Total alignments : 1402942193                                           ||
||    Successfully assigned alignments : 12806256 (0.9%)                      ||
||    Running time : 50.75 minutes                                            ||
||                                                                            ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

This is an extremely low assignment rate, and I'm not sure what I'm doing wrong (this is my first time analyzing RNA-seq data). The GTF file came from the same directory as my reference genome. Does anyone have an idea of what might be causing this?

Thank you!

R RNA-Seq • 1.3k views

ADD COMMENT • link updated 4.2 years ago by GenoMax 147k • written 4.2 years ago by 1111111 • 0

The top suspect in this instance is a chromosome-name mismatch: check that your chromosome names match in all locations (e.g. reference, alignment files, annotations).

ADD REPLY • link 4.2 years ago by GenoMax 147k
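
One way to run the chromosome-name check suggested above is to compare the reference names in the SAM header with the first column of the GTF. The following is a minimal sketch in Python, run outside R. It assumes the SAM file has a header with @SQ lines, and the function names and toy file contents are illustrative only; they do not come from the thread or from Rsubread.

```python
import os
import tempfile


def sam_contigs(sam_path):
    """Collect reference names (SN: fields) from the @SQ header lines of a SAM file."""
    names = set()
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # header is over; no need to scan the alignments
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        names.add(field[3:])
    return names


def gtf_contigs(gtf_path):
    """Collect the sequence names in column 1 of a GTF file, skipping comment lines."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(sam_path, gtf_path):
    """Return the names shared by both files and the names unique to each."""
    sam = sam_contigs(sam_path)
    gtf = gtf_contigs(gtf_path)
    return {"shared": sam & gtf, "sam_only": sam - gtf, "gtf_only": gtf - sam}


if __name__ == "__main__":
    # Toy files (hypothetical content) that show what a naming mismatch looks like.
    with tempfile.TemporaryDirectory() as tmp:
        sam_file = os.path.join(tmp, "toy.sam")
        gtf_file = os.path.join(tmp, "toy.gtf")
        with open(sam_file, "w") as out:
            out.write("@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:100\n")
            out.write("read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n")
        with open(gtf_file, "w") as out:
            out.write("#!genome-build toy\n")
            out.write('NC_000067.6\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g1";\n')
        for key, value in compare_contigs(sam_file, gtf_file).items():
            print(key, sorted(value))
```

If the shared set is empty or much smaller than expected, the alignments and the annotation are using different naming schemes, and reads cannot be assigned to features on the mismatched sequences.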

This might be unlikely, but which feature does featureCounts() quantify by default: transcripts or genes?

If you are counting transcripts, the low rate might be due to ambiguous assignments (although the rate still seems too low for that).

ADD REPLY • link updated 4.2 years ago by Ram 44k • written 4.2 years ago by Rogerio Ribeiro ▴ 110