Question

Very low successfully assigned alignments with feature counts

1

Entering edit mode

11 months ago

Manuel ▴ 10

Hello everyone,

I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT 2 went very well: 94.95% overall alignment rate.

However, when I use featureCounts, I get:

5.7% when I set the strandSpecific parameter to 1.
5.3% when I set the strandSpecific parameter to 2 (I guess this would be the correct one)
18.8% when I set the strandSpecific parameter to 0

Since this is the first time I'm analyzing sequencing data, I was wondering if someone could guide me on what might be happening or if this is normal.

Thank you very much in advance.

I include more information about the kits and commands used:

The kits that have been used:

For library preparation: Illumina® Stranded Total RNA Prep, Ligation with Ribo-Zero Plus (16 Samples) (REF: 20040525)
For sequencing: NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (REF: 20028401)

In our case, the sequencing was single (R1).

For the alignment, I use HISAT2 with the following command:

hisat2 -q \
  --rna-strandness F \
  -x /home/manu/research/genome/grch38/genome \
  -U /home/manu/research/analisis/data/ITK2-004-S1.fastq.gz | \
    samtools sort -o /home/manu/research/analisis/align/demo_trimmed.bam

99524981 reads; of these:
  99524981 (100.00%) were unpaired; of these:
    5030213 (5.05%) aligned 0 times
    58319656 (58.60%) aligned exactly 1 time
    36175112 (36.35%) aligned >1 times
94.95% overall alignment rate

Later I use featureCounts to count the reads:

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 1,
                        isGTFAnnotationFile = TRUE,
                        GTF.featureType = "exon",
                        GTF.attrType = "gene_id")

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Strand specific : stranded                                              ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 12844679 (5.7%)                      ||
||    Running time : 3.32 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
\\============================================================================//



\\============================================================================//

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 2,
                        isGTFAnnotationFile = TRUE)

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Strand specific : reversely stranded                                    ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 34526662 (15.3%)                     ||
||    Running time : 3.47 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||
\\============================================================================//

counts <- featureCounts(files = "demo_trimmed.bam",
                        annot.ext = "Homo_sapiens.GRCh38.110.gtf",
                        isPairedEnd = FALSE,
                        strandSpecific = 0,
                        isGTFAnnotationFile = TRUE)

==========     _____ _    _ ____  _____  ______          _____ 
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
       Rsubread 2.14.2

//========================== featureCounts setting ===========================\\
||                                                                            ||
||             Input files : 1 BAM file                                       ||
||                                                                            ||
||                           demo_trimmed.bam                                 ||
||                                                                            ||
||              Paired-end : no                                               ||
||        Count read pairs : no                                               ||
||              Annotation : Homo_sapiens.GRCh38.110.gtf (GTF)                ||
||      Dir for temp files : .                                                ||
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||      Multimapping reads : counted                                          ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
\\============================================================================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ...                       ||
||    Features : 1649677                                                      ||
||    Meta-features : 62754                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file demo_trimmed.bam...                                       ||
||    Single-end reads are included.                                          ||
||    Total alignments : 225796353                                            ||
||    Successfully assigned alignments : 42545035 (18.8%)                     ||
||    Running time : 3.36 minutes                                             ||
||                                                                            ||
|| Write the final count table.                                               ||
|| Write the read assignment summary.                                         ||
||                                                                            ||

[\\============================================================================//]\\============================================================================//

featureCounts RNA-seq • 804 views

ADD COMMENT • link updated 11 months ago by Istvan Albert 101k • written 11 months ago by Manuel ▴ 10

0

Entering edit mode

Is the genome/annotation from the same source i.e. Ensembl or so it appears?

ADD REPLY • link 11 months ago by GenoMax 147k

0

Entering edit mode

Load up your data in IGV and visualize it relative to your annotation file. It will tell you where the data aligns.

5% match on a 99% alignment is not the expected outcome

it could be that the annotations don't match the genome - that would be my first guess; make sure the chromosome names match, too

(that being said IGV will convert the chr1 to 1 but featureCounts won't)

ADD REPLY • link 11 months ago by Istvan Albert 101k