Hello everyone,
I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT 2 went very well: 94.95% overall alignment rate.
However, when I use featureCounts, I get:
- 5.7% when I set the strandSpecific parameter to 1.
- 5.3% when I set the strandSpecific parameter to 2 (I guess this would be the correct one)
- 18.8% when I set the strandSpecific parameter to 0
Since this is the first time I'm analyzing sequencing data, I was wondering if someone could guide me on what might be happening or if this is normal.
Thank you very much in advance.
I include more information about the kits and commands used:
The kits that have been used:
- For library preparation: Illumina® Stranded Total RNA Prep, Ligation with Ribo-Zero Plus (16 Samples) (REF: 20040525)
- For sequencing: NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (REF: 20028401)
In our case, the sequencing was single (R1).
For the alignment, I use HISAT2 with the following command:
hisat2 -q \
--rna-strandness F \
-x /home/manu/research/genome/grch38/genome \
-U /home/manu/research/analisis/data/ITK2-004-S1.fastq.gz | \
samtools sort -o /home/manu/research/analisis/align/demo_trimmed.bam
99524981 reads; of these:
99524981 (100.00%) were unpaired; of these:
5030213 (5.05%) aligned 0 times
58319656 (58.60%) aligned exactly 1 time
36175112 (36.35%) aligned >1 times
94.95% overall alignment rate
Later I use featureCounts to count the reads:
counts <- featureCounts(files = "demo_trimmed.bam",
annot.ext = "Homo_sapiens.GRCh38.110.gtf",
isPairedEnd = FALSE,
strandSpecific = 1,
isGTFAnnotationFile = TRUE,
GTF.featureType = "exon",
GTF.attrType = "gene_id")
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.14.2
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| demo_trimmed.bam ||
|| ||
|| Paired-end : no ||
|| Count read pairs : no ||
|| Annotation : Homo_sapiens.GRCh38.110.gtf (GTF) ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ... ||
|| Features : 1649677 ||
|| Meta-features : 62754 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process BAM file demo_trimmed.bam... ||
|| Strand specific : stranded ||
|| Single-end reads are included. ||
|| Total alignments : 225796353 ||
|| Successfully assigned alignments : 12844679 (5.7%) ||
|| Running time : 3.32 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
\\============================================================================//
\\============================================================================//
counts <- featureCounts(files = "demo_trimmed.bam",
annot.ext = "Homo_sapiens.GRCh38.110.gtf",
isPairedEnd = FALSE,
strandSpecific = 2,
isGTFAnnotationFile = TRUE)
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.14.2
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| demo_trimmed.bam ||
|| ||
|| Paired-end : no ||
|| Count read pairs : no ||
|| Annotation : Homo_sapiens.GRCh38.110.gtf (GTF) ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ... ||
|| Features : 1649677 ||
|| Meta-features : 62754 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process BAM file demo_trimmed.bam... ||
|| Strand specific : reversely stranded ||
|| Single-end reads are included. ||
|| Total alignments : 225796353 ||
|| Successfully assigned alignments : 34526662 (15.3%) ||
|| Running time : 3.47 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
\\============================================================================//
counts <- featureCounts(files = "demo_trimmed.bam",
annot.ext = "Homo_sapiens.GRCh38.110.gtf",
isPairedEnd = FALSE,
strandSpecific = 0,
isGTFAnnotationFile = TRUE)
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 2.14.2
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| demo_trimmed.bam ||
|| ||
|| Paired-end : no ||
|| Count read pairs : no ||
|| Annotation : Homo_sapiens.GRCh38.110.gtf (GTF) ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Homo_sapiens.GRCh38.110.gtf ... ||
|| Features : 1649677 ||
|| Meta-features : 62754 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process BAM file demo_trimmed.bam... ||
|| Single-end reads are included. ||
|| Total alignments : 225796353 ||
|| Successfully assigned alignments : 42545035 (18.8%) ||
|| Running time : 3.36 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
[\\============================================================================//]\\============================================================================//
Is the genome/annotation from the same source i.e. Ensembl or so it appears?
Load up your data in IGV and visualize it relative to your annotation file. It will tell you where the data aligns.
5% match on a 99% alignment is not the expected outcome
it could be that the annotations don't match the genome - that would be my first guess; make sure the chromosome names match, too
(that being said IGV will convert the chr1 to 1 but featureCounts won't)