Hello,
I am having some trouble interpreting the output of featureCounts. I am interested in completing differential gene expression analysis.
This is my input code:
results=/blue/RNAseq2022/results
DATA=/blue/RNAseq2022/results/HISAT2_Alignment_Index1
GTF=/blue/RNAseq2022/results/GTF
OUTPUT=/blue/RNAseq2022/results/FeatureCounts
cd ${results}
ml subread
prefix=$(basename ${1} ".bam")
featureCounts -p -s 0 -a ${GTF}/Bos_taurus.ARS-UCD1.2.109.gtf.gz -o ${OUTPUT}/${prefix}Counts.txt ${DATA}/${prefix}.bam
All samples were run in parallel (additional run file not included) and submitted as SLURM job.
Output for one of the samples:
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v2.0.3
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| ||
|| 210177-1.bam ||
|| ||
|| Output file : 210177-1Counts.txt ||
|| Summary : 210177-1Counts.txt.summary ||
|| Paired-end : yes ||
|| Count read pairs : no ||
|| Annotation : Bos_taurus.ARS-UCD1.2.109.gtf.gz (GTF) ||
|| Dir for temp files : /blue/RNAseq2022/ ... ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\============================================================================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Bos_taurus.ARS-UCD1.2.109.gtf.gz ... ||
|| Features : 433820 ||
|| Meta-features : 27607 ||
|| Chromosomes/contigs : 178 ||
|| ||
|| Process BAM file 210177-1.bam... ||
|| Paired-end reads are included. ||
|| The reads are assigned on the single-end mode. ||
|| Total alignments : 43802718 ||
|| Successfully assigned alignments : 15799023 (36.1%) ||
|| Running time : 1.30 minutes ||
|| ||
|| Write the final count table. ||
|| Write the read assignment summary. ||
|| ||
|| Summary of counting results can be found in file "/blue/RNAseq2022/... ||
|| RNAseq2022/results/FeatureCounts/210177-1Counts.txt.summary" ||
|| ||
\\============================================================================//
What I am most confused on is why it states "The reads are assigned on the single-end mode." when my data is paired-end, were aligned as paired, and I specified paired in the arguments for featureCounts.
Also, my overall alignment using Hisat2 was ~94% across all samples. This includes multi-aligned reads etc., but I believe mapped 1 was ~80% or so. I know here I am only counting alignment to meta-features (genes) so I am guessing this is why the "Successfully assigned alignments" drops drastically, but is a ~30-40% average typical? Should I be including multi-mapped and multi-overlapping reads? This is my first time doing bulk RNA-seq analysis.
Thank you in advance to anyone who can provide guidance with this!
Hi there, I also encounter with the same problem recently. I am so confused what is 'Successfully assigned alignments', and what's the different between this alignment result with the one I run out in hisat2.
I have already add
--countReadPairs
in my command. but still get only 73.2% alignments while my hisat2 alignments is up to 99.88% as below:I cant tell their difference. heeeeeeeeelp plz.
This is a professional forum. Please don't use childlike language here.
Also, don't use nohup, it's an old hack that is no longer the way to run processes that should not be interrupted if user accidentally gets disconnected from the process. Use
screen
ortmux
instead.featureCounts
is assigning aligned read to create counts. It omits multi-mapping reads since they do not provide specific information. So whilehisat2
is aligning the data at a higher percentage only part of those alignments can be used for generating counts.featureCounts
is not doing alignment, it is only figuring out which reads fall under which gene models to generate the counts.