Hi,
I am very new to RNA-seq analysis. I have Illumina paired-end RNA-Seq data. After QC, I aligned the reads to the genome with STAR, using a GFF3 annotation file (uniquely mapped reads: 80%), and then used featureCounts (version 2.0.1) in a conda environment to count reads per gene.
The parameters I used for featureCounts are listed below:
featureCounts \
analysis/aligned_sequences/SRR1171897/Aligned.sortedByCoord.out.bam \
-a data/annotation/Cs_genes_v2_annot.gff3 \
-o analysis/final_counts/SRR1171897/featureCounts.txt \
-T 10 \
-p \
-F "GFF3" \
-g "Parent"
I am not sure about -g "Parent" given the GFF3 annotation file; its first few lines are shown below:
Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000;Note=methyl-CPG-binding domain 9
Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000
Chr1 AAFC_NRC mRNA 1 6504 . - . ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000;Note=methyl-CPG-binding domain 9
Chr1 AAFC_NRC five_prime_UTR 6380 6504 . - . ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1
Chr1 AAFC_NRC exon 5865 6504 . - . ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1
After featureCounts finished, I got the following results:
My question is: which gene identifier should I use for the -g parameter when running featureCounts, and why were only 51.1% of alignments successfully assigned? Is my result correct, and is there anything I could do to improve it?
Thank you very much.
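On the -g question, it can help to list which attribute keys each feature type carries in column 9 of the GFF3. Below is a minimal sketch using awk; the temporary file and the `attr_keys` function name are made up for illustration, and the four sample rows are copied from the excerpt above (assuming the real file is tab-separated, as GFF3 normally is).

```shell
# Hypothetical sample: four rows taken from the GFF3 excerpt in the question,
# written tab-separated into a temporary file.
gff=$(mktemp)
printf 'Chr1\tAAFC_NRC\t%s\t%s\t%s\t.\t-\t.\t%s\n' \
  gene 1 6504 'ID=Csa01g001000;Name=Csa01g001000' \
  mRNA 1 6504 'ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000' \
  five_prime_UTR 6380 6504 'ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1' \
  exon 5865 6504 'ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1' > "$gff"

# Print "feature_type attribute_key count" for every key seen in column 9.
attr_keys() {
  awk -F'\t' '
    $0 !~ /^#/ && NF >= 9 {
      n = split($9, kv, ";")                 # attributes are ";"-separated
      for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")              # each attribute is key=value
        if (pair[1] != "") seen[$3 " " pair[1]]++
      }
    }
    END { for (k in seen) print k, seen[k] }
  ' "$1" | sort
}

attr_keys "$gff"
```

In this excerpt the exon rows carry a Parent attribute, but it points to the mRNA (Csa01g001000.1), so -t exon with -g "Parent" groups reads per transcript rather than per gene. The gene rows have ID and Name but no Parent, so if you switch to -t gene I would expect -g "ID" to be the matching choice. To check the whole annotation, run `attr_keys data/annotation/Cs_genes_v2_annot.gff3`.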
In my experience, also setting -M (to include multi-mapping reads) slightly increased the percentage of assigned alignments. You can also try setting -t gene (-t exon is the default) and see if you notice any changes. However, getting ~50-60% assigned alignments does not necessarily mean that the assignment has been unsuccessful; it may be that many of your reads come from regions that are not annotated (such as some non-coding regions).
I would also suggest you try different .gtf files, as the annotation might also affect the results. I am not sure where you got yours, but I usually stick to the ones from Ensembl.
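To see which of these explanations applies, you can look at the .summary file that featureCounts writes next to its output (here that should be analysis/final_counts/SRR1171897/featureCounts.txt.summary). It has one row per status, such as Assigned, Unassigned_NoFeatures, Unassigned_MultiMapping and Unassigned_Ambiguity. The sketch below turns those counts into percentages; the `summarize_assignment` function name is hypothetical, and the numbers in the demo file are invented purely to make the example runnable, they are not your results.

```shell
# Demo input with INVENTED counts, laid out like a featureCounts .summary file:
# a header row, then "status<TAB>count" rows.
summary=$(mktemp)
printf 'Status\t%s\n' demo.bam > "$summary"
printf '%s\t%s\n' \
  Assigned 500 \
  Unassigned_Unmapped 0 \
  Unassigned_MultiMapping 200 \
  Unassigned_NoFeatures 250 \
  Unassigned_Ambiguity 50 >> "$summary"

# Print each non-zero status as a percentage of all counted alignments,
# largest first.
summarize_assignment() {
  awk -F'\t' '
    NR > 1 { count[$1] = $2; total += $2 }   # skip the header row
    END {
      if (total == 0) exit 1
      for (s in count)
        if (count[s] > 0) printf "%s %.1f\n", s, 100 * count[s] / total
    }
  ' "$1" | sort -k2,2nr
}

summarize_assignment "$summary"
```

A large Unassigned_NoFeatures share would support the unannotated-regions explanation, a large Unassigned_MultiMapping share is what -M addresses, and a large Unassigned_Ambiguity share would suggest reads overlapping more than one meta-feature (for example several transcripts of one gene, if counts are grouped by transcript ID).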
Hello Marco,
Thank you for the suggestion. Sounds good, I will play around with the -M and -t parameters.
The .gff3 file comes from our own lab; I will ask around to see if there is a different version.
Thanks, Liyong