I am analyzing RNA-seq data and plan to use STAR for alignment. I have heard that for WGS a reference genome with decoy sequences is recommended, as in the GDC pipeline. For mRNA expression analysis, however, GDC does not explicitly mention the use of decoys. In principle, should we add decoys to the reference genome for RNA-seq data processing?
The question seems related to a previous discussion. However, I am looking for more general advice on whether including viral decoys is recommended for samples in which we do not expect to find active virus.
I would also like to know whether the same advice applies to Salmon or similar tools that use quasi-mapping. Thanks!
For mRNA analysis, I don't think it's needed unless you have a specific reason to add a particular genome.
FastQ Screen is a QC tool that can give you a sense of whether reads from other species are contaminating your sample.
This is helpful because, if a significant portion of your reads does come from other species, you may want to expose those reads, or use a decoy, to prevent artifacts.
You can add specific genomes of interest as well.
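If you do decide to add a decoy or another genome to the alignment reference, the concatenation step could be sketched roughly as below. This is only a minimal illustration under stated assumptions: the function names, the `decoy_` prefix, and the file names are hypothetical and are not part of STAR, Salmon, or the GDC pipeline. The prefix is there so that decoy sequences cannot collide with reference sequence names and are easy to recognise in the alignments.

```python
from pathlib import Path
from typing import Iterable, List, Set


def fasta_names(path: Path) -> Set[str]:
    """Return the sequence names (first word of each '>' header) in a FASTA file."""
    names = set()
    with path.open() as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].split()[0])
    return names


def append_decoys(reference: Path, decoys: Iterable[Path], combined: Path,
                  prefix: str = "decoy_") -> List[str]:
    """Write the reference followed by all decoy FASTA files to `combined`.

    Decoy sequence names get `prefix` (a hypothetical convention) so they stay
    distinguishable from the primary reference. Returns the renamed decoy names.
    """
    seen = fasta_names(reference)
    added = []
    with combined.open("w") as out:
        # Copy the primary reference unchanged.
        last = "\n"
        with reference.open() as handle:
            for line in handle:
                out.write(line)
                last = line
        # Make sure the next header starts on its own line.
        if not last.endswith("\n"):
            out.write("\n")
        # Append each decoy file, renaming its headers.
        for decoy in decoys:
            with decoy.open() as handle:
                for line in handle:
                    if line.startswith(">"):
                        name = prefix + line[1:].split()[0]
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        added.append(name)
                        out.write(f">{name}\n")
                    else:
                        out.write(line if line.endswith("\n") else line + "\n")
    return added


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    names = append_decoys(Path("reference.fa"), [Path("virus.fa")], Path("combined.fa"))
    print(f"appended {len(names)} decoy sequences")
```

The aligner index would then have to be rebuilt from the combined FASTA before mapping.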
GDC uses a single reference genome, which means the decoys are also used during RNA-Seq analysis. Personally, I don't think they are really needed for RNA-Seq quantification, but they might help if you are doing RNA-Seq variant calling.