Question

Bulk RNA-seq Salmon index building: which transcriptome to use?

13 months ago

Orange ▴ 30

Hi all, I am new to the platform.

I was wondering what the common/best practice is regarding building a Salmon index for bulk RNAseq analysis of human cells.

The Salmon/Alevin tutorial uses the complete transcriptome from GENCODE (gencode.vM23.transcripts.fa.gz, which is a mouse release; the lack of "pc_" in the file name shows it is not restricted to protein-coding transcripts), which presumably contains rRNA sequences.

However, other tutorials use a transcriptome from Ensembl that seems to contain only the cDNA sequences (Homo_sapiens.GRCh38.cdna.all.fa.gz).

My research question is rather simple: I am looking for DEGs between treatments, so I do not expect to focus on differences in ncRNAs. Also, the "overrepresented sequences" and "per base GC content" modules from FastQC are consistent with a small amount of rRNA contamination (I am currently running Salmon to see whether the rRNA levels are similar across all samples).

I wonder which of the following is the common/best practice or if there is a better approach:

1. Build a full decoy-aware index using the full transcriptome (as in the Salmon/Alevin tutorial)
2. Build a full decoy-aware index using protein-coding transcript sequences (GENCODE) or cDNA (Ensembl)
3. Build a full decoy-aware index using the full transcriptome, except removing the rRNA sequences from the transcriptome and moving them to the decoy sequences
4. Similar to #1, except preceded by rRNA removal with BBDuk or SortMeRNA

Thanks in advance for your help!

Salmon RNA-seq • 1.0k views

Generally speaking you'll want to provide the full transcriptome plus the genome decoy as per your point 1.
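For reference, here is a rough Python sketch of how the inputs for such a decoy-aware index could be prepared. It follows the usual recipe of listing the genome sequence names as decoys and appending the genome after the transcripts. The helper names (`write_decoys`, `write_gentrome`, `salmon_index_command`) and any file paths are made up for illustration, and the `--gencode` flag is only appropriate if the transcript FASTA has GENCODE-style headers. Treat it as a starting point rather than a definitive pipeline.

```python
import gzip
import shutil
from pathlib import Path


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed FASTA file as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def write_decoys(genome_fasta, decoys_txt):
    """Write one genome sequence name per line (header up to the first space, without '>')."""
    names = []
    with open_maybe_gzip(genome_fasta) as handle:
        for line in handle:
            if line.startswith(">"):
                names.append(line[1:].split()[0])
    Path(decoys_txt).write_text("\n".join(names) + "\n")
    return names


def write_gentrome(transcripts_fasta, genome_fasta, gentrome_fasta):
    """Concatenate transcripts first and the genome second; decoy sequences must come last."""
    with open(gentrome_fasta, "wt") as out:
        for source in (transcripts_fasta, genome_fasta):
            with open_maybe_gzip(source) as handle:
                shutil.copyfileobj(handle, out)


def salmon_index_command(gentrome_fasta, decoys_txt, index_dir, threads=8):
    """Build the argument list for `salmon index` (assumes GENCODE-style transcript headers)."""
    return [
        "salmon", "index",
        "-t", str(gentrome_fasta),
        "-d", str(decoys_txt),
        "-i", str(index_dir),
        "-p", str(threads),
        "--gencode",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once Salmon is installed and on the PATH. Both input FASTA files are assumed to end with a newline, otherwise the last transcript and the first genome header would be joined.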

13 months ago by rpolicastro 13k

Thanks for your reply! I proceeded with #1. After quantification, I looked at the abundance assigned to rRNA, rRNA pseudogene and Mt-rRNA transcripts, and it was minimal in every case (<0.5% of the total TPM of 1e6). Although there were slight differences among treatments, I don't think this will be an issue.
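In case it helps anyone, a check like this can be sketched in a few lines of Python. The sketch assumes the standard `quant.sf` columns (`Name`, `TPM`) and GENCODE-style FASTA headers, where fields are separated by `|` and the transcript biotype is the eighth field. The function names and the biotype set are my own choices for illustration, not something Salmon provides.

```python
import csv

# Biotype labels assumed to follow GENCODE naming.
RRNA_BIOTYPES = {"rRNA", "rRNA_pseudogene", "Mt_rRNA"}


def biotypes_from_gencode_headers(fasta_lines):
    """Map transcript ID -> biotype using '|'-separated GENCODE FASTA headers."""
    mapping = {}
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].strip().split("|")
            if len(fields) >= 8:
                mapping[fields[0]] = fields[7]
    return mapping


def rrna_tpm_percent(quant_sf_lines, biotype_by_id, biotypes=RRNA_BIOTYPES):
    """Return the percentage of total TPM assigned to transcripts with the given biotypes."""
    reader = csv.DictReader(quant_sf_lines, delimiter="\t")
    total = 0.0
    rrna = 0.0
    for row in reader:
        tpm = float(row["TPM"])
        total += tpm
        # Keep only the transcript ID in case the full GENCODE header was retained.
        transcript_id = row["Name"].split("|")[0]
        if biotype_by_id.get(transcript_id) in biotypes:
            rrna += tpm
    return 100.0 * rrna / total if total else 0.0
```

Both functions take any iterable of lines, so open file handles for the transcript FASTA and for each sample's `quant.sf` can be passed directly, and the resulting percentages compared across samples.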

13 months ago by Orange ▴ 30