Hi all!
I'm relatively new to RNA-seq, so I would really appreciate it if everything could be written as simply as possible. The aim of this post is to clarify a few things about building an index with Salmon, just to ensure that I've done everything correctly. The main gist has to do with building an index before quantifying with Salmon. Downstream, I will be quantifying paired-end (PE) total RNA reads (.fastq) obtained from Illumina sequencing experiments.
The reference genome I'm using is that of Arabidopsis thaliana, obtained from Ensembl at the following link:
This is the code that I've used on a Linux-based computer:
./salmon index -t Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz -i aThalianaindex -k 31
I had no issues running the code, but the more I dig online, the more I question the validity of my code. Could someone clarify the following for me, or point me towards the right resources so that I can read up more? (P.S. I've tried reading the Salmon documentation, but to no avail.)
1) Is the cDNA file I've used for building the index the right transcriptome file to use? If not, which would be the right file? (Hopefully available through Ensembl.)
2) I've seen someone merge the cDNA file (presumably the same one that I've used above) with an ncRNA file and use that merged file to build the index. Why was the ncRNA file merged in, and should I follow this method instead?
3) I've read on Salmon's own documentation that there are 2 methods of building a 'decoy-aware transcriptome'. Do I have to follow this method strictly? Would my results be significantly affected if I use my method for building an index mentioned above?
4) Referring specifically to the first of those two methods, the one using MashMap2: how do I use the script provided on Salmon's webpage? (I am actually totally lost on how I should build an index with this method, so any guidance from start to end would be helpful.)
Any help would be deeply appreciated! Y'all can just answer whichever part of the question you're familiar with. Thanks in advance for your help!
Hey! Thanks for the detailed reply and for clarifying all my concerns (even to the extent of writing the script usage command). Really appreciate your help! :)
Hi Jared,
I am new to RNA-seq but am going to carry out an analysis on some human cancer samples. I noticed you mentioned here that there are already prepared decoy-aware indices for human. Could you please direct me to these? I've looked on GENCODE for references, but I can't seem to find any that are labelled as decoy-aware.
Apologies if this is a silly question. I would be very grateful for any help. Thanks in advance, Alex
I would just make the index yourself, based on the reference you want to use. It is simple; follow https://combine-lab.github.io/alevin-tutorial/2019/selective-alignment/
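For anyone landing here later, the whole-genome-as-decoy approach from that tutorial can be sketched roughly as below. This is only a sketch under assumptions: the file names (transcripts.fa.gz, genome.fa.gz, gentrome.fa.gz, decoys.txt, salmon_index) are placeholders, and the two tiny FASTA files are made-up stand-ins so the file-preparation steps can be run as written. Swap in the real transcriptome and genome FASTA for your organism.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for the real downloads (hypothetical names and sequences).
# Replace these two files with your actual transcriptome and genome FASTA.
printf '>tx1 gene=g1\nACGTACGTAC\n>tx2 gene=g2\nTTGCAATTGC\n' | gzip -c > transcripts.fa.gz
printf '>chr1 dna:chromosome\nACGTACGTACGGTTGCAATTGC\n>chr2 dna:chromosome\nGGGCCCAAATTT\n' | gzip -c > genome.fa.gz

# 1. Collect the genome sequence names: the first word of each header line,
#    with the leading ">" removed. These names are the decoys.
gunzip -c genome.fa.gz | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' > decoys.txt

# 2. Concatenate the transcriptome FIRST and the genome SECOND.
#    Concatenated gzip files are still a valid gzip stream.
cat transcripts.fa.gz genome.fa.gz > gentrome.fa.gz

# 3. Build the index on real data (left commented out here because the toy
#    sequences above are far shorter than k = 31):
# salmon index -t gentrome.fa.gz -d decoys.txt -i salmon_index -k 31
```

The order in step 2 matters: the decoy (genome) sequences have to come after the transcripts in the combined file, and every name in decoys.txt has to match a sequence name in that file.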
alex-bain Pre-made decoy-containing salmon indexes for the human genome are available from the Refgenie site here. You will need to install the refgenie application, following the directions here, to download them.