Hi!
I am having an issue building a salmon index. I cannot use STAR because of limited RAM (as per my recent post). I tried following this tutorial on how to create a decoy-aware transcriptome, as well as doing this directly, and I have a few questions.
1) How much memory is required to build the index and do the alignment with salmon?
2) Is it necessary to build a decoy-aware transcriptome file? According to the manual it is recommended.
Now salmon seems to be frozen (it has been > 2 hrs trying to generate the decoy-aware transcriptome for human). Is this time normal? The code is really simple and follows the tutorial:
mkdir test
cd test
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/GRCh38.primary_assembly.genome.fa.gz
wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.transcripts.fa.gz
grep "^>" <(gunzip -c GRCh38.primary_assembly.genome.fa.gz) | cut -d " " -f 1 > decoys.txt
sed -i.bak -e 's/>//g' decoys.txt
cat gencode.v44.transcripts.fa.gz GRCh38.primary_assembly.genome.fa.gz > gentrome.fa.gz
salmon index -t gentrome.fa.gz -d decoys.txt -p 12 -i human_index --gencode
but what I get is .
Is this normal? How long does it usually take to build a human index? Is there anything I can do? My computer is a Mac M1 with 16 GB RAM.
I also tried downloading the transcriptome and then running salmon index -t human.fa.gz -i human_index, and it is still stuck at the same point.
If I run top, both commands are still running.
Thank you very much
Camilla
It is probably normal considering the resources you have.
While the salmon devs recommend using decoy-aware indexes, that does increase the RAM requirement for running salmon to ~18-20 GB for a human-sized genome (just outside what you have, based on info from another thread). I think you should stick with plain indexes (no decoy) if you have < 16 GB of RAM. Then I think you need about 4-6 GB RAM.
You can also download pre-built salmon indexes from the RefGenie project here: http://refgenomes.databio.org/
Just wait until it is finished. It seems to be running. I just checked my logs, and on our HPC with four cores such a process for a decoyed human transcriptome takes about 30 min, with peak memory somewhere in the 16 GB range (building a sparse index runs longer but uses less memory). Is this a MacBook Air or any model without active cooling? If so, it probably throttles down due to excessive heat, since it's working hard.
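For reference, the plain (no decoy) index suggested above could be built roughly as sketched here. This is only a sketch: the helper name plain_index_cmd, the output directory human_index_plain and the thread count of 4 are my own assumptions, not something from the tutorial. The function only prints the command, so you can review it before running it.

```shell
#!/bin/sh
# Print (do not run) a salmon index command for a plain, decoy-free index.
# Arguments: transcript FASTA, output index directory, thread count.
# All three values used below are illustrative assumptions.
plain_index_cmd() {
  transcripts="$1"
  index_dir="$2"
  threads="${3:-4}"   # fewer threads than -p 12, to go easier on a 16 GB desktop
  # -k 31 is salmon's default k-mer size; --gencode trims GENCODE headers at the first "|"
  printf 'salmon index -t %s -i %s -k 31 -p %s --gencode\n' \
    "$transcripts" "$index_dir" "$threads"
}

# Show the command for the transcript file downloaded earlier in the thread
plain_index_cmd gencode.v44.transcripts.fa.gz human_index_plain 4
```

Copy the printed line into the terminal to run it. If memory is still tight, salmon index also has a --sparse option in recent versions (check salmon index --help for your install), which trades a longer build for lower memory use.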
Thank you. Waiting is not going to hurt me; the question is when I should stop hoping and give up. Do you know if there is a way to see whether it is actually doing something (just really slowly)? (It's an iMac.)
If the %CPU column in top shows values in the hundreds, then it is doing something (in fact, anything > 0% means it is). So based on the screenshot, it is. I see two salmon processes in top; why is that?
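If you would rather not read top by eye, the same %CPU check can be scripted. This is a minimal sketch, assuming ps output with the columns pid, pcpu, rss, etime, comm in that order; the helper name classify_procs is made up for illustration. It labels each matching process as "working" when %CPU is above zero and also reports resident memory in MB.

```shell
#!/bin/sh
# Read `ps` output from stdin (columns: pid, pcpu, rss, etime, comm) and,
# for every row whose command contains NAME, print the PID, %CPU,
# resident memory in MB (rss is reported in KB) and a working/idle label.
classify_procs() {
  awk -v n="$1" '
    NR > 1 && index($5, n) {
      printf "%s cpu=%s rss_mb=%d %s\n", $1, $2, $3 / 1024, ($2 > 0 ? "working" : "idle")
    }'
}

# Usage (run this in a second terminal while salmon is indexing):
#   ps -A -o pid,pcpu,rss,etime,comm | classify_procs salmon
```

A single reading of 0.0 is not proof that the job is dead, so it is worth repeating the check a few times over a minute or two before giving up on it.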
I opened two terminals: one was running the code to build the decoy-aware transcriptome and the other was building a plain index (no decoy).
I think you should kill one of the processes, possibly the decoy one. Also keep an eye on the CPU temperature and on kernel_task. If the CPU usage of kernel_task increases, this may indicate that the CPU is overheating. If possible, I'd try to download a pre-built index.
Are you a student or employee at a university or research institute? In this case, it is worth checking with your IT department if there is a computing infrastructure available. There could be servers and infrastructure available to you that are much more powerful than a laptop. You just have to contact the right person. It is also worth contacting your national Elixir node, which would be Elixir UK.