I'm trying to build the Salmon index, but the process stopped with this error.
The command is:
salmon index -t CRCH38_and_decoys.fa.gz -d decoys.txt -i GRCh38_salmon_index --gencode
The last part of the output is:
[2024-02-19 12:14:55.227] [puff::index::jointLog] [warning] Entry with header [ENST00000634174.1|ENSG00000282732.1|OTTHUMG00000191398.1|OTTHUMT00000487783.1|ENST00000634174|ENSG00000282732|28|unprocessed_pseudogene|], had length less than equal to the k-mer length of 31 (perhaps after poly-A clipping)
[2024-02-19 12:15:56.790] [puff::index::jointLog] [warning] Removed 882 transcripts that were sequence duplicates of indexed transcripts.
[2024-02-19 12:15:56.792] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the --keepDuplicates flag
[2024-02-19 12:15:56.919] [puff::index::jointLog] [info] Replaced 151122967 non-ATCG nucleotides
[2024-02-19 12:15:56.919] [puff::index::jointLog] [info] Clipped poly-A tails from 2034 transcripts
Killed
Could it be a memory (RAM) problem?
Thanks
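A bare `Killed` with no error message from salmon itself usually means the process received SIGKILL, most often from the Linux kernel's out-of-memory (OOM) killer. One way to confirm this is to search the kernel log for OOM events. The sketch below assumes Linux (or WSL) and defines a hypothetical helper, `oom_events`, that filters log text read on stdin; reading the kernel log usually requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the out-of-memory (OOM) killer lines
# from kernel log text read on stdin.
oom_events() {
  grep -iE 'out of memory|oom-kill|killed process'
}

# Usage (Linux; reading the kernel log usually needs root):
#   sudo dmesg | oom_events
#   journalctl -k | oom_events    # on systemd-based systems
```

If a line naming the salmon process shows up right around the time the build stopped, memory was the cause.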
How much memory is available?
I have a laptop with 16 GB of RAM.
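Total RAM is not the same as what is free for the build, because the OS and other applications take their share. A minimal sketch for checking this on Linux, assuming `/proc/meminfo` is available. Both function names are hypothetical, and the default threshold of 15 GiB is only a rough figure, taken from the build reported in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a /proc/meminfo field (e.g. MemTotal, MemAvailable)
# in whole GiB (truncated). An alternative file can be given as the second argument.
meminfo_gib() {
  local field=$1
  local file=${2:-/proc/meminfo}
  # /proc/meminfo values are in kB (KiB), e.g. "MemAvailable:  8388608 kB"
  awk -v f="$field:" '$1 == f { printf "%d\n", $2 / 1048576 }' "$file"
}

# Hypothetical helper: return non-zero if less than the required GiB is available.
# The 15 GiB default is a rough guide, not a guarantee.
check_index_memory() {
  local need=${1:-15}
  local have
  have=$(meminfo_gib MemAvailable "${2:-/proc/meminfo}")
  have=${have:-0}
  if [ "$have" -lt "$need" ]; then
    echo "only ${have} GiB available, ~${need} GiB recommended" >&2
    return 1
  fi
  echo "ok: ${have} GiB available"
}

# Usage:
#   meminfo_gib MemTotal
#   check_index_memory 15 && salmon index ...
```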
I just checked my logs: building a full genome-decoyed GRCh38 index with Ensembl 101 annotations took 15 GB of memory. It's possible that you simply don't have enough, since other applications on your machine also need memory.
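To get a comparable peak-memory figure for your own build, you can wrap the command with GNU time (`/usr/bin/time -v`), whose report includes a "Maximum resident set size (kbytes)" line. This is a sketch assuming GNU time is installed (the shell's built-in `time` does not support `-v`). The helper `peak_rss_gib` and the file name `time_report.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a GNU time -v report and print peak memory in GiB.
# The report contains a line like:
#   "Maximum resident set size (kbytes): 15728640"
peak_rss_gib() {
  awk -F': ' '/Maximum resident set size \(kbytes\)/ { printf "%.1f\n", $2 / 1048576 }' "$1"
}

# Usage sketch: -o sends GNU time's report to its own file, so it is not
# mixed with salmon's log output on stderr.
#   /usr/bin/time -v -o time_report.txt \
#     salmon index -t CRCH38_and_decoys.fa.gz -d decoys.txt -i GRCh38_salmon_index --gencode
#   peak_rss_gib time_report.txt
```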
You can download prebuilt indices with refgenie: https://refgenie.databio.org/en/latest/
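A rough sketch of fetching a prebuilt index with the refgenie CLI instead of building one locally. It assumes refgenie is installed (e.g. `pip install refgenie`), that the genome is named `hg38` on the server, and that the asset is called `salmon_sa_index`. These names are assumptions, so check what is actually offered with `refgenie listr` first. The function name and the config path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download a prebuilt Salmon index with refgenie.
# ASSUMPTIONS: genome "hg38" and asset "salmon_sa_index" exist on the server;
# verify with `refgenie listr` before pulling.
fetch_salmon_index() {
  local genome=${1:-hg38}
  local asset=${2:-salmon_sa_index}
  # refgenie reads its config location from the REFGENIE environment variable
  export REFGENIE=${REFGENIE:-$HOME/refgenie/genome_config.yaml}
  # Create the config directory and file on first use only
  mkdir -p "$(dirname "$REFGENIE")"
  [ -f "$REFGENIE" ] || refgenie init -c "$REFGENIE"
  refgenie pull "$genome/$asset"
  # Print the local path of the asset, e.g. to pass to `salmon quant -i`
  refgenie seek "$genome/$asset"
}

# Usage:
#   fetch_salmon_index hg38 salmon_sa_index
```

Note that a prebuilt index may use a different annotation release than the one you had in mind, so it's worth checking the asset's details before relying on it.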