Question

salmon was only able to assign 0 fragments to transcripts

1

5.2 years ago

naseerkhan861 ▴ 10

I am following this tutorial to test salmon on my RNA-Seq dataset. The dataset comes from SRA; for testing I am using only the first run, "SRR5938419" (about 4.85 GB). After downloading this file, I ran the following SRA Toolkit command, which gave me two FASTA files (1.45 GB each):

E:\SRAs>fastq-dump --split-files --fasta  60 --gzip E:\SRAs\SRR5938419
Read 36388169 spots for /E/SRAs/SRR5938419
Written 36388169 spots for /E/SRAs/SRR5938419

Because I am working in a Windows 10 environment, I then pulled the salmon Docker image with the following command:

docker pull combinelab/salmon

After starting a container from the downloaded image, I checked that salmon was working by running the following commands:

root@d6fc32919494:/home# ls
SRAs  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# salmon
salmon v0.14.1

Usage:  salmon -h|--help or
        salmon -v|--version or
        salmon -c|--cite or
        salmon [--no-version-check] <COMMAND> [-h | options]

Commands:
     index Create a salmon index
     quant Quantify a sample
     alevin single cell analysis
     swim  Perform super-secret operation
     quantmerge Merge multiple quantifications into a single file
root@d6fc32919494:/home#

Then I shared the folder from my Windows host so that I could access the FASTA files inside the salmon Docker container. After that, I downloaded the human reference transcriptome using this link, executing the following commands in the container to save the file:

root@d6fc32919494:/home# wget ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz -o human.fa.gz
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#

Then I built the index using the following command:

root@d6fc32919494:/home# salmon index -t human.fa.gz -i human_index
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
index ["human_index"] did not previously exist  . . . creating it
[2019-10-24 09:47:41.438] [jLog] [info] building index
[2019-10-24 09:47:41.449] [jointLog] [info] [Step 1 of 4] : counting k-mers
Elapsed time: 0.0048599s

[2019-10-24 09:47:41.454] [jointLog] [info] Replaced 97776 non-ATCG nucleotides
[2019-10-24 09:47:41.454] [jointLog] [info] Clipped poly-A tails from 0 transcripts
[2019-10-24 09:47:41.454] [jointLog] [info] Building rank-select dictionary and saving to disk
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 3.77e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Writing sequence data to file . . .
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 6.93e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 97845)
[2019-10-24 09:47:41.454] [jointLog] [info] Building suffix array . . .
success
saving to disk . . . done
Elapsed time: 0.0001873s
done
Elapsed time: 0.0101487s
processed 0 positions[2019-10-24 09:47:41.490] [jointLog] [info] khash had 97814 keys
[2019-10-24 09:47:41.490] [jointLog] [info] saving hash to disk . . .
[2019-10-24 09:47:41.496] [jointLog] [info] done
Elapsed time: 0.0052422s
[2019-10-24 09:47:41.496] [jLog] [info] done building index
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  human_index  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#

After that, I ran salmon with the newly created human_index on my FASTA files using the following command:

root@d6fc32919494:/home# salmon quant -i human_index -l A -1 SRAs/SRR5938419_1.fasta.gz  -2 SRAs/SRR5938419_2.fasta.gz -p 8 --validateMappings -o quants/SRR5938419_qunats
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
### salmon (mapping-based) v0.14.1
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { human_index }
### [ libType ] => { A }
### [ mates1 ] => { SRAs/SRR5938419_1.fasta.gz }
### [ mates2 ] => { SRAs/SRR5938419_2.fasta.gz }
### [ threads ] => { 8 }
### [ validateMappings ] => { }
### [ output ] => { quants/SRR5938419_qunats }
Logs will be written to quants/SRR5938419_qunats/logs
[2019-10-24 09:51:04.203] [jointLog] [info] Fragment incompatibility prior below threshold.  Incompatible fragments will be ignored.
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2019-10-24 09:51:04.203] [jointLog] [info] parsing read library format
[2019-10-24 09:51:04.203] [jointLog] [info] There is 1 library.
[2019-10-24 09:51:04.237] [jointLog] [info] Loading Quasi index
[2019-10-24 09:51:04.237] [jointLog] [info] Loading 32-bit quasi index
[2019-10-24 09:51:04.242] [jointLog] [info] done
[2019-10-24 09:51:04.242] [jointLog] [info] Index contained 1 targets
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Suffix Array
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Transcript Info
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Rank-Select Bit Array
[2019-10-24 09:51:04.237] [stderrLog] [info] There were 1 set bits in the bit array
[2019-10-24 09:51:04.237] [stderrLog] [info] Computing transcript lengths
[2019-10-24 09:51:04.237] [stderrLog] [info] Waiting to finish loading hash
[2019-10-24 09:51:04.242] [stderrLog] [info] Done loading index

I got the following message after a few minutes:

processed 36000000 fragments
hits: 0, hits per frag:  0[2019-10-24 09:55:35.561] [jointLog] [warning] salmon was only able to assign 0 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample.  You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).
root@d6fc32919494:/home#

So I did not find any TSV output like the one described in the tutorial linked in the first line of this post. Instead, I got the following in the quant.sf file:

root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz  SRAs  human.fa.gz  human_index  quants  salmon-0.14.1  salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# cd quants
root@d6fc32919494:/home/quants# ls
SRR5938419_qunats
root@d6fc32919494:/home/quants# cd SRR5938419_qunats/
root@d6fc32919494:/home/quants/SRR5938419_qunats# ls
aux_info  cmd_info.json  libParams  logs  quant.sf
root@d6fc32919494:/home/quants/SRR5938419_qunats# cat quant.sf
Name    Length  EffectiveLength TPM     NumReads
        97844   97844.000       0.000000        0.000
root@d6fc32919494:/home/quants/SRR5938419_qunats#

After all this effort, could somebody please tell me what went wrong and why I am not getting TPMs or counts? I would be extremely grateful if somebody could guide me out of this problem.

Regards

RNA-Seq Salmon windows10 docker • 4.5k views

Updated 5.2 years ago by Jianyu ▴ 580 • written 5.2 years ago by naseerkhan861 ▴ 10

0

It may not answer your question, but all bioinformatics work will be a lot easier if you can use a Linux machine. WSL is pretty good, but as far as I know it is not a perfect substitute for a real Linux environment.

Reply • 5.2 years ago by WouterDeCoster 47k

0

I will look into WSL and run all those commands there, but as you can see, I did not have any issues downloading, installing, or running any package.

Reply • 5.2 years ago by naseerkhan861 ▴ 10

ATpoint · Accepted Answer · 2019-10-24

4

5.2 years ago

Jianyu ▴ 580

In the wget command, -o specifies the file in which to record log messages (the option that names the downloaded file is the capital -O). You should therefore build the index from Homo_sapiens.GRCh38.cdna.all.fa.gz, not human.fa.gz; human.fa.gz contains only wget's log messages, not FASTA sequence.
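
A mix-up like this can be caught before indexing by checking that the reference really is gzip-compressed FASTA. The following is only a minimal sketch: `is_gzipped_fasta` is a hypothetical helper name, not part of wget or salmon, and it assumes the reference is expected to be gzip-compressed.

```shell
# Hypothetical helper (not part of salmon or wget): succeeds only if the file
# is a valid gzip stream whose decompressed content starts with '>'.
is_gzipped_fasta() {
    file=$1
    # gzip -t tests the integrity of the compressed stream; a plain-text
    # wget log saved under a .gz name fails this test.
    gzip -t "$file" 2>/dev/null || return 1
    # A FASTA file begins with a '>' header line.
    first_char=$(gzip -dc "$file" 2>/dev/null | head -c 1)
    [ "$first_char" = ">" ]
}
```

For instance, `is_gzipped_fasta human.fa.gz || echo "not a gzipped FASTA"` would be expected to print the warning for the log file written by `wget -o`, while the real Homo_sapiens.GRCh38.cdna.all.fa.gz should pass.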

Updated 5.2 years ago by ATpoint 86k • written 5.2 years ago by Jianyu ▴ 580

1

Building a suffix array for the human transcriptome in 0.0001873s and the whole index within 0.0052422s would make me suspicious even on a dedicated workstation.
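
The quant log in the question points the same way: it reports "Index contained 1 targets", whereas a human transcriptome should contain far more than one sequence. As a rough sanity check, the number of FASTA records can be counted before building the index. This is just a sketch; `count_fasta_records` is a hypothetical helper name, and it assumes a gzip that passes non-gzip input through unchanged when given `-f` together with `-c`, as GNU gzip does.

```shell
# Hypothetical helper: print the number of FASTA records (lines starting
# with '>') in a gzipped or plain-text file.
count_fasta_records() {
    # -f makes gzip copy non-gzip input to stdout unchanged, so a mislabeled
    # plain-text file is counted instead of aborting the pipeline.
    # grep -c exits non-zero when the count is 0, hence the '|| true'.
    gzip -dcf "$1" | grep -c '^>' || true
}
```

Running `count_fasta_records` on the file passed to `salmon index -t` gives a number to compare against the target count that salmon later reports; a count of 0 or 1 for a human reference suggests the wrong file.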

Reply • 5.2 years ago by michael.ante ★ 3.9k

1

Agreed. You are simply using the wrong reference file. Index the actual Homo_sapiens.GRCh38.cdna.all.fa.gz and everything should be fine.

Reply • 5.2 years ago by ATpoint 86k

0

Thanks, you are always very helpful.

Reply • 5.2 years ago by naseerkhan861 ▴ 10

0

Thank you very much indeed!

Reply • 5.2 years ago by naseerkhan861 ▴ 10