I am following this tutorial to test salmon on my RNA-Seq dataset. I am using an RNA-Seq dataset from SRA; for testing I am using just the first run, "SRR5938419" (about 4.85 GB). After downloading this file, I used the SRA Toolkit with the following command, which gave me two FASTA files (1.45 GB each):
E:\SRAs>fastq-dump --split-files --fasta 60 --gzip E:\SRAs\SRR5938419
Read 36388169 spots for /E/SRAs/SRR5938419
Written 36388169 spots for /E/SRAs/SRR5938419
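(As an aside, in case it matters: the 60 in --fasta 60 is just the FASTA line-wrap width. A FASTQ variant of the same dump, which keeps base qualities and is what most salmon tutorials seem to use, would presumably be:)

```shell
# Same accession, but without --fasta, so base qualities are kept
# and the output is FASTQ rather than FASTA:
fastq-dump --split-files --gzip E:\SRAs\SRR5938419
```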
Then, since I am on a Windows 10 environment, I pulled the salmon Docker image with the following command:
docker pull combinelab/salmon
After starting a container from the downloaded image, I checked whether salmon was working by running the following commands:
root@d6fc32919494:/home# ls
SRAs salmon-0.14.1 salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# salmon
salmon v0.14.1
Usage: salmon -h|--help or
salmon -v|--version or
salmon -c|--cite or
salmon [--no-version-check] <COMMAND> [-h | options]
Commands:
index Create a salmon index
quant Quantify a sample
alevin single cell analysis
swim Perform super-secret operation
quantmerge Merge multiple quantifications into a single file
root@d6fc32919494:/home#
Then I shared the folder from my Windows host so that I could access the FASTA files inside the salmon container. After that I downloaded the human reference transcriptome using this link, executing the following commands in the container:
root@d6fc32919494:/home# wget ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz -o human.fa.gz
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz SRAs human.fa.gz salmon-0.14.1 salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#
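(One thing I was unsure about here: wget uses lowercase -o for the name of its log file and uppercase -O for the name of the saved download, which may be why both Homo_sapiens.GRCh38.cdna.all.fa.gz and human.fa.gz show up in the listing. The download-to-a-name form would be:)

```shell
# -o (lowercase) names wget's LOG file; -O (uppercase) names the
# downloaded file. To actually save the transcriptome as human.fa.gz:
wget -O human.fa.gz ftp://ftp.ensembl.org/pub/release-89/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
```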
Then I built the index using the following command:
root@d6fc32919494:/home# salmon index -t human.fa.gz -i human_index
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
index ["human_index"] did not previously exist . . . creating it
[2019-10-24 09:47:41.438] [jLog] [info] building index
[2019-10-24 09:47:41.449] [jointLog] [info] [Step 1 of 4] : counting k-mers
Elapsed time: 0.0048599s
[2019-10-24 09:47:41.454] [jointLog] [info] Replaced 97776 non-ATCG nucleotides
[2019-10-24 09:47:41.454] [jointLog] [info] Clipped poly-A tails from 0 transcripts
[2019-10-24 09:47:41.454] [jointLog] [info] Building rank-select dictionary and saving to disk
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 3.77e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Writing sequence data to file . . .
[2019-10-24 09:47:41.454] [jointLog] [info] done
Elapsed time: 6.93e-05s
[2019-10-24 09:47:41.454] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 97845)
[2019-10-24 09:47:41.454] [jointLog] [info] Building suffix array . . .
success
saving to disk . . . done
Elapsed time: 0.0001873s
done
Elapsed time: 0.0101487s
processed 0 positions[2019-10-24 09:47:41.490] [jointLog] [info] khash had 97814 keys
[2019-10-24 09:47:41.490] [jointLog] [info] saving hash to disk . . .
[2019-10-24 09:47:41.496] [jointLog] [info] done
Elapsed time: 0.0052422s
[2019-10-24 09:47:41.496] [jLog] [info] done building index
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz SRAs human.fa.gz human_index salmon-0.14.1 salmon-v0.14.1.tar.gz
root@d6fc32919494:/home#
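As a sanity check before indexing, counting FASTA header lines tells you how many transcripts salmon will see; the GRCh38 cDNA set should contain well over 100,000 records, not 1. A minimal sketch with a tiny stand-in file (for the real gzipped file, use zcat human.fa.gz | grep -c '^>'):

```shell
# Tiny stand-in FASTA (hypothetical transcripts) to demonstrate the check:
printf '>tx1 demo\nACGTACGT\n>tx2 demo\nGGCCTTAA\n' > demo.fa
# Count FASTA records (lines starting with '>'):
grep -c '^>' demo.fa   # prints 2 for this demo file
```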
After that I ran salmon quant with the newly created human_index on my FASTA files, using the following command:
root@d6fc32919494:/home# salmon quant -i human_index -l A -1 SRAs/SRR5938419_1.fasta.gz -2 SRAs/SRR5938419_2.fasta.gz -p 8 --validateMappings -o quants/SRR5938419_qunats
Version Info: Could not resolve upgrade information in the alotted time.
Check for upgrades manually at https://combine-lab.github.io/salmon
### salmon (mapping-based) v0.14.1
### [ program ] => salmon
### [ command ] => quant
### [ index ] => { human_index }
### [ libType ] => { A }
### [ mates1 ] => { SRAs/SRR5938419_1.fasta.gz }
### [ mates2 ] => { SRAs/SRR5938419_2.fasta.gz }
### [ threads ] => { 8 }
### [ validateMappings ] => { }
### [ output ] => { quants/SRR5938419_qunats }
Logs will be written to quants/SRR5938419_qunats/logs
[2019-10-24 09:51:04.203] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings, without --hardFilter implies use of range factorization. rangeFactorizationBins is being set to 4
[2019-10-24 09:51:04.203] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.2.
[2019-10-24 09:51:04.203] [jointLog] [info] parsing read library format
[2019-10-24 09:51:04.203] [jointLog] [info] There is 1 library.
[2019-10-24 09:51:04.237] [jointLog] [info] Loading Quasi index
[2019-10-24 09:51:04.237] [jointLog] [info] Loading 32-bit quasi index
[2019-10-24 09:51:04.242] [jointLog] [info] done
[2019-10-24 09:51:04.242] [jointLog] [info] Index contained 1 targets
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Suffix Array
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Transcript Info
[2019-10-24 09:51:04.237] [stderrLog] [info] Loading Rank-Select Bit Array
[2019-10-24 09:51:04.237] [stderrLog] [info] There were 1 set bits in the bit array
[2019-10-24 09:51:04.237] [stderrLog] [info] Computing transcript lengths
[2019-10-24 09:51:04.237] [stderrLog] [info] Waiting to finish loading hash
[2019-10-24 09:51:04.242] [stderrLog] [info] Done loading index
After a few minutes I got the following message:
processed 36000000 fragments
hits: 0, hits per frag: 0[2019-10-24 09:55:35.561] [jointLog] [warning] salmon was only able to assign 0 fragments to transcripts in the index, but the minimum number of required assigned fragments (--minAssignedFrags) was 10. This could be indicative of a mismatch between the reference and sample, or a very bad sample. You can change the --minAssignedFrags parameter to force salmon to quantify with fewer assigned fragments (must have at least 1).
root@d6fc32919494:/home#
So I did not find the TSV file of results that the tutorial (linked in the first line of this post) said I would get; instead, the output file contained only the following:
root@d6fc32919494:/home# ls
Homo_sapiens.GRCh38.cdna.all.fa.gz SRAs human.fa.gz human_index quants salmon-0.14.1 salmon-v0.14.1.tar.gz
root@d6fc32919494:/home# cd quants
root@d6fc32919494:/home/quants# ls
SRR5938419_qunats
root@d6fc32919494:/home/quants# cd SRR5938419_qunats/
root@d6fc32919494:/home/quants/SRR5938419_qunats# ls
aux_info cmd_info.json libParams logs quant.sf
root@d6fc32919494:/home/quants/SRR5938419_qunats# cat quant.sf
Name Length EffectiveLength TPM NumReads
97844 97844.000 0.000000 0.000
root@d6fc32919494:/home/quants/SRR5938419_qunats#
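For reference, quant.sf is a plain tab-separated table with the columns Name, Length, EffectiveLength, TPM, and NumReads, one row per transcript. A sketch of what a normal run looks like and how it can be summarised (the two rows below are hypothetical):

```shell
# Hypothetical two-transcript quant.sf to show the expected shape:
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > demo_quant.sf
printf 'ENST_A\t1000\t800.000\t600000.0\t90.0\n'  >> demo_quant.sf
printf 'ENST_B\t2000\t1800.000\t400000.0\t135.0\n' >> demo_quant.sf
# TPM values over all transcripts should sum to ~1,000,000:
awk -F'\t' 'NR>1 {s+=$4} END {print s}' demo_quant.sf   # prints 1000000
```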
After all this effort, can somebody please tell me what the problem is? Why am I not getting any TPMs or counts, and what went wrong? I would be extremely grateful if somebody could guide me out of this problem.
Regards
It may not answer your question, but it will make all bioinformatics a lot easier if you can work on a Linux machine. WSL is pretty good, but as far as I know it doesn't perfectly substitute for a real Linux environment.
I will check out WSL and run all those commands there, but as you can see, I was not having any issues with downloading, installing, or running any package.