I have a dataset of human samples which I have processed both with salmon (quantification) and with STAR (standard genome mapping).

Looking at the results, the salmon mapping rate is a lot lower than STAR's. On average I get ~90% mapping with STAR, but only ~60% with salmon. The most interesting thing is the last sample, where only ~3% of the reads are assigned (see table below).
sample  reads      STAR_uniq_mapped  STAR_unmapped  STAR_%total_mapped  salmon_%mapped
1       104736525  90449978          5532361.28     94.72               66.36%
2       85549527   74153554          4745912.55     94.45               71.43%
3       82846983   68374449          5751113.94     93.06               70.27%
4       93190747   79607891          4592829.92     95.07               70.50%
5       187095419  127045161         18491637.89    90.11               54.72%
6       128349390  111761579         7158095.58     94.43               62.78%
7       179986447  157822483         8252445.58     95.42               67.83%
8       168862755  155255503         5564534.29     96.70               72.82%
9       68120928   52917367          9064366.31     86.69               64.79%
10      91055123   81982113          3763139.13     95.87               68.26%
11      102762099  87200096          7685804.48     92.52               73.37%
12      54279758   47416487          4163073.67     92.33               70.03%
13      77098401   56630953          13580101.62    82.38               71.51%
14      43196895   35167151          4397392.18     89.82               62.50%
15      47153275   40438052          2346857.98     95.02               63.87%
16      38437377   1867650           8585679.34     77.66               2.75%
I have attached the log file from the salmon run for this sample at the bottom. Is there a way to understand this vast difference between the two runs? The salmon run was done against an indexed transcriptome (GENCODE v32), while the STAR mapping was done against the Ensembl human genome build GRCh38. Could this be the cause of such a difference?
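For reference, the salmon calls were presumably along these lines (a minimal sketch only; the exact commands weren't included in the post, so paths and sample names are placeholders, and --validateMappings is inferred from the log below):

# build the index from the GENCODE v32 transcriptome
# (--gencode strips the long GENCODE FASTA headers)
$ salmon index -t gencode.v32.transcripts.fa.gz -i gencode_v32_index --gencode
# quantify one paired-end sample; -l A auto-detects the library type
# (reported as IU in the log below)
$ salmon quant -i gencode_v32_index -l A \
    -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
    --validateMappings -o quants/sample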
thanks,
Assa
$ less ./16/logs/salmon_quant.log
[2019-12-10 13:22:38.684] [jointLog] [info] Fragment incompatibility prior below threshold. Incompatible fragments will be ignored.
[2019-12-10 13:22:38.702] [jointLog] [info] Usage of --validateMappings implies use of minScoreFraction. Since not explicitly specified, it is being set to 0.65
[2019-12-10 13:22:38.702] [jointLog] [info] Usage of --validateMappings implies a default consensus slack of 0.2. Setting consensusSlack to 0.35.
[2019-12-10 13:22:38.702] [jointLog] [info] parsing read library format
[2019-12-10 13:22:38.702] [jointLog] [info] There is 1 library.
[2019-12-10 13:22:38.848] [jointLog] [info] Loading pufferfish index
[2019-12-10 13:22:38.848] [jointLog] [info] Loading dense pufferfish index.
[2019-12-10 13:22:42.666] [jointLog] [info] done
[2019-12-10 13:22:42.670] [jointLog] [info] Index contained 226,608 targets
[2019-12-10 13:22:43.647] [jointLog] [warning] len : 25, but txp.RefLenght : 25
[2019-12-10 13:22:43.714] [jointLog] [warning] len : 23, but txp.RefLenght : 23
[2019-12-10 13:22:44.470] [jointLog] [warning] len : 12, but txp.RefLenght : 12
[2019-12-10 13:22:45.457] [jointLog] [warning] len : 28, but txp.RefLenght : 28
[2019-12-10 13:22:45.575] [jointLog] [warning] len : 8, but txp.RefLenght : 8
[2019-12-10 13:22:45.575] [jointLog] [warning] len : 9, but txp.RefLenght : 9
[2019-12-10 13:22:45.575] [jointLog] [warning] len : 13, but txp.RefLenght : 13
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 11, but txp.RefLenght : 11
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 20, but txp.RefLenght : 20
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 18, but txp.RefLenght : 18
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 20, but txp.RefLenght : 20
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 19, but txp.RefLenght : 19
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 28, but txp.RefLenght : 28
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 17, but txp.RefLenght : 17
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 21, but txp.RefLenght : 21
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 20, but txp.RefLenght : 20
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 16, but txp.RefLenght : 16
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 17, but txp.RefLenght : 17
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 21, but txp.RefLenght : 21
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 23, but txp.RefLenght : 23
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 16, but txp.RefLenght : 16
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 17, but txp.RefLenght : 17
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 18, but txp.RefLenght : 18
[2019-12-10 13:22:45.710] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 17, but txp.RefLenght : 17
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 28, but txp.RefLenght : 28
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 23, but txp.RefLenght : 23
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 19, but txp.RefLenght : 19
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 31, but txp.RefLenght : 31
[2019-12-10 13:22:45.711] [jointLog] [warning] len : 17, but txp.RefLenght : 17
[2019-12-10 13:22:46.147] [jointLog] [warning] len : 28, but txp.RefLenght : 28
[2019-12-10 13:22:46.494] [jointLog] [warning] len : 28, but txp.RefLenght : 28
[2019-12-10 13:22:47.043] [jointLog] [info] Number of decoys : 0
[2019-12-10 13:22:47.043] [jointLog] [info] First decoy index : 18,446,744,073,709,551,576
[2019-12-10 13:22:50.936] [jointLog] [info] Automatically detected most likely library type as IU
[2019-12-10 13:29:10.048] [jointLog] [info] Computed 427,077 rich equivalence classes for further processing
[2019-12-10 13:29:10.048] [jointLog] [info] Counted 1,056,715 total reads in the equivalence classes
[2019-12-10 13:29:10.054] [jointLog] [info] Number of mappings discarded because of alignment score : 119,095,129
[2019-12-10 13:29:10.054] [jointLog] [info] Number of fragments entirely discarded because of alignment score : 611,432
[2019-12-10 13:29:10.054] [jointLog] [info] Number of fragments discarded because they are best-mapped to decoys : 0
[2019-12-10 13:29:10.054] [jointLog] [info] Number of fragments discarded because they have only dovetail (discordant) mappings to valid targets : 540,160
[2019-12-10 13:29:10.094] [jointLog] [warning] Only 1056715 fragments were mapped, but the number of burn-in fragments was set to 5000000.
The effective lengths have been computed using the observed mappings.
[2019-12-10 13:29:10.094] [jointLog] [info] Mapping rate = 2.74919%
[2019-12-10 13:29:10.094] [jointLog] [info] finished quantifyLibrary()
[2019-12-10 13:29:10.095] [jointLog] [info] Starting optimizer
[2019-12-10 13:29:10.048] [fileLog] [info]
At end of round 0
==================
Observed 38437377 total fragments (38437377 in most recent round)
[2019-12-10 13:29:10.560] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2019-12-10 13:29:10.727] [jointLog] [info] iteration = 0 | max rel diff. = 454.515
[2019-12-10 13:29:11.683] [jointLog] [info] iteration 11, adjusting effective lengths to account for biases
[2019-12-10 13:29:30.368] [jointLog] [info] Computed expected counts (for bias correction)
[2019-12-10 13:29:30.368] [jointLog] [info] processed bias for 0.0% of the transcripts
[2019-12-10 13:29:32.301] [jointLog] [info] processed bias for 10.0% of the transcripts
[2019-12-10 13:29:34.196] [jointLog] [info] processed bias for 20.0% of the transcripts
[2019-12-10 13:29:36.202] [jointLog] [info] processed bias for 30.0% of the transcripts
[2019-12-10 13:29:38.308] [jointLog] [info] processed bias for 40.0% of the transcripts
[2019-12-10 13:29:40.396] [jointLog] [info] processed bias for 50.0% of the transcripts
[2019-12-10 13:29:42.148] [jointLog] [info] processed bias for 60.0% of the transcripts
[2019-12-10 13:29:43.930] [jointLog] [info] processed bias for 70.0% of the transcripts
[2019-12-10 13:29:46.253] [jointLog] [info] processed bias for 80.0% of the transcripts
[2019-12-10 13:29:48.261] [jointLog] [info] processed bias for 90.0% of the transcripts
[2019-12-10 13:29:49.987] [jointLog] [info] processed bias for 100.0% of the transcripts
[2019-12-10 13:29:49.999] [jointLog] [info] processed bias for 100.0% of the transcripts
[2019-12-10 13:29:57.503] [jointLog] [info] iteration = 100 | max rel diff. = 2.82733
[2019-12-10 13:30:05.612] [jointLog] [info] iteration = 200 | max rel diff. = 5.36446
[2019-12-10 13:30:15.810] [jointLog] [info] iteration = 300 | max rel diff. = 0.0958765
[2019-12-10 13:30:26.099] [jointLog] [info] iteration = 400 | max rel diff. = 0.782063
[2019-12-10 13:30:28.606] [jointLog] [info] iteration = 424 | max rel diff. = 0.00927753
[2019-12-10 13:30:28.648] [jointLog] [info] Finished optimizer
[2019-12-10 13:30:28.648] [jointLog] [info] writing output
[2019-12-10 13:30:29.091] [jointLog] [info] Starting Gibbs Sampler
[2019-12-10 13:30:48.888] [jointLog] [info] Finished Gibbs Sampler
[2019-12-10 13:30:48.888] [jointLog] [warning] NOTE: Read Lib [[ rawData/HUDEP2_D3_WT_Band3_Low_R2_1677407_PXU011_WT2_D3_B3_L_R1.fastq.gz, rawData/HUDEP2_D3_WT_Band3_Low_R2_1677407_PXU011_WT2_D3_B3_L_R2.fastq.gz]] :
Detected a *potential* strand bias > 1% in an unstranded protocol check the file: quants/D3_WT_Band3_Low_R2_1677407_PXU011_WT2_D3_B3_L/lib_format_counts.json for details
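As that warning suggests, the strand-bias details can be inspected directly in the JSON file it points to, e.g. (path taken from the log message above; python -m json.tool just pretty-prints it):

# pretty-print salmon's library-format diagnostics for this sample
$ python -m json.tool quants/D3_WT_Band3_Low_R2_1677407_PXU011_WT2_D3_B3_L/lib_format_counts.json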
This is not a fair comparison, since the mapping rate in STAR refers to mapping against the whole genome, while the salmon mapping rate refers to reads assigned to the transcriptome. Check the distribution of reads actually mapping to exons when using STAR, e.g. with check_distribution.py from RSeQC.

This is true, and I didn't expect it to be the same. But as one can see from the table, the salmon rate for the last sample is also a lot lower than for the other samples, so I don't know what to make of it.
[read_distribution.py output attached for sample 16 and for one of the good samples]
How do I interpret these results? What could have gone wrong for the TES_down* values to be so high in the first sample?
** BTW, the command is called read_distribution.py.
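A minimal sketch of that check, assuming a coordinate-sorted BAM from the STAR run and a gene-model BED file (both file names here are placeholders):

# run RSeQC's read_distribution.py on the STAR alignment for sample 16;
# hg38_gene_model.bed is a placeholder gene-model BED file
$ read_distribution.py -i sample16_Aligned.sortedByCoord.out.bam -r hg38_gene_model.bed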