I have human RNA-seq samples prepared with a ribo-depletion kit. I first checked the library type of the samples with RSeQC, and it is reverse-forward (RF). So for the alignment with HISAT2 I used --rna-strandness RF, which corresponds to fr-firststrand in TopHat. I am now trying to run salmon on the same samples with library type -l ISR, based on the library type section of the salmon manual.
This is the command I used:
salmon quant -i index/ -l ISR -1 AT.1.fastq.gz -2 AT.2.fastq.gz -o transcripts_quant
When I checked the output file with the mapping information, I saw the following at the end of the file:
[2018-05-23 23:02:18.809] [jointLog] [info] Computed 333657 rich equivalence classes for further processing
[2018-05-23 23:02:18.809] [jointLog] [info] Counted 27089612 total reads in the equivalence classes
[2018-05-23 23:02:18.823] [jointLog] [warning] 0.0175308% of fragments were shorter than the k used to build the index (31).
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 20.
[2018-05-23 23:02:18.823] [jointLog] [info] Mapping rate = 28.9152%
[2018-05-23 23:02:18.823] [jointLog] [info] finished quantifyLibrary()
[2018-05-23 23:02:18.825] [jointLog] [info] Starting optimizer
[2018-05-23 23:02:24.405] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2018-05-23 23:02:24.423] [jointLog] [info] iteration = 0 | max rel diff. = 48.1542
[2018-05-23 23:02:25.913] [jointLog] [info] iteration = 100 | max rel diff. = 0.0934775
[2018-05-23 23:02:27.400] [jointLog] [info] iteration = 200 | max rel diff. = 0.0553936
[2018-05-23 23:02:28.846] [jointLog] [info] iteration = 300 | max rel diff. = 0.0348972
[2018-05-23 23:02:30.357] [jointLog] [info] iteration = 400 | max rel diff. = 0.0276639
[2018-05-23 23:02:31.834] [jointLog] [info] iteration = 500 | max rel diff. = 0.0228071
[2018-05-23 23:02:33.341] [jointLog] [info] iteration = 600 | max rel diff. = 0.0191266
[2018-05-23 23:02:34.779] [jointLog] [info] iteration = 700 | max rel diff. = 0.0171199
[2018-05-23 23:02:36.308] [jointLog] [info] iteration = 800 | max rel diff. = 0.0134323
[2018-05-23 23:02:37.754] [jointLog] [info] iteration = 900 | max rel diff. = 0.0129089
[2018-05-23 23:02:39.248] [jointLog] [info] iteration = 1000 | max rel diff. = 0.0108738
[2018-05-23 23:02:40.756] [jointLog] [info] iteration = 1100 | max rel diff. = 0.010454
[2018-05-23 23:02:41.058] [jointLog] [info] iteration = 1122 | max rel diff. = 0.00969727
[2018-05-23 23:02:41.080] [jointLog] [info] Finished optimizer
[2018-05-23 23:02:41.080] [jointLog] [info] writing output
[2018-05-23 23:02:41.518] [jointLog] [warning] NOTE: Read Lib [( AT.1.fastq.gz, AT.2.fastq.gz )] :
Greater than 5% of the fragments disagreed with the provided library type; check the file: transcripts_quant/lib_format_counts.json for details
As you can see, at the end it says that greater than 5% of the fragments disagreed with the provided library type.
I then looked into lib_format_counts.json, and this is what I saw:
"read_files": "( AT.1.fastq.gz, AT.2.fastq.gz )",
"expected_format": "ISR",
"compatible_fragment_ratio": 0.8183487087227385,
"num_compatible_fragments": 22168749,
"num_assigned_fragments": 27089612,
"num_consistent_mappings": 83629771,
"num_inconsistent_mappings": 11396640,
"MSF": 0,
"OSF": 27232,
"ISF": 4075759,
"MSR": 0,
"OSR": 73061,
"ISR": 83629771,
"SF": 2794681,
"SR": 4423463,
"MU": 0,
"OU": 0,
"IU": 0,
"U": 0
1) What is the problem with the library type here?
2) The overall alignment rate for this sample with HISAT2 is 91%, but salmon reports a mapping rate of only about 28.9%. Why is there such a difference?
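
For reference, here is a quick sanity check of the numbers in lib_format_counts.json, written as a minimal Python sketch. The values are copied from the JSON above; the variable names are my own, and the "ISR share" at the end is just one way I chose to summarise the paired-end strand counts, not something salmon reports itself.

```python
# Values copied by hand from transcripts_quant/lib_format_counts.json.
counts = {
    "num_compatible_fragments": 22168749,
    "num_assigned_fragments": 27089612,
    "ISF": 4075759,
    "ISR": 83629771,
}

# Fraction of assigned fragments compatible with the expected format (ISR).
compatible_ratio = (
    counts["num_compatible_fragments"] / counts["num_assigned_fragments"]
)

# Percentage of assigned fragments that disagree with ISR;
# the salmon warning is triggered when this exceeds 5%.
incompatible_pct = 100.0 * (1.0 - compatible_ratio)

# Illustrative summary (my own): share of inward paired-end mappings
# on the ISR strand versus the opposite ISF strand.
isr_share = counts["ISR"] / (counts["ISR"] + counts["ISF"])

print(f"compatible fragment ratio: {compatible_ratio:.4f}")
print(f"fragments disagreeing with ISR: {incompatible_pct:.2f}%")
print(f"ISR share of ISR + ISF mappings: {isr_share:.4f}")
```

The first value reproduces the compatible_fragment_ratio field in the JSON, so roughly 18% of the assigned fragments are counted as incompatible with ISR, which is well above the 5% threshold mentioned in the warning.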