Question

Salmon tool giving an error about the library type

7.2 years ago

Vasu ▴ 800

Hi,

I have human RNA-seq samples prepared with a ribo-depletion kit. I first checked the library type of the samples with RSeQC; it is reverse-forward (RF). So for the alignment with HISAT2 I used --rna-strandness RF, which corresponds to fr-firststrand in TopHat.

I'm now trying to run Salmon on the same samples with library type -l ISR, which I chose based on the library type section of the Salmon manual.

This is the command I used:

salmon quant -i index/ -l ISR -1 AT.1.fastq.gz -2 AT.2.fastq.gz -o transcripts_quant

When I checked the log file with the mapping information, I saw the following at the end of the file:

[2018-05-23 23:02:18.809] [jointLog] [info] Computed 333657 rich equivalence classes for further processing
[2018-05-23 23:02:18.809] [jointLog] [info] Counted 27089612 total reads in the equivalence classes
[2018-05-23 23:02:18.823] [jointLog] [warning] 0.0175308% of fragments were shorter than the k used to build the index (31).
If this fraction is too large, consider re-building the index with a smaller k.
The minimum read size found was 20.

[2018-05-23 23:02:18.823] [jointLog] [info] Mapping rate = 28.9152%

[2018-05-23 23:02:18.823] [jointLog] [info] finished quantifyLibrary()
[2018-05-23 23:02:18.825] [jointLog] [info] Starting optimizer
[2018-05-23 23:02:24.405] [jointLog] [info] Marked 0 weighted equivalence classes as degenerate
[2018-05-23 23:02:24.423] [jointLog] [info] iteration = 0 | max rel diff. = 48.1542
[2018-05-23 23:02:25.913] [jointLog] [info] iteration = 100 | max rel diff. = 0.0934775
[2018-05-23 23:02:27.400] [jointLog] [info] iteration = 200 | max rel diff. = 0.0553936
[2018-05-23 23:02:28.846] [jointLog] [info] iteration = 300 | max rel diff. = 0.0348972
[2018-05-23 23:02:30.357] [jointLog] [info] iteration = 400 | max rel diff. = 0.0276639
[2018-05-23 23:02:31.834] [jointLog] [info] iteration = 500 | max rel diff. = 0.0228071
[2018-05-23 23:02:33.341] [jointLog] [info] iteration = 600 | max rel diff. = 0.0191266
[2018-05-23 23:02:34.779] [jointLog] [info] iteration = 700 | max rel diff. = 0.0171199
[2018-05-23 23:02:36.308] [jointLog] [info] iteration = 800 | max rel diff. = 0.0134323
[2018-05-23 23:02:37.754] [jointLog] [info] iteration = 900 | max rel diff. = 0.0129089
[2018-05-23 23:02:39.248] [jointLog] [info] iteration = 1000 | max rel diff. = 0.0108738
[2018-05-23 23:02:40.756] [jointLog] [info] iteration = 1100 | max rel diff. = 0.010454
[2018-05-23 23:02:41.058] [jointLog] [info] iteration = 1122 | max rel diff. = 0.00969727
[2018-05-23 23:02:41.080] [jointLog] [info] Finished optimizer
[2018-05-23 23:02:41.080] [jointLog] [info] writing output

[2018-05-23 23:02:41.518] [jointLog] [warning] NOTE: Read Lib [( AT.1.fastq.gz, AT.2.fastq.gz )] :

Greater than 5% of the fragments disagreed with the provided library type; check the file: transcripts_quant/lib_format_counts.json for details

As you can see, the last warning says "Greater than 5% of the fragments disagreed with the provided library type". I then looked at the lib_format_counts.json file.

This is what the .json file contains:

{
    "read_files": "( AT.1.fastq.gz, AT.2.fastq.gz )",
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8183487087227385,
    "num_compatible_fragments": 22168749,
    "num_assigned_fragments": 27089612,
    "num_consistent_mappings": 83629771,
    "num_inconsistent_mappings": 11396640,
    "MSF": 0,
    "OSF": 27232,
    "ISF": 4075759,
    "MSR": 0,
    "OSR": 73061,
    "ISR": 83629771,
    "SF": 2794681,
    "SR": 4423463,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}
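
To put a number on the warning, here is a minimal Python sketch that recomputes the share of assigned fragments that disagree with the expected library type. It uses only fields copied from the JSON above; the helper name `incompatible_fraction` is made up for illustration and is not part of Salmon.

```python
import json

# Fields copied from the lib_format_counts.json shown above.
LIB_FORMAT_COUNTS = """
{
    "expected_format": "ISR",
    "compatible_fragment_ratio": 0.8183487087227385,
    "num_compatible_fragments": 22168749,
    "num_assigned_fragments": 27089612
}
"""


def incompatible_fraction(counts):
    """Fraction of assigned fragments that disagree with the expected library type.

    Hypothetical helper for illustration: it is simply
    1 - num_compatible_fragments / num_assigned_fragments.
    """
    compatible = counts["num_compatible_fragments"]
    assigned = counts["num_assigned_fragments"]
    return 1.0 - compatible / assigned


counts = json.loads(LIB_FORMAT_COUNTS)
frac = incompatible_fraction(counts)
print(f"{frac:.1%} of assigned fragments disagree with {counts['expected_format']}")
```

With these counts about 18% of the assigned fragments are incompatible with ISR, which is well above the 5% threshold named in the warning.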

1) What is the problem with the library type here?

2) The overall alignment rate for this sample with HISAT2 is 91%, but Salmon reports a mapping rate of only 28.9%. Why is there such a difference?

score 0 · Answer 1 · 2018-05-24

7.2 years ago

GenoMax 152k

See @Rob's answer: SALMON's warning library type