salmon NumReads inconsistent with depth from --writeMappings?
0
0
Entering edit mode
5.4 years ago
ning ▴ 120

I am trying to count gene expression for differential expression analysis. I used salmon to quantify genes against a reference transcriptome to get their TPM, but also --writeMappings to check their max read depth. I got:

       Name   EffectiveLength       TPM      NumReads   MaxDepth
NR_146118.1             4,925   177,186   31,949,345           6
NR_132964.1                24   124,223      109,585       111,527

MaxDepth for NR_132964.1 was amended from 8,038 to 111,527 after comments pointing out that samtools without argument -d 0 limits MaxDepths to about 8000.

MaxDepth is the maximum read depth from samtools view on the output of --writeMappings. The rest of the columns are from salmon's quant.sf output file.

I understand why there is not a linear relationship between NumReads and TPM. But why are their MaxDepths so different?


Some more data. For MaxDepth == NA, samtools depth found no depth there---which shouldn't be the case for these top few expressed genes.

   Name           Length EffectiveLength      TPM  NumReads MaxDepth
 1 U13369.1        42999         42847.      28.0    43870.     8034
 2 NM_001190452.1   1555          1403.   12043.    618711.     8028
 3 NM_001190470.1   1036           884.   13554.    438843.     8035
 4 NM_001190476.1   1231          1079.    1531.     60512.     8005
 5 NM_001190487.2   1395          1243.    1262.     57434.     8022
 6 NM_001190702.1   1290          1138.   20012.    833999.     8036
 7 NM_001190706.1    527           376.   13917.    191698.     4476
 8 NM_001190708.1   1121           969.    3952.    140262.     8024
 9 NR_003285.3       157            39.1 151275.    216553.       57
10 NR_003286.4      1869          1717.     777.     48851.     8000
11 NR_132964.1       128            24.1 124223.    109585      8038
12 NR_145819.1     13351         13199.      45.5    21968.     8001
13 NR_145822.1      5066          4914.     825.    148378.       NA
14 NR_146117.1     13373         13221.     222.    107270.     8016
15 NR_146118.1      5077          4925.  177186.  31949345.        6
16 NR_146119.1      1869          1717.  439340.  27622373.      122
17 NR_146120.1       156            38.5  23238.     32776.       NA
18 NR_146144.1     13315         13163.    1264.    608993.     8009
19 NR_146148.1      5054          4902.     649.    116444.       NA
20 NR_146154.1      5055          4903.    2997.    538036.       NA

It has been pointed out in the comments that many of the MaxDepths around the 8000s is due to the default depth limit of samtools view. Nonetheless, I'd still wonder why some of these transcripts with high TPM and NumReads don't show any depth.

RNA-Seq salmon R software error samtools • 1.9k views
ADD COMMENT
0
Entering edit mode

probably because gene have different expression level?

ADD REPLY
0
Entering edit mode

Sure, but wouldn't their maximum read depths be roughly proportional to their TPM?

ADD REPLY
0
Entering edit mode

TPM and Depth are calculated using different algorithm for instance TPM as well as RPKM are counts based on the transcript length and read numbers for that gene. The lower TPM and the slightly higher depth might be due to a longer transcript. That's the explanation that I have, but I'm not the best bioinformatician around ;)

ADD REPLY
0
Entering edit mode

I understand your explanation, but I have some doubts as to the likelihood that this difference would cause such a counterintuitive and drastic difference in read depths. Thank nonetheless, though.

ADD REPLY
0
Entering edit mode

I cannot answer your question but for differential expression you do not need TPM. Also you would not need to look into the files. The recommended next step would be to use the tximport package to summarize the transcript level quantifications to the gene level followed by differential analysis with e.g. DESeq2/edgeR/limma.

Still I will tag Rob (the developer of salmon). Please add the command you used to get this MaxDepth column.

ADD REPLY
0
Entering edit mode

Thank you. I obtained the MaxDepth column by using samtools depth <ALIGNMENT> -r <REGION>, where <alignment> is the indexed and sorted file output by salmon ... --writeMappings=<ALIGNMENT>. [1] Columns are combined with a simple R script [2].

In my next step, I use edgeR to import gene counts in quant.sf using the first and fourth columns (Name, TPM). That is, I use the R command

library(edgeR)
files <- c(...)
count_matrix <- readDGE(files, columns=c(1,4))

[1]

all: MAP SORT INDEX DEPTH

INDEX_DIR = /home/shared/GenomeReferences/hg38_refseq_transcripts_with_rdna

MAP: WT1.mappings.sam
WT1.mappings.sam: ../WILDTYPE1_CGATGTA_S42_L008_R1_001.fastq.gz \
        ../WILDTYPE1_CGATGTA_S42_L008_R2_001.fastq.gz
        salmon quant -i ../index -l A -1 $< -2 $(word 2,$^) --validateMappings -o $@ \
                --writeMappings=WT1.mappings.sam

SORT: WT1.sorted.bam
WT1.sorted.bam: WT1.mappings.sam
        samtools sort -o WT1.sorted.bam WT1.mappings.sam

INDEX: WT1.sorted.bam.bai
WT1.sorted.bam.bai: WT1.sorted.bam
        samtools index WT1.sorted.bam

REGIONS = NR_146118.1.depth NR_146119.1.depth NM_001190702.1.depth \
        NM_001190452.1.depth NR_146144.1.depth NR_146154.1.depth \
        NM_001190470.1.depth NR_003285.3.depth NM_001190706.1.depth \
        NR_145822.1.depth NM_001190708.1.depth NR_132964.1.depth \
        NR_146148.1.depth NR_146117.1.depth NR_003286.4.depth \
        NM_001190476.1.depth NM_001190487.2.depth U13369.1.depth \
        NR_146120.1.depth NR_145819.1.depth
DEPTH: $(REGIONS)
$(REGIONS): %.depth: %
        samtools depth WT1.sorted.bam -r $< > $@
$(patsubst %.depth,%,$(REGIONS)):

[2]

#! /usr/bin/env Rscript

suppressPackageStartupMessages(library(tidyverse))

depths <- suppressMessages(read_tsv('quant.sf'))
depths <- subset(depths, Name %in% c(
        'NM_001190452.1', 'NM_001190470.1', 'NM_001190476.1', 'NM_001190487.2',
        'NM_001190702.1', 'NM_001190706.1', 'NM_001190708.1',
        'NR_003285.3', 'NR_003286.4',
        'NR_132964.1',
        'NR_145819.1', 'NR_145822.1',
        'NR_146117.1', 'NR_146118.1', 'NR_146119.1', 'NR_146120.1',
        'NR_146144.1', 'NR_146148.1', 'NR_146154.1',
        'U13369.1'))

max.depths <- c()
for (name in depths$Name) {
    filename <- paste0(name, '.depth')
    depth.data <- suppressMessages(
            read_tsv(filename, col_names=c('name', 'pos', 'depth')))
    max.depths <- c(
            max.depths,
            suppressWarnings(max(depth.data$depth)))
}
depths$MaxDepth <- max.depths
depths$MaxDepth[depths$MaxDepth==-Inf] <- NA

depths
ADD REPLY
0
Entering edit mode
  1. What not use tximport?
  2. What version of samtools are you using and are you changing the maximum possible depth option in samtools depth?
ADD REPLY
0
Entering edit mode
  1. Frankly, I've just heard of tximport. Obviously, I should be using that, but... I think the apparent discrepancy between maximum read depth and TPM still hold, even if we are looking at a transcript rather than gene level.
  2. samtools 1.9, Using htslib 1.9. No, I do not pass any option to samtools depth besides specifying the region. See [1] above, or the Makefile extract:

        samtools depth WT1.sorted.bam -r $< > $@
    
ADD REPLY
0
Entering edit mode
-d/-m <int>         maximum coverage depth [8000]. If 0, depth is set to the maximum
                       integer value, effectively removing any depth limit.

You should set -d to 0 when running samtools depth otherwise it is using 8000 and that may be leading to odd numbers you are getting.

You may also want to look at mosdepth as a fast alternative (https://github.com/brentp/mosdepth ).

ADD REPLY
0
Entering edit mode

What I think you're observing is a result of the default ~8000 depth in samtools depth and samtools mpileup. In essence, this is the size of a buffer for holding reads and once the depth goes much above this you're going to start seeing some odd behavior due to that (I don't recall the details of this edge case, I'd have to check the source code to see how it's handled).

ADD REPLY

Login before adding your answer.

Traffic: 1739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6