Question

Apply Deseq2 size factor to aligned reads

3

Entering edit mode

7.5 years ago

Bastien Hervé 6.2k

After a standard RNA-seq analysis : STAR -> FeatureCounts -> Deseq2, I want to visualize my reads coverage with IGV. Nevertheless, BAM files are not normalized, so visualisation is biased. Indeed, I used the size factor from Deseq2 to normalize my BAM files (https://support.bioconductor.org/p/76712/).

Question : is it acceptable to apply size factor calculated on counted read to BAM files which are aligned reads ?

How I did the job :

# DESeq2 size factor
WT_R=0.8560343;
WT_S=1.1681774;

# Change value for bamCoverage option (counts need to be divided by size factor)
WT_R_div=$(awk -v WT_R=$WT_R 'BEGIN { print (1 / WT_R) }');
WT_S_div=$(awk -v WT_S=$WT_S 'BEGIN { print (1 / WT_S) }');

# Apply size factor to BAMs / BIGWIG with bamCoverage "--scaleFactor" (and assign reads to forward or reverse strand)

for f in WT*.bam;

do scale=$(eval printf "$"${f%.*}"_div");

bamCoverage --bam $f \
--outFileName ${f%.*}"_forward_strand.bigwig" \
--outFileFormat bigwig \
--scaleFactor $scale \
--filterRNAstrand forward \
--binSize 1 \
--verbose;

bamCoverage --bam $f \
--outFileName ${f%.*}"_reverse_strand.bigwig" \
--outFileFormat bigwig \
--scaleFactor $scale \
--filterRNAstrand reverse \
--binSize 1 \
--verbose;
done &>> bamCoverage.verbose;

Deseq2 Size Factor BAM RNA-seq • 4.6k views

ADD COMMENT • link updated 7.5 years ago by Devon Ryan 105k • written 7.5 years ago by Bastien Hervé 6.2k

score 2 · Answer 1 · 2017-11-22

2

Entering edit mode

7.5 years ago

Devon Ryan 105k

Sure, you can use the DESeq2 scale factor. I don't recall whether DESeq2 is dividing by the scale factor or multiplying by it. If it's doing the latter then you don't need to invert (just compare a normalized and raw count in DESeq2 to be sure).

ADD COMMENT • link 7.5 years ago by Devon Ryan 105k

1

Entering edit mode

DESeq2 divides, an easy way in bash to convert the factor to feed into bamCoverage would be, given that the factor is saved as variable $FACTOR:

FACTOR_INVERT=$(bc <<< "scale=6;$(echo ${FACTOR})^-1")

ADD REPLY • link 6.3 years ago by ATpoint 88k

0

Entering edit mode

Thanks Devon, but that's bothering me to apply size factor (generated with counted reads) on aligned reads. The ratio aligned reads/counted reads is not always the same in my samples. Say differently, I don't lose the same amount of reads in my samples from aligned reads to counted reads. If Deseq2 would've created size factor using aligned reads I will be OK to use it on aligned reads, but it's not.

ADD REPLY • link 7.5 years ago by Bastien Hervé 6.2k

1

Entering edit mode

If you want to be totally clean about that, you could remove those reads from the BAM file that are not being taken into consideration for the read count matrix. Not sure if featureCounts can tell you the read IDs that it's using, but since it comes with an aligner, maybe there is a way to retrieve that information.

Irrespective of that, I think as long as you remove the presumably biggest chunk of reads you're ignoring (i.e., those not overlapping with exons, perhaps duplicates), you're already playing it fairly safe. After all, I thought you mainly want to do it for visualization purposes, not for statistical calculations, right?

ADD REPLY • link 7.5 years ago by Friederike 9.0k

1

Entering edit mode

The counted reads are just those aligned with a minimal MAPQ to remove obvious amounts of multimapping. You can apply a reasonable MAPQ threshold in deepTools (e.g., --minMappingQuality 4 would work for STAR and match what featureCounts is doing by default) and then the reads used are mostly the same (the main difference being those not overlapping genes).

ADD REPLY • link 7.5 years ago by Devon Ryan 105k

0

Entering edit mode

I'm trying to do something similar. I figured you may be able to answer Devon. Will the --scaleFactor option accept fractions? As in --scaleFactor 1000000/20000000.

One other question. The documentation for --normalizeUsing says this:

None = the default and equivalent to not setting this option at all. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage.

Does this mean it's not possible to output raw counts per bin?

ADD REPLY • link 6.5 years ago by goodez ▴ 640

1

Entering edit mode

You can't use expressions for the scaling factor, you'll need to put the resulting value in.

I need to update the documentation regarding what None does. It will produce raw coverage metrics with no adjustment by default, which is what it sounds like you want.

ADD REPLY • link 6.5 years ago by Devon Ryan 105k