Question

Bam to RPM Bedgraph Conversion

0

Entering edit mode

6.7 years ago

virlana.shchuka • 0

I want to convert my ChIP-seq bam files to an RPM bedgraph file. I want to use the following command:

bedtools genomecov -bg -split -scale 1000000 -trackline -trackopts “name=bam_file_name description=bam_file” -ibam bamfilename > bedgraphfilename

I want to clarify that, for an rpm file, I should use the value one million following the "-scale"? Or should the scale factor be 6 because it assumes a 10 base?

Thanks in advance!

ChIP-Seq • 4.2k views

ADD COMMENT • link updated 6.3 years ago by h.mon 35k • written 6.7 years ago by virlana.shchuka • 0

0

Entering edit mode

I tried the scaling factor command and receive the following output: (standard_in) 1: illegal character: \342 (standard_in) 1: illegal character: \200 (standard_in) 1: illegal character: \234 -bash: 1000000/10743694": No such file or directory

I checked the spacing between all the parts of the command and they are correct. Does anyone know what I am doing wrong?

ADD REPLY • link 6.6 years ago by virlana.shchuka • 0

score 3 · Answer 1 · 2018-04-01

3

Entering edit mode

6.6 years ago

ATpoint 85k

The value of -scale is the one that each value in column 4 of the bg is multiplied with, means you have to calculate the scaling factor externally. Here is a two-liner that can do it:

 ## Scaling factor for single-end data, counting every mapped read (bitwise flag = 0)
 TmpScale=$(bc <<< "scale=6;1000000/$(samtools view -f 0 -c in.bam)")

 ## Now get the actual bedGraph:
 echo '==> RPM scaling factor:' $TmpScale
 bedtools genomecov -bga -ibam in.bam -scale $TmpScale | sort -k1,1 -k2,2n > out.bedGraph

For matters of completeness, there are tools that can output a RPM-normalized bedGraph or bigwig in one go, like deeptools bamCoverage. Still, both genomecov and deeptools are pretty slow. There is a newer tool, mosdepth, which outputs a compressed bedGraph and is lightning fast. The RPM-normalization can be done using something like awk once the bedGraph has been generated.

ADD COMMENT • link 6.6 years ago by ATpoint 85k

0

Entering edit mode

Thank you for your reply.

1) Do I need to download specific packages to carry out these commands? 2) Can I just lift over these commands, or are all the arrow characters you've used unimportant?

ADD REPLY • link 6.6 years ago by virlana.shchuka • 0

0

Entering edit mode

Would it be possible for you to explain what exactly your first command is doing?

ADD REPLY • link 6.3 years ago by m98 ▴ 420

1

Entering edit mode

It counts total number of mapped reads in the bam file, and uses it to calculate a per-million scaling factor. Still, I recommend deeptools bamCoverage for creating browser tracks. Is offers many options that are quiet convinient.

ADD REPLY • link 6.3 years ago by ATpoint 85k

0

Entering edit mode

Thanks, makes sense. Last question: how would it work for paired-end data?

ADD REPLY • link 6.3 years ago by m98 ▴ 420

1

Entering edit mode

Thinking about it again, for single end, I would probably do -f 0 -F 256 (read mapped, and not supplementary alignment), and for paired, -f 1 -F 256 (read paired, but not supplementary). Bitwise flags explained here.

ADD REPLY • link 6.3 years ago by ATpoint 85k

0

Entering edit mode

Sorry to bother you again, but do you happen to know if bamCoverage requires for your input Bam file to be sorted? It doesn't look like but I can't seem to find a definitive answer anywhere.. It definitely seems to want a .bai to be present in the same directory as your bam file.

ADD REPLY • link 6.3 years ago by m98 ▴ 420

1

Entering edit mode

It requires indexed BAM, and indexing requires sorted BAM, so yes bamCoverage requires sorted BAM. Use samtools sort to sort your bam and then samtools index to index the sorted bam.

ADD REPLY • link 6.3 years ago by ATpoint 85k