I want to clarify that, for an rpm file, I should use the value one million following the "-scale"? Or should the scale factor be 6 because it assumes a 10 base?
I tried the scaling factor command and receive the following output:
(standard_in) 1: illegal character: \342
(standard_in) 1: illegal character: \200
(standard_in) 1: illegal character: \234
-bash: 1000000/10743694": No such file or directory
I checked the spacing between all the parts of the command and they are correct. Does anyone know what I am doing wrong?
The value of -scale is the one that each value in column 4 of the bg is multiplied with, means you have to calculate the scaling factor externally. Here is a two-liner that can do it:
## Scaling factor for single-end data, counting every mapped read (bitwise flag = 0)
TmpScale=$(bc <<< "scale=6;1000000/$(samtools view -f 0 -c in.bam)")
## Now get the actual bedGraph:
echo '==> RPM scaling factor:' $TmpScale
bedtools genomecov -bga -ibam in.bam -scale $TmpScale | sort -k1,1 -k2,2n > out.bedGraph
For matters of completeness, there are tools that can output a RPM-normalized bedGraph or bigwig in one go, like deeptools bamCoverage. Still, both genomecov and deeptools are pretty slow. There is a newer tool, mosdepth, which outputs a compressed bedGraph and is lightning fast. The RPM-normalization can be done using something like awk once the bedGraph has been generated.
1) Do I need to download specific packages to carry out these commands?
2) Can I just lift over these commands, or are all the arrow characters you've used unimportant?
It counts total number of mapped reads in the bam file, and uses it to calculate a per-million scaling factor. Still, I recommend deeptools bamCoverage for creating browser tracks. Is offers many options that are quiet convinient.
Thinking about it again, for single end, I would probably do -f 0 -F 256 (read mapped, and not supplementary alignment), and for paired, -f 1 -F 256 (read paired, but not supplementary). Bitwise flags explained here.
Sorry to bother you again, but do you happen to know if bamCoverage requires for your input Bam file to be sorted? It doesn't look like but I can't seem to find a definitive answer anywhere.. It definitely seems to want a .bai to be present in the same directory as your bam file.
It requires indexed BAM, and indexing requires sorted BAM, so yes bamCoverage requires sorted BAM. Use samtools sort to sort your bam and then samtools index to index the sorted bam.
I tried the scaling factor command and receive the following output: (standard_in) 1: illegal character: \342 (standard_in) 1: illegal character: \200 (standard_in) 1: illegal character: \234 -bash: 1000000/10743694": No such file or directory
I checked the spacing between all the parts of the command and they are correct. Does anyone know what I am doing wrong?