Here is a step by step guide to the problem/question.

Question

Tutorial:Plotting the coverage and depth (Y-axis) statistics of a bam file along the genome (X-axis), using Samtools, awk and R.

6

Entering edit mode

8.0 years ago

kirannbishwa01 ★ 1.6k

This is the same problem I recently had, but I was not able to find a good solution. I was finally able to work a method which I am sharing here. Specially for empirical biologists who know what to do but not what to do exactly this solution might be an opener. I hope this will be helpful for others.

Identification of the depth of coverage is quite useful in 1) identifying the regions that might have potential paralogous alignment 2) finding the coverage at your regions of interest.

Here is a step by step guide to the problem/question.

Step #1) First identify the depth at each locus from a bam file.

I have found samtools depth option more useful in this regard, when coverage at each locus is desired.

samtools depth deduped_MA605.bam > deduped_MA605.coverage

The output file 'deduped_MA605.coverage' file will have 3 columns (Chr#, position and depth at that position) like below.

Chr  position depth (this header will be absent though)
1       3980        66
1       3981        68
1      3982     67
1      3983        67
1      3984     68

Step #2) Now, select the coverage (depth) by locus for each chromosome and/or regions

We can use the coverage file to plot it in R. But, the file is so large that it will suck up almost all the memory. It better to split the coverage by chromosome (or region of the chromosome if required).

To select the coverage for a particular chromosome (Chr#1 in my case)

awk '$1 == 1 {print $0}' deduped_MA605.coverage > chr1_MA605.coverage

To select coverage from chr #2

awk '$1 == 2 {print $0}' deduped_MA605.coverage > chr2_MA605.coverage

If the chrosomosome has string characters it can be adjusted as

awk '$1 == "chr2" {print $0}' deduped_MA605.coverage > chr2_MA605.coverage

Step #3) To plot the data in R this coverage file will need to be imported and the headers need to be added.

MA605.chr2 <- read.table("/media/everestial007/SeagateBackup4.0TB/New_Alignment_Set/05-B-deDupedReads-forScanIndel/chr2_MA605.coverage",
         header=FALSE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)`

Note: The header of the column are automatically set as V1, V2 and V3.

To rename the headers

library(reshape) # loads the library to rename the column names

MA605.chr2<-rename(MA605.chr2,c(V1="Chr", V2="locus", V3="depth")) # renames the header

Now, plot the coverage by depth:

plot(MA605.chr2$locus, MA605.chr2$depth)

to get the wider/cleaner view of the plot use

library(lattice, pos=10) xyplot(depth ~ locus, type="p", pch=16, auto.key=list(border=TRUE), par.settings=simpleTheme(pch=16), scales=list(x=list(relation='same'), y=list(relation='same')), data=MA605.chr2, main="depth by locus - Chr2 (SampleMA605)")

The output plot looks like this: coverage depth by locus

Note:

We can see that there is a region around the middle and next to it that has a very high coverage. This likely suggests paralogous alignment.
Also, there is a gap in alignment. The region with no alignment is near (at) centromere where no consensus sequence has been found.
Reads may be later filtered by coverage.

I will post the details on how to filter by coverage in a week or more.

Thanks,

genome plot coverage samtools • 21k views

ADD COMMENT • link updated 20 months ago by William ▴ 40 • written 8.0 years ago by kirannbishwa01 ★ 1.6k

score 6 · Answer 1 · 2017-04-04

6

Entering edit mode

8.0 years ago

Pierre Lindenbaum 166k

a one-liner:

samtools depth -b <(echo -e '1\t231408092\t231408119') input.bam | cut -f 2,3 |\
gnuplot -e "set terminal png; set output 'out.png'; set title 'Coverage'; set xlabel 'position' ; set xtics rotate by 90 right ; set ylabel 'depth' ; plot '-'  using 2:xticlabels(1)  with lines notitle;"

ADD COMMENT • link 8.0 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Hi Pierre, You script looks interesting. If you have some spare time, could you add some details about your script. Just requesting.

Thanks,

ADD REPLY • link 8.0 years ago by kirannbishwa01 ★ 1.6k

score 2 · Answer 2 · 2023-08-18

2

Entering edit mode

20 months ago

William ▴ 40

Hi!

You can use the python program bam2plot, which I have developed. Install via pip install bam2plot, and run:

bam2plot my_bam.bam

It produces one png and one svg for each reference (e.g. chromosome) in the bam file.

Check it out here: https://github.com/willros/bam2plot

Regards, William

ADD COMMENT • link 20 months ago by William ▴ 40