Obtaining depth for all positions for different exonic ranges from mpileup file
1
0
Entering edit mode
6.0 years ago
kspata ▴ 90

Hi,

I am analyzing 96 samples sequenced using NextSeq PE 150 reads. The goal is to study variants and per exon coverage of 10 genes. I have obtained the bed file for all the exonic regions of the 10 genes from UCSC genome browser. I am using trim_galore -> bowtie2 -> samtools pipeline for calling variants. I am using hg38 human genome assembly for alignment.

I calculated per exon coverage using bedtools intersect command as follows:

bedtools coverage -abam Sample-1.sorted.bam -b sorted.intervals.bed -counts > PerExonCoverage.txt

This gives me an output file in following format:

Chromosome    Start         End         Coverage   
      chr1    161677476     161677553   6561
     chr20    1895448       1895526     2377
     chr20    1915099       1915455    12081

I want to know the coverage of each nucleotide position for all exons for e.g as follows:

chr1 161677476   13
chr1 161677477  120
chr1 161677478  130

and so on..

I have an intermediate mpileup file which I have used to extract coverage of all positions like this:

cut -f1-4 Sample-1.mpileup > Sample-1.depth

This gives me coverage for all positions of the reference genome and not for the positions of selected exons.

The above command also results in a file size of almost 60GB per sample which is very huge for the resources available to me.

Is there a way in which I can get the coverage of all positions but only for exons listed in BED using either mpileup or BAM file.

Help will be appreciated. Thanks in advance!!

next-gen gene resequencing exonic coverage • 1.4k views
ADD COMMENT
1
Entering edit mode
6.0 years ago
samtools depth -b sorted.intervals.bed Sample-1.sorted.bam
ADD COMMENT
0
Entering edit mode

Thanks Pierre for quick reply. I performed the above command and got the output but in the output the first position for every exon is excluded. For example for the exon range:

chr3    46372903    46373961

The coverage starts from the 46372904 position instead of 46372903. The length of the exon is 1059 bp but coverage is reported for 1058 bp. Should I subtract 1 from every start position and calculate again? Will this be a correct approach?

Also, I changed the range from chr3 46372903 46373961 to chr3 46372895 46373961 and it gives me coverage for the region outside exon as well. What is the reason for obtaining coverage outside the exonic regions? Shouldn't the coverage of regions outside exon be 0 or very low? The exonic region was captured using IDT predesinged gene capture pool.

ADD REPLY
0
Entering edit mode

it's a bed file. It's ZERO based and uses half-open intervals. The output of samtools is ONE-based.

ADD REPLY

Login before adding your answer.

Traffic: 2209 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6