Question

Find covered regions in a set of BAM files

0

Entering edit mode

4.6 years ago

German.M.Demidov ★ 2.9k

Hi,

I need to perform SNV analysis in a cohort - for that I need to find "good" regions that are more or less covered in a set of several hundred of WES samples (sequenced with different enrichment kits).

Is there a tool that takes a set of BAM files, and provides a BED file with positions covered with at least X reads?

Thanks!

I googled for some - but they require e.g. target region as input - but my samples were sequenced with different kits ( https://bioconductor.org/packages/release/bioc/vignettes/BadRegionFinder/inst/doc/BadRegionFinder.pdf ) so it does not make much sense...

coverage WES • 1.1k views

ADD COMMENT • link 4.6 years ago by German.M.Demidov ★ 2.9k

score 2 · Accepted Answer · 2020-04-13

2

Entering edit mode

4.6 years ago

ATpoint 85k

You could simply bin the genome into chunks of x bp and then make a count matrix with featureCounts. This will give a column = samples and rows = regions matrix.

As simple as:

bedtools makewindows -w <binsize> -g genomesize.file | awk 'OFS="\t" {print $1"_"$2"_"$3, $1, $2, $3, "."}' > bins.saf

featureCounts -a bins.saf -F SAF -T <cores> -o bins.counts *.bam

This you can then easily filter for a certain coverage in at least x samples.

Alternatively, I think indexcov and indexsplit from goleft might be of interest: https://github.com/brentp/goleft

ADD COMMENT • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

will try this thanks! I've tried to bin into 5 bp windows with another tool - segfault, while for e.g. 500bp worked perfectly...

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

1

Entering edit mode

Well 5bp bins will give like 600mio of them for human, no surprise this might be a bit much ;-) You can also only bin the entire exom, e.g. from GENCODE (so everything that is annotated) which is probably as comprehensive as it gets right now. RefSeq is always more conservative than GENCODE. This will reduce the amount of DNA you have consider for coverage calculation less than 10% of total.

ADD REPLY • link 4.6 years ago by ATpoint 85k

0

Entering edit mode

Thanks, it sounds reasonable. What would you say about the expanding GENCODE exons +/- 20 bp? or 30? to catch potential splicing-affecting variants?

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

1

Entering edit mode

Or quarter or simply half the feature size. bedtools slop can extend every feature by a given fraction, e.g. 0.25, that might be the easiest and should catch everything.