Find covered regions in a set of BAM files
1
0
Entering edit mode
4.6 years ago

Hi,

I need to perform SNV analysis in a cohort - for that I need to find "good" regions that are more or less covered in a set of several hundred of WES samples (sequenced with different enrichment kits).

Is there a tool that takes a set of BAM files, and provides a BED file with positions covered with at least X reads?

Thanks!

I googled for some - but they require e.g. target region as input - but my samples were sequenced with different kits ( https://bioconductor.org/packages/release/bioc/vignettes/BadRegionFinder/inst/doc/BadRegionFinder.pdf ) so it does not make much sense...

coverage WES • 1.1k views
ADD COMMENT
2
Entering edit mode
4.6 years ago
ATpoint 85k

You could simply bin the genome into chunks of x bp and then make a count matrix with featureCounts. This will give a column = samples and rows = regions matrix.

As simple as:

bedtools makewindows -w <binsize> -g genomesize.file | awk 'OFS="\t" {print $1"_"$2"_"$3, $1, $2, $3, "."}' > bins.saf

featureCounts -a bins.saf -F SAF -T <cores> -o bins.counts *.bam

This you can then easily filter for a certain coverage in at least x samples.

Alternatively, I think indexcov and indexsplit from goleft might be of interest: https://github.com/brentp/goleft

ADD COMMENT
0
Entering edit mode

will try this thanks! I've tried to bin into 5 bp windows with another tool - segfault, while for e.g. 500bp worked perfectly...

ADD REPLY
1
Entering edit mode

Well 5bp bins will give like 600mio of them for human, no surprise this might be a bit much ;-) You can also only bin the entire exom, e.g. from GENCODE (so everything that is annotated) which is probably as comprehensive as it gets right now. RefSeq is always more conservative than GENCODE. This will reduce the amount of DNA you have consider for coverage calculation less than 10% of total.

ADD REPLY
0
Entering edit mode

Thanks, it sounds reasonable. What would you say about the expanding GENCODE exons +/- 20 bp? or 30? to catch potential splicing-affecting variants?

ADD REPLY
1
Entering edit mode

Or quarter or simply half the feature size. bedtools slop can extend every feature by a given fraction, e.g. 0.25, that might be the easiest and should catch everything.

ADD REPLY
0
Entering edit mode

Thanks a lot! Yes, 0.25 sounds very reasonable =) will do it tomorrow!

ADD REPLY

Login before adding your answer.

Traffic: 2758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6