Entering edit mode
3 months ago
pablo
▴
310
Hi,
I did an alignment of my PE-reads with bwa
on the plant genome I work on. I computed the coverage for each chromosome in this way :
- I created windows of 10kb of each chromosome of my reference genome with
bedtools makewindows
- I computed the coverage for each of these windows with
bedtools coverage
to get a.bedgraph
file - I did some coverage plots with
ggplot2
(one per chromosome) from the bedgraph file
I noticed the overall coverage was a bit higher than the expected one. I suspected than certain reads could account for 2 if they overlap two windows. I checked that with samtools view
to find some reads overlapping two windows and then compute the coverage with a "custom way" on that two windows, and that confirmed my doubt.
Two questions :
- is there a better to check that overlapping reads, rather than using my custom way
- and is there a way to count only 1X for these cases?
Best, Pablo
see bedtools intersect
using -wa and -wb can give you each overlap (which can be subjected to a cut + uniq or so).
For at least one overlap kind of things, see -u option
Do remember that both the windows(bed) as well as the alignment(bam) can be set as A or B depending on what you are looking for.
Indeed, I generated two 10kb adjacent windows BAM files and
bedtools intersect
does the job to identify the overlapping reads between those two windows. Is it possible withbedtools genomecov
orcoverage
to keep only once occurence of these reads to compute the coverage?I do not know of a way by which such a constraint can be imposed in genomecov/coverage.
bedtools intersect
will do the job then, thanks.