genomeCoverageBed and supplementary alignments
8.1 years ago

Hi all, I would like to know how genomeCoverageBed deals with secondary alignments in a SAM/BAM file. I have a set of SAM files generated with bowtie2 using the -a flag, and I would like to generate a coverage map with genomeCoverageBed. These are the steps I normally follow to obtain a BAM file for coverage estimation:

1 - Mapping reads with bowtie2
2 - Converting SAM to BAM and sorting the output with samtools
3 - Removing duplicate reads with Picard (MarkDuplicates)
4 - Sorting the output again with samtools
5 - Generating a coverage map with genomeCoverageBed
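A minimal sketch of these five steps as command lines, written in Python so the exact arguments are explicit. The index prefix, read file, and every output file name are hypothetical; single-end reads are assumed, and the `picard` launcher with the newer `-I`/`-O`/`-M` argument style is an assumption (older installs use `java -jar picard.jar` with `I=`/`O=`/`M=` syntax). The script only prints the commands; it does not run them.

```python
import shlex
from typing import List, Optional, Tuple

# Each step is (argv, stdout_path); stdout_path is set when a tool writes its result to stdout.
Step = Tuple[List[str], Optional[str]]

def build_pipeline(index: str, reads: str, report_all: bool = True) -> List[Step]:
    """Return the five steps as argv lists (all file names are hypothetical)."""
    bowtie2 = ["bowtie2", "-x", index, "-U", reads, "-S", "aln.sam"]
    if report_all:
        bowtie2.insert(1, "-a")  # report all alignments, as in the question
    return [
        # 1. map reads
        (bowtie2, None),
        # 2. SAM -> coordinate-sorted BAM in one samtools call
        (["samtools", "sort", "-o", "aln.sorted.bam", "aln.sam"], None),
        # 3. remove (not just mark) duplicates; newer Picard argument syntax assumed
        (["picard", "MarkDuplicates", "-I", "aln.sorted.bam", "-O", "aln.dedup.bam",
          "-M", "dup_metrics.txt", "--REMOVE_DUPLICATES", "true"], None),
        # 4. sort again, mirroring the original list
        (["samtools", "sort", "-o", "aln.dedup.sorted.bam", "aln.dedup.bam"], None),
        # 5. bedGraph coverage goes to stdout, so it is redirected to a file
        (["genomeCoverageBed", "-ibam", "aln.dedup.sorted.bam", "-bg"], "coverage.bedgraph"),
    ]

if __name__ == "__main__":
    for argv, out in build_pipeline("ref_index", "reads.fq"):
        print(shlex.join(argv) + (f" > {out}" if out else ""))
```

Passing `report_all=False` drops `-a`, which is the comparison described in the next paragraph.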

I tried mapping my reads both with and without the -a flag and got slightly different results. Is it possible that bedtools included secondary alignments in the coverage map? If not, how does genomeCoverageBed deal with secondary alignments?
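For context: with `-a`, bowtie2 reports every alignment it finds for a read, and the extra ones should carry the SAM FLAG bit 0x100 (secondary). Supplementary records (0x800) come from split/chimeric alignments, which bowtie2 does not normally produce. The small sketch below decodes these bits from the FLAG column of SAM text so you can count how many secondary records a file contains; the sample records are made up for illustration.

```python
from collections import Counter
from typing import Iterable

# FLAG bits as defined in the SAM specification
FUNMAP = 0x4            # read unmapped
FSECONDARY = 0x100      # secondary alignment
FSUPPLEMENTARY = 0x800  # supplementary alignment

def alignment_kind(flag: int) -> str:
    """Classify one record by its FLAG value."""
    if flag & FUNMAP:
        return "unmapped"
    if flag & FSECONDARY:
        return "secondary"
    if flag & FSUPPLEMENTARY:
        return "supplementary"
    return "primary"

def count_kinds(sam_lines: Iterable[str]) -> Counter:
    """Count record kinds in SAM text, skipping blank lines and '@' header lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd column
        counts[alignment_kind(flag)] += 1
    return counts

if __name__ == "__main__":
    sample = [  # hypothetical records: r1 has one primary and one secondary hit
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t0\tchr1\t100\t42\t50M\t*\t0\t0\t*\t*",
        "r1\t256\tchr1\t900\t1\t50M\t*\t0\t0\t*\t*",
        "r2\t16\tchr1\t300\t42\t50M\t*\t0\t0\t*\t*",
    ]
    print(count_kinds(sample))
```

Note that flag 16 (reverse strand) is still a primary alignment; only the 0x100 and 0x800 bits mark non-primary records.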

Thanks in advance,

Giovanni

genomeCoverageBed bedtools secondary alignment

What's the point of step 3, removing duplicates?

Removal of PCR duplicates, presumably.

Sorry, my question wasn't clear.

Why is it useful to remove duplicates when estimating coverage?

There's no estimation here; it's empirical observation. PCR duplicates are a nuisance that can often be ignored. This isn't always the case (e.g., if your signal is genuinely highly focal, duplicates can't be meaningfully marked), but it often is.

OK, now I have a question.

For contig coverage estimation (after assembly), should I also remove duplicates?

In my opinion, they should be removed before coverage estimation; otherwise, coverage will be overestimated.
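To make the overestimation concrete: the usual back-of-the-envelope mean coverage is C = N × L / G (number of reads × read length / genome or contig length). The numbers below are hypothetical, and the model assumes every read aligns; it just shows that duplicate reads add depth without adding independent fragments.

```python
def mean_coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Mean depth = total aligned bases / target length (idealized: all reads align)."""
    return n_reads * read_len / target_len

# Hypothetical numbers: 1,000,000 reads of 100 bp on a 5 Mb target, 100,000 of them duplicates.
total_reads = 1_000_000
duplicate_reads = 100_000
with_dups = mean_coverage(total_reads, 100, 5_000_000)
deduped = mean_coverage(total_reads - duplicate_reads, 100, 5_000_000)
print(f"with duplicates: {with_dups:.1f}x, after removal: {deduped:.1f}x")
```

Here the duplicates account for 2x of the reported 20x; with a duplication level under 1%, the difference would be correspondingly small.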

@Picasa: What fraction of reads was marked as duplicates in your case? If the fraction is small, duplicates are a minor nuisance. If a much larger fraction is marked, you may need to find out whether there is a reason for it (e.g., ultra-low input for library prep) or whether the library prep was simply bad.

I usually consider duplication levels below 3-5% to be normal artifacts of PCR amplification or optical duplication. In this case my duplication level is below 1%, so I'm quite confident about it.
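If the duplication level comes from Picard MarkDuplicates, it can be read from the metrics file (the `-M`/`METRICS_FILE` output). A minimal sketch, assuming the usual layout where a line starting with `## METRICS CLASS` is followed by a tab-separated header and one row per library, and assuming `PERCENT_DUPLICATION` is reported as a fraction (0.0087 meaning 0.87%). The 5% cut-off is just the upper end of the rule of thumb above, not a universal standard.

```python
from typing import Dict, List

def read_dup_metrics(text: str) -> List[Dict[str, str]]:
    """Return the rows of the METRICS CLASS table as dicts keyed by column name."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip():  # a blank line ends the table
                    break
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no METRICS CLASS section found")

def duplication_percent(row: Dict[str, str]) -> float:
    # Assumption: PERCENT_DUPLICATION is stored as a fraction despite its name
    return float(row["PERCENT_DUPLICATION"]) * 100

def looks_normal(percent: float, cutoff: float = 5.0) -> bool:
    """Rule of thumb from the thread: below roughly 3-5% is unremarkable."""
    return percent < cutoff

if __name__ == "__main__":
    # Hypothetical, heavily trimmed metrics file (real files have many more columns)
    sample = ("## METRICS CLASS\tpicard.sam.DuplicationMetrics\n"
              "LIBRARY\tPERCENT_DUPLICATION\n"
              "lib1\t0.0087\n"
              "\n"
              "## HISTOGRAM\n")
    for row in read_dup_metrics(sample):
        pct = duplication_percent(row)
        print(row["LIBRARY"], f"{pct:.2f}%", "ok" if looks_normal(pct) else "check library prep")
```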

@giovanni: My comment was directed at @Picasa (clarified now). Your case sounds A-OK.

Removal of PCR and optical duplicates
8.1 years ago

You can skip step 4; Picard's output is already sorted (MarkDuplicates keeps the sort order of its input).
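One quick way to confirm that a BAM does not need re-sorting is to look at the `SO` tag on the `@HD` header line, for example in the output of `samtools view -H file.bam`. A small sketch that extracts it from header text; `SO` only records what the writing tool declared, so treat it as a sanity check rather than proof.

```python
from typing import Optional

def declared_sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None

# Possible usage (file name hypothetical; requires samtools on PATH):
#   import subprocess
#   hdr = subprocess.run(["samtools", "view", "-H", "aln.dedup.bam"],
#                        capture_output=True, text=True, check=True).stdout
#   print(declared_sort_order(hdr))
```

A result of `coordinate` is what genomeCoverageBed's `-ibam` mode expects.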

bedtools will include secondary and supplementary alignments in its coverage. To exclude them, you'll need to either prefilter them or use a different tool. Personally, I'd just use bamCoverage from deepTools, but I'm a bit biased there.
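To see why counting secondary and supplementary records changes the result, here is a toy per-base depth calculation in Python (an illustration, not how bedtools is implemented). Records are hypothetical `(start, end, flag)` tuples with 0-based, half-open coordinates. `exclude_mask=0` counts every mapped record, matching the behaviour described above for bedtools, while the default mask `0x900` (256 + 2048 = 2304) drops secondary and supplementary records: the same bits you would pass to `samtools view -F 0x900` when prefiltering, or to deepTools `bamCoverage --samFlagExclude 2304`.

```python
from typing import Iterable, List, Tuple

FUNMAP = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800  # == 0x900 == 2304

# Equivalent real-world prefilter / alternative (file names hypothetical):
#   samtools view -b -F 0x900 -o primary.bam aln.bam
#   bamCoverage -b aln.bam -o cov.bw --samFlagExclude 2304

def depth(records: Iterable[Tuple[int, int, int]], length: int,
          exclude_mask: int = SECONDARY_OR_SUPPLEMENTARY) -> List[int]:
    """Per-base depth over [0, length) from (start, end, flag) records."""
    diff = [0] * (length + 1)  # difference array: +1 at start, -1 at end
    for start, end, flag in records:
        if flag & FUNMAP or flag & exclude_mask:
            continue  # skip unmapped records and any excluded flag bits
        s, e = max(start, 0), min(end, length)
        if s >= e:
            continue  # record lies outside the target
        diff[s] += 1
        diff[e] -= 1
    out, running = [], 0
    for i in range(length):
        running += diff[i]
        out.append(running)
    return out

if __name__ == "__main__":
    # Hypothetical records: primary, secondary (256), supplementary (2048), reverse primary (16)
    recs = [(0, 5, 0), (2, 7, 256), (3, 6, 2048), (4, 8, 16)]
    print("primary only:", depth(recs, 10))
    print("all mapped:  ", depth(recs, 10, exclude_mask=0))
```

The difference between the two printed profiles is the kind of discrepancy you would expect between runs with and without `-a` if non-primary records are counted.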