multiIntersectBed for SNPs, need to compare but not merge intervals
1
0
Entering edit mode
7.9 years ago
Adrian Pelin ★ 2.6k

Hello,

I have cases when there are SNPs called adjacent to each other, ex:

scaffold_1  106 107 C/A +
scaffold_1  107 108 G/A +

This is probably because this is not an SNP but an MNP (multiple nucleotide polymorphism?) where more than one bp changes at a time. Regardless, I would like to treat these as SNPs for the time being because they aren't that many and are not affecting my results.

I want to compare different isolate with multiIntersectBed, but the problem I am getting is that bedttools is merging such intervals, creating a common one 106 to 108, which really screws up the analysis. To show you what I mean:

@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.bed
scaffold_1  54  55  A/T +
scaffold_1  55  56  G/T +
scaffold_1  62  63  G/A +
scaffold_1  103 104 T/C +
scaffold_1  106 107 C/A +
scaffold_1  107 108 G/A +
scaffold_1  124 125 G/T +
scaffold_1  158 159 A/T +
scaffold_1  162 163 T/C +
scaffold_1  218 219 T/G +

@OHRI-bell-virt /scratch/temp_Test $ cp ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2

@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.be*
  458044 ABWUN.NUMALT1.SNP.bed
  458044 ABWUN.NUMALT1.SNP.bed2
  916088 total

@OHRI-bell-virt /scratch/temp_Test $ multiIntersectBed -i ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2 > ABWUN.NUMALT1.SNP.multiIntersect.bed

@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.multiIntersect.bed
402231 ABWUN.NUMALT1.SNP.multiIntersect.bed

@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.multiIntersect.bed
scaffold_1  54  56  2   1,2 1   1
scaffold_1  62  63  2   1,2 1   1
scaffold_1  103 104 2   1,2 1   1
scaffold_1  106 108 2   1,2 1   1
scaffold_1  124 125 2   1,2 1   1
scaffold_1  158 159 2   1,2 1   1
scaffold_1  162 163 2   1,2 1   1
scaffold_1  218 219 2   1,2 1   1
scaffold_1  225 226 2   1,2 1   1
scaffold_1  239 240 2   1,2 1   1

The fact that the merged file has less intervals (because it merged 106 to 107 and 107 to 108) is creating problems, because I want to compare amount of SNPs present in a given isolate compared to all. When I create a multiIntersect of all isolates and then use it to find out the proportion of SNPs in any given isolate present in at least 1 or 2, I sometimes get more intervals than SNPs in an isolate.

Any idea how to compare with multiIntersect but not merge intervals? Thanks, Adrian

SNP bedtools intersect • 1.6k views
ADD COMMENT
1
Entering edit mode
7.9 years ago

You can use bedops and bedmap --count to filter N BED files for at least X mutual exact overlaps:

$ bedops --everything file1.bed file2.bed file3.bed .. fileN.bed \
    | bedmap --count --echo --exact --delim '\t' - \
    | awk -voverlaps=${X} '$1 >= overlaps' \
    | cut -f2- \
    > answer.bed

Set $X to 1, 2, etc. Note that, when counting overlaps in a multiset union, an element will always overlap itself at least once, because it will show up in the multiset union at least once. So a threshold of 2 or more is probably more useful for this counting exercise, generally, unless you want to count everything.

ADD COMMENT

Login before adding your answer.

Traffic: 1865 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6