Hello,
I have cases when there are SNPs called adjacent to each other, ex:
scaffold_1 106 107 C/A +
scaffold_1 107 108 G/A +
This is probably because this is not an SNP but an MNP (multiple nucleotide polymorphism?) where more than one bp changes at a time. Regardless, I would like to treat these as SNPs for the time being because they aren't that many and are not affecting my results.
I want to compare different isolate with multiIntersectBed, but the problem I am getting is that bedttools is merging such intervals, creating a common one 106 to 108, which really screws up the analysis. To show you what I mean:
@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.bed
scaffold_1 54 55 A/T +
scaffold_1 55 56 G/T +
scaffold_1 62 63 G/A +
scaffold_1 103 104 T/C +
scaffold_1 106 107 C/A +
scaffold_1 107 108 G/A +
scaffold_1 124 125 G/T +
scaffold_1 158 159 A/T +
scaffold_1 162 163 T/C +
scaffold_1 218 219 T/G +
@OHRI-bell-virt /scratch/temp_Test $ cp ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2
@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.be*
458044 ABWUN.NUMALT1.SNP.bed
458044 ABWUN.NUMALT1.SNP.bed2
916088 total
@OHRI-bell-virt /scratch/temp_Test $ multiIntersectBed -i ABWUN.NUMALT1.SNP.bed ABWUN.NUMALT1.SNP.bed2 > ABWUN.NUMALT1.SNP.multiIntersect.bed
@OHRI-bell-virt /scratch/temp_Test $ wc -l ABWUN.NUMALT1.SNP.multiIntersect.bed
402231 ABWUN.NUMALT1.SNP.multiIntersect.bed
@OHRI-bell-virt /scratch/temp_Test $ head ABWUN.NUMALT1.SNP.multiIntersect.bed
scaffold_1 54 56 2 1,2 1 1
scaffold_1 62 63 2 1,2 1 1
scaffold_1 103 104 2 1,2 1 1
scaffold_1 106 108 2 1,2 1 1
scaffold_1 124 125 2 1,2 1 1
scaffold_1 158 159 2 1,2 1 1
scaffold_1 162 163 2 1,2 1 1
scaffold_1 218 219 2 1,2 1 1
scaffold_1 225 226 2 1,2 1 1
scaffold_1 239 240 2 1,2 1 1
The fact that the merged file has less intervals (because it merged 106 to 107 and 107 to 108) is creating problems, because I want to compare amount of SNPs present in a given isolate compared to all. When I create a multiIntersect of all isolates and then use it to find out the proportion of SNPs in any given isolate present in at least 1 or 2, I sometimes get more intervals than SNPs in an isolate.
Any idea how to compare with multiIntersect but not merge intervals? Thanks, Adrian