Intersect Bam Files
4
2
Entering edit mode
13.0 years ago

Hi,

Is there a tool to intersect multiple bam files to extract only the alignment that are find in all the bam files ? And maybe is it possible to specify a percentage of similarity between the a alignment.

Example of alignment:

           ATGGCGACGACTACGAGACAGCATAGCAAACGACGA
Sample_A        ----------------------
Sample_B         ----------------------
Sample_C        ----------------------

This alignment will be reported because the alignment are very close.

Is it clear ?

Thanks,

N.

bam alignment • 6.8k views
ADD COMMENT
5
Entering edit mode
13.0 years ago
Farhat ★ 2.9k

For an intersection one way would be to convert your bam files to bed files using bamTobed from bedtools and then doing an intersection with multiIntersectBed (see Bedtools Compare Multiple Bed Files? ) If you want to specify parameters like percent similarity some coding on your end will be required.

ADD COMMENT
0
Entering edit mode

thanks ! that's perfect ! I will write a little script to parse the output of multiIntersectBed :)

ADD REPLY
0
Entering edit mode

If you're interested in an efficient solution once you convert to BED format (it appears you are willing to do this), take a look at an easy Bedtools Compare Multiple Bed Files? offered by BEDOPS. One of the design principles of BEDOPS is that it works efficiently with multiple input files at once.

ADD REPLY
1
Entering edit mode
13.0 years ago

The following command line is based on my 'beta' variation toolkit : http://code.google.com/p/variationtoolkit/

echo -e "SAMPLE1\t/path/to/SAMPLE1.vcf.gz\nSAMPLE2\t/path/tp/SAMPLE2.vcf.gz" |\ #print a list of SAMPLE-NAME and path to VCF
bin/scanvcf |\ #read the VCF files 
sort -t '   ' -k1,1 -k2,2n -k4,4 -k5,5 -k11 |\ #sort on CHROM/POS/REF/ALT/SAMPLE
bin/samplespersnp --sample 11 2> /dev/null |\ #add the number of SAMPLE having the variation
awk -F '    ' '($12=="2")' |\ #keep the variations HAVING count(SAMPLE)==2
bin/vcfttview -a -s 11 \
    -F SAMPLE1 /path/to/SAMPLE1.bam \
    -F SAMPLE2 /path/to/SAMPLE2.bam \
    -R /path/to/hg19.fa  #print a 'tview' of each variation . Sample column is 11

Result:

(...)
>>chr1:15211
> chr1:15211    /path/to/SAMPLE2.bam

15201     15211     15221     15231     15241     15251     15261     15271     
AGACAGCGGCTGTTTGAGGAGCCACCTCCCAGCCACCTCGGGGCCAGGGCCAGGGTGTGCAGCAccactNNNNNNNNNNN
..........G..........................................................           
,,,,,,,        ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,           
,,,,,,,,,,                                                                      
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,                                                   
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,                                         
,,,,,,,,,,g,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,                                   
,,,a,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,                              
  g,,,g,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,                        
      ,,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

> chr1:15211    /path/to/SAMPLE2.bam

15201     15211     15221     15231     15241     15251     15261     15271     
AGACAGCGGCTGTTTGAGGAGCCACCTCCCAGCCACCTCGGGGCCAGGGCCAGGGTGTGCAGCAccactgtacaatgggg
..........G..............................................................T......
,,,,                c,,c,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,t      
,,,,,                                                 ,,,,,,,,,,,,,,,,,,,t,,,,,,
,,,,,,,,,,,                                                 .............T......
,,,,,,,,,,g,,,,,,,                                             ..........T......
..........G.............                                                 t,,,,,,
..........G............C.                                                       
,,,,,,,,,,g,,,,,,,,,,,,,,,,,                                                    
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,                                                 
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,                                         
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,                                 
,,,,,,,,,,g,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,                              
(...)
>>chr1:752566
> chr1:752566   /path/to/SAMPLE1.bam

     752561    752571    752581    752591    752601    752611    752621         
caagagaaacgtttgacagagaatacagacaacctactgaatgagagaaactatttgcaaactatgcatctgacaaaggc
..........R.....................................................................
........................................  .........G............................
..........A...........................................    ......................
        ..A...................................................   ...............
                  ......................................................  ......
                                  ..........A....G..............................
                                   .............................................
                                     ...........................................
                                       .........................................
                                                      ..........................
                                                          ......................
                                                              ..................
                                                                  ..............
                                                                    ............
                                                                     .......C...
                                                                       .........
                                                                          ......
                                                                              ..

> chr1:752566   /path/to/SAMPLE2.bam

     752561    752571    752581    752591    752601    752611    752621         
caagagaaacgtttgacagagaatacagacaacctactgaatgagagaaactatttgcaaactatgcatctgacaaaggc
..........A.....................................................................
......                           ...............................................
..........                                   ...................................
..........A......                               ................................
..........A.....................                         .......................
..........A......................                                      .........
..........A.........................                                           .
(...)
ADD COMMENT
0
Entering edit mode
13.0 years ago

Actually it is not what I want. multiIntersectBed used the coverage. I want to check the read alignment.

Here's an example where multiIntersectBed does not work for my case :

                  ATGGCGACGACTACGAGACAGCATAGCAAACGACGA
Sample_A             ----------------------
Sample_B                       ----------------------
Sample_C              --------------------

multiIntersect_A  000111111111111111111111100000000000
multiIntersect_A  000000000000011111111111111111111110
multiIntersect_A  000011111111111111111111000000000000
Sum               000122222222233333333333211111111110

So I have a 12 nt long part (where sum==3) found in every sample. But When we look to the read alignment, we observed that the sample_B has not a good alignment compared to the sample A and C.

What I want to report is this type of alignment :

                  ATGGCGACGACTACGAGACAGCATAGCAAACGACGA
Sample_A             ----------------------
Sample_B             ---------------------
Sample_C              --------------------

So alignment that are very very close (1 ,2 or 3 nt overhang max )

Is it clear ?

Thanks,

N.

ADD COMMENT
0
Entering edit mode
13.0 years ago

I think you need some series of intersectBeds with a -f requirement (requiring the element in b to overlap the element in a by X percent).

Bedtools Compare Multiple Bed Files?

ADD COMMENT

Login before adding your answer.

Traffic: 1903 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6