Hello,
I have map 3 samples against a genome and get a bam file for each.
Now I would like to find all the regions that have been mapped by my 3 samples, is there a way to do this ?
Thanks for your help.
Hello,
I have map 3 samples against a genome and get a bam file for each.
Now I would like to find all the regions that have been mapped by my 3 samples, is there a way to do this ?
Thanks for your help.
use samtools depth
to get the regions mapped in all your bams , use awk
to retain the regions where all the bam have Depth>0
and convert to bed, use bedtools merge
to merge the bed records.
$ samtools depth S1.bam S2.bam S3.bam |\
awk -F '\t' '{if(int($3)>0 && int($4)>0 && int($5)>0) printf("%s\t%d\t%d\n",$1,int($2)-1,$2);}' |\
bedtools merge > comm.bed
$ cat comm.bed
RF01 10 3295
RF02 28 2652
RF03 3 265
RF03 274 2588
RF04 21 2342
RF05 15 1542
RF06 31 1248
RF06 1252 1329
RF07 5 1064
RF08 3 1049
RF09 54 1030
RF10 6 361
RF10 397 740
RF11 0 272
RF11 390 663
Thanks Pierre, your command was very useful.
I am wondering if you know a way to extract regions of these "comm.bed" but relative to my 3 samples (S1.bam S2.bam S3.bam)
So for instance :
RF01 10 3295
is the coordinates in the reference genome
but I would like the coordinates in my sample S1.bam (name of reads and position) corresponding to this region.
Using BEDOPS bedops --merge
, simply:
$ bedops --merge <(bam2bed < A.bam) <(bam2bed < B.bam) <(bam2bed < C.bam) > answer.bed
This uses three BED streams as an example, but you can merge as many BED files as you want, in case you have an arbitrary number of BAM files to convert and merge.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
what does it mean ? there is one bam per sample isn't it ?
Yes there is one bam per sample.
My idea is to merge them with samtools and then use a program (? this is what I called for help) that extract the regions that have been mapped in common between the 3 samples.
And what does that mean?