How to extract the same mapped region by multiple samples ?
2
0
Entering edit mode
6.1 years ago
Picasa ▴ 650

Hello,

I have map 3 samples against a genome and get a bam file for each.

Now I would like to find all the regions that have been mapped by my 3 samples, is there a way to do this ?

Thanks for your help.

bam samples map • 2.3k views
ADD COMMENT
0
Entering edit mode

s that have been mapped by my 3 samples

what does it mean ? there is one bam per sample isn't it ?

ADD REPLY
0
Entering edit mode

Yes there is one bam per sample.

My idea is to merge them with samtools and then use a program (? this is what I called for help) that extract the regions that have been mapped in common between the 3 samples.

ADD REPLY
0
Entering edit mode

the regions that have been mapped in common between the 3 samples.

And what does that mean?

ADD REPLY
6
Entering edit mode
6.1 years ago

use samtools depth to get the regions mapped in all your bams , use awk to retain the regions where all the bam have Depth>0 and convert to bed, use bedtools merge to merge the bed records.

$ samtools depth S1.bam S2.bam S3.bam |\
awk -F '\t' '{if(int($3)>0 && int($4)>0 && int($5)>0) printf("%s\t%d\t%d\n",$1,int($2)-1,$2);}' |\
bedtools merge > comm.bed

$ cat comm.bed
RF01    10  3295
RF02    28  2652
RF03    3   265
RF03    274 2588
RF04    21  2342
RF05    15  1542
RF06    31  1248
RF06    1252    1329
RF07    5   1064
RF08    3   1049
RF09    54  1030
RF10    6   361
RF10    397 740
RF11    0   272
RF11    390 663
ADD COMMENT
0
Entering edit mode

Thanks Pierre, can you explain this ?

int($2)-1
ADD REPLY
0
Entering edit mode

samtools produces some 1-based positions column 2 while bed format is 0-based. Hence, to convert to bed you'll have to substract 1 to the integer value of column 2

ADD REPLY
0
Entering edit mode

Thanks Pierre, your command was very useful.

I am wondering if you know a way to extract regions of these "comm.bed" but relative to my 3 samples (S1.bam S2.bam S3.bam)

So for instance :

RF01    10  3295

is the coordinates in the reference genome

but I would like the coordinates in my sample S1.bam (name of reads and position) corresponding to this region.

ADD REPLY
0
Entering edit mode

convert this line to BED e.g: "REF01 9 10" ans use samtools view with option -L

ADD REPLY
0
Entering edit mode

but it is already in bed format no ?

ADD REPLY
0
Entering edit mode

output of samtools depth CHROM:POS-1-based: DEPTH

bed is : CHROM:start-0-based:end-0-based

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLY
1
Entering edit mode
6.1 years ago

Using BEDOPS bedops --merge, simply:

$ bedops --merge <(bam2bed < A.bam) <(bam2bed < B.bam) <(bam2bed < C.bam) > answer.bed

This uses three BED streams as an example, but you can merge as many BED files as you want, in case you have an arbitrary number of BAM files to convert and merge.

ADD COMMENT
0
Entering edit mode

Thanks, but Pierre method is what I was searching for.

Your method output regions where at least 2 samples mapped while I wanted all the samples

ADD REPLY

Login before adding your answer.

Traffic: 2220 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6