Question

extract reads ID for each region included in a bed file and create a separate file

0

Entering edit mode

2.8 years ago

pingu77 ▴ 20

Hi all,

I have a bam file and a bed file with several regions of interest to me. I would like to get the sequence IDs of the reads that fall in each region included in the bed file. I was able to get the sequence ID, chr and start coordinate for each read, using this command:

samtools view file.bam | cut -f1,3,4

An example of what I get in output:

readA   chr1    3985548  
readB   chr1    48349456 
readC   chr1    3985568 
readD   chr1    4932093

I would like to create for each region in the bed file a different txt file that includes the sequence IDs that map in a given region.

example of the top 5 lines included in my bed file:

chr1    3985538 3985593 chr1:3985538-3985593
chr1    3986091 3986158 chr1:3986091-3986158
chr1    4834946 4834972 chr1:4834946-4834972
chr1    4932073 4932105 chr1:4932073-4932105
chr1    7398192 7398238 chr1:7398192-7398238

example of the desired output:

Region: chr1:3985538-3985593
file -> chr1:3985538-3985593.txt
readA
readC

Region: chr1:4834946-4834972
file -> chr1:4834946-4834972.txt
readB

Region: chr1:4932073-4932105
file -> chr1:4932073-4932105.txt
readD

So how can I achieve the same? Any help is highly appreciated.

Thank you!

samtools bam bed • 945 views

ADD COMMENT • link updated 19 months ago by Ram 44k • written 2.8 years ago by pingu77 ▴ 20

0

Entering edit mode

hi pingu77,

what have you tried/searched for so far? this question has been asked quite a number of times here already. Also some googling will point you to the right direction.

If you are stuck at any point you can ask here for specific help.

ADD REPLY • link 2.8 years ago by lieven.sterck 15k

0

Entering edit mode

Hi! Yes, I tried to find an answer on google/forum. However, I couldn't find the answer to my problem. In the questions asked, people usually have a specific genomic region, in my case I have 60 000 regions in the same bed file and I would like to create 60 000 txt file reporting the read ID that map to the considered region..

ADD REPLY • link 2.8 years ago by pingu77 ▴ 20

0

Entering edit mode

use a loop over each bed. https://www.cyberciti.biz/faq/bash-for-loop/

ADD REPLY • link 2.8 years ago by Pierre Lindenbaum 164k

score 3 · Accepted Answer · 2022-03-24

3

Entering edit mode

2.8 years ago

Pierre Lindenbaum 164k

in my case I have 60 000 regions in the same bed

awk '{printf("%s:%d-%s\n",$1,int($2)+1,$3);}' in.bed | while read R
do
    #blablabla
done

ADD COMMENT • link 2.8 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Thanks a lot Pierre Lindenbaum ! it is exactly what I needed. My solution based on your hint:

awk '{printf("%s:%d-%s\n",$1,int($2)+1,$3);}' file.bed | while read R; 
do samtools view file.bam $R | cut -f1 > $R'.txt' 
done

ADD REPLY • link 2.8 years ago by pingu77 ▴ 20