extracting spesific sequences from bam or bam.bai file using samtools
1
1
Entering edit mode
6.6 years ago

im using samtool to extract specific regions from the sorted bam file/bam.bai file. e.g if i want to extract the sequence from 329729 to 329840 of chromosome one, then how would i proceed??? the command that im using is

./samtools view alignment.sorted.bam gnl|BGIA2|CA_chr1:329729-329840

and im getting following error:

-bash: BGIA2: command not found
-bash: CA_chr1:329729-329840: command not found
[main_samview] region "gnl" specifies an unknown reference name. Continue anyway.

while the "gnl|BGIA2|CA_chr1" could be found in bam file if is viewed by head command.

further how to extract the multiple sequences with the similar formate in separate file using samtool or any other tool??? thanks

next-gen • 2.5k views
ADD COMMENT
3
Entering edit mode
6.6 years ago

Hello,

the problem is the | in the name. This is interpreted as a pipe to stream the output to the next program, Try to escape this symbol by \ (Didn't test it)

./samtools view alignment.sorted.bam gnl\|BGIA2\|CA_chr1:329729-329840

Another way could be to create a bed file with the coordinates and use the -L parameter.

fin swimmer

ADD COMMENT
0
Entering edit mode

thank you so much for kind help. it has worked to print the reads of required range. but can you plz kindly guid me in following aspects too:

1) I need this out put in separate file rather that it just get printed on terminal. how to get it in separate file??

2) this is example when i need to get reads with only one sequence range. what if i have large number of queries like this. i mean if i have one file with the ranges like:

;gnl|BGIA2|CA_chr1:429729-529840

gnl|BGIA2|CA_chr1:629729-726840

gnl|BGIA2|CA_chr1:729729-929840

gnl|BGIA2|CA_chr1:529729-729840

................ so on.........

then how to get all the reads with these coordinate references in separate output file????

ADD REPLY
0
Entering edit mode

1) I need this out put in separate file rather that it just get printed on terminal. how to get it in separate file??

You can do this by appending a > resultfile.sam to your command. > redirects the output to the given file. But be aware that if you already have a file with this name, the content will be deleted.

2) this is example when i need to get reads with only one sequence range. what if i have large number of queries like this. i mean if i have one file with the ranges like:

Should there be a seperate output file for each region or just one with all reads in?

For the last case I would suggest, you convert your file containing the ranges to a bed file and uses this with samtools view -L

 awk -v OFS="\t" '{n=split($0,a,/[:\-]/); print a[1],a[2]-1,a[3],a[0]}' ranges.txt > ranges.bed

And than:

samtools view -L ranges.bed > reads.sam

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 1966 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6