Extracting reads from given region
1
0
Entering edit mode
7.0 years ago
KVC_bioinfo ▴ 600

Hello all,

I am extracting the reads from bam file which are falling in particular region using the following command.

samtools view -h input.bam "Chr10:18000-45500" > output.bam

However, now I want to extract the reads in sam manner but my reference is human genome primary assembly where the name of fasta reads starts like:

NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly

How do I now modify command to do the sam task?

Thank you in advance.

bam • 2.6k views
ADD COMMENT
3
Entering edit mode
7.0 years ago

the fasta should be indexed ignoring everything after the first whitespace. So your command should be something like:

samtools view -h input.bam "NC_000001.10:123-456" > output.sam
ADD COMMENT
0
Entering edit mode

Thank you. However, I want to extract reads from Chr 10 with the given coordinates above.

ADD REPLY
1
Entering edit mode

try looking for some thing like: NC_000010 in header:) $ grep ">" <input.fasta> (if your fasta is gizpped, use zgrep instead of grep). You would get all the headers in reference file. Then look for an entry that starts with NC_000010. Use that to extract the information. Command would look like:

samtools view -h input.bam "NC_000010.xx:123-456" > output.sam
ADD REPLY

Login before adding your answer.

Traffic: 2977 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6