How to Extract Specific Region on Bam file

0

Entering edit mode

3.4 years ago

santos48 ▴ 40

I aligned all my Nanopore Rna-Seq reads with Human Reference Genome and Transcript (Then I had 2 different bam files one of them is aligned with Reference Genome other is aligned with Transcript)

I download reference files from:https://www.ncbi.nlm.nih.gov/genome/guide/human/

I want to extract chromosome X from the aligned BAM file. I tried extract region with samtools view command but it did not work I taken the error specifies an invalid region or unknown reference. Continue anyway.'

I used to check chromosome list with samtools view -H input.bam then the list for both bam file(transcript and genome)

@SQ SN:XR_928432.2 LN:810

Can you tell me what am I doing wrong ? and How to extract that region?

Nanopore HG38 Rna-Seq Genome • 6.9k views

ADD COMMENT • link updated 3.4 years ago by Kevin Blighe 88k • written 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

. I tried extract region with samtools view command but it did not work

what was the command ???

ADD REPLY • link 3.4 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

samtools view -h input.bam X:123456-654321 -o output.bam

ADD REPLY • link 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

your parameter order is wrong, samtools expects:

samtools view [PARAMS] BAM REGION

so you need:

samtools view -h -o output.bam input.bam X:123456-654321

ADD REPLY • link 3.4 years ago by JC 13k

0

Entering edit mode

Thank you for the answer but it is not working, I have taken the same error

specifies an invalid region or unknown reference. Continue anyway.

ADD REPLY • link 3.4 years ago by santos48 ▴ 40

1

Entering edit mode

check if chromosome X is labeled as X or chrX

ADD REPLY • link 3.4 years ago by JC 13k

0

Entering edit mode

I checked with samtools view -H input.bam with that command but I did not see any chromosome I downloaded ref gene from that link: https://www.ncbi.nlm.nih.gov/genome/guide/human/

I saw only like

@SQ SN:NT_187361.1 LN:175055

ADD REPLY • link 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

Check your reference that you aligned to for the chromosome names.

You can ussally use grep for it

grep ">" reference.fa

ADD REPLY • link 3.4 years ago by Frogs • 0

0

Entering edit mode

Thank you for learned to me grep command, I check it, and I see all chromosomes in there. But I can not extract from bam file I am exactly sure I aligned with the correct file.

ADD REPLY • link 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

Ok, and when you do

samtools view -h -o output.bam input.bam X:123456-654321

Does the X in the above command match what the X chromosome is called in the output of the grep command?

You need to make sure the samtools X: is the same as what X is called in the grep command output.

ADD REPLY • link 3.4 years ago by Frogs • 0

0

Entering edit mode

In grep command its output like(genomic ref);

> NT_187634.1 Homo sapiens chromosome X genomic scaffold, GRCh38.p13 alternate locus group ALT_REF_LOCI_1 HSCHRX_1_CTG3

You are right, I think it's output different from X:123456-654321 then and I didn't see any location as a number. In that case how to extract that region?

I am confused should I align with the reference file from the reading, I did but I can not visualize to look chrX.

ADD REPLY • link updated 3.4 years ago by Ram 44k • written 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

Sure, please try this:

samtools view -h -o output.bam input.bam NT_187634.1:123456:654321

ADD REPLY • link 3.4 years ago by Frogs • 0

0

Entering edit mode

It will not solve the problem because it has many different kinds of 'NT_ '

ADD REPLY • link 3.4 years ago by santos48 ▴ 40

0

Entering edit mode

In your reference should be one NT_ sequence for the X chromosome, the other NT_ sequences are the other chromomosomes (1,2,3...22, Y, MT)