Hi all,
Is there a way to extract the corresponding target sequence from a BAM or SAM alignment produced by novoalign? Or by bowtie?
Thanks
Geneart.
#index the bam file first
samtools index test.bam
samtools view test.bam chr1:200000-500000
Or have a look at tabix (for SAM files): http://samtools.sourceforge.net/tabix.shtml
Hi tangming2005
Thanks for the reply. I was looking into tabix earlier. The reason I posted this question is that I am not really interested in the coordinates of the reference region my sequence maps to; I would like to retrieve the corresponding sequence itself (the string of ATGCs) that my reads map onto. SAM files do give the number of mismatches/matches of the reads to the reference sequence, but I was looking to extract the actual reference sequence region that each read maps to. So it is slightly different. But I very much appreciate your reply :) Thanks again,
I guess I can still take the coordinates generated this way and extract the sequence from my genome file, perhaps?
Geneart.
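For what it's worth, the "take the coordinates and pull the sequence from the genome file" idea could be sketched roughly like this. The helper name `sam_to_regions` and the file names `test.bam`, `genome.fa`, `regions.txt` and `targets.fa` are made up for illustration. It reads SAM text, adds up the CIGAR operations that consume reference bases, and prints one `chrom:start-end` region per mapped read, which is the format `samtools faidx` takes.

```shell
# Hypothetical helper: print the reference interval (1-based, inclusive)
# covered by each mapped read in SAM text on stdin, as "chrom:start-end".
sam_to_regions() {
  awk -F'\t' '
    /^@/ { next }                       # skip SAM header lines
    $3 == "*" || $6 == "*" { next }     # skip reads with no reference or CIGAR
    {
      cigar = $6; span = 0
      # M, D, N, = and X consume reference bases; I, S, H and P do not
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += len
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (span > 0) printf "%s:%d-%d\n", $3, $4, $4 + span - 1
    }'
}

# Possible usage (file names are placeholders, samtools must be installed):
#   samtools view test.bam | sam_to_regions | sort -u > regions.txt
#   samtools faidx genome.fa -r regions.txt > targets.fa
```

Note that this gives the forward-strand reference sequence under each read, not a sequence carrying the variants seen in the reads.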
There are many sequences that will represent your region of interest, but if you want a single consensus sequence, you should read more about pileup2fq. The old pileup feature in samtools could create one for you. You can do the same with the new mpileup, but there is no pileup2fq-like feature.
Sure, you can get the coordinates and convert them to FASTA sequences. See one of my posts here
He is not talking about extracting sequence from the reference FASTA file. He has a BAM file, and it may have a lot of variants. He wants to build a FASTA sequence that represents the sequence with the variants in it. BTW, I just checked your post and you have not mentioned samtools faidx as a solution.
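If a read-derived sequence with the variants in it is what's wanted, one possible route is the `samtools consensus` subcommand. This is only a sketch: it assumes a reasonably recent samtools (I believe 1.16 or later), and the flags are as I understand them from the samtools manual, so check `samtools consensus --help` before relying on it. The wrapper name `region_consensus` and its arguments are hypothetical.

```shell
# Hypothetical wrapper: write a FASTA consensus of the reads in one region.
# Assumes a samtools build that provides the "consensus" subcommand.
region_consensus() {
  if [ "$#" -ne 3 ]; then
    echo "usage: region_consensus <in.bam> <chrom:start-end> <out.fa>" >&2
    return 2
  fi
  bam=$1; region=$2; out=$3

  # Region queries need a BAM index; build one if it is missing
  [ -e "$bam.bai" ] || samtools index "$bam" || return 1

  # -f fasta: output format, -r: restrict to the region, -o: output file
  samtools consensus -f fasta -r "$region" -o "$out" "$bam"
}

# Possible usage (file names are placeholders):
#   region_consensus test.bam chr1:200000-500000 chr1_region.fa
```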
Duplicate of Extract Reads From A Bam File That Fall Within A Given Region