Locating positions of a FASTA seq in a BAM file
5.1 years ago
ahmedakhokhar ▴ 150

Hi, I am new to whole-genome alignments and sequencing (this naive question might have been asked and answered several times before; my humble apologies). I have a sequence in a .fasta file (length ~3 kb) without its genomic coordinates. All I want to know is:

What is the genomic location of this .fasta sequence in the .bam file, and how can I retrieve its genomic coordinates from the .bam file?

Can someone suggest a tool, or the steps involved in this kind of minor data processing? Thanks.

Assembly • 1.0k views

Does the BAM file contain the alignment of your fasta file to something else (presumably another fasta file)?


Yes, the BAM file was generated by the usual genome sequencing pipeline, and it should contain this .fasta sequence too. I just want to know where the .fasta sequence is located in the aligned sequence (the genomic coordinates of my .fasta sequence in the BAM alignment file).


Then use samtools view to get the starting position (the POS column) and calculate the end position from the CIGAR string.
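To make the CIGAR arithmetic concrete, here is a minimal sketch in Python. It assumes you have already read POS (column 4) and the CIGAR string (column 6) from a samtools view record; the function name alignment_end and the example values are made up for illustration. Only operations that consume reference bases (M, D, N, = and X in the SAM specification) extend the end coordinate.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def alignment_end(pos, cigar):
    """Return the 1-based, inclusive end coordinate of an alignment.

    pos   -- 1-based leftmost mapping position (SAM column 4)
    cigar -- CIGAR string (SAM column 6), e.g. "5S95M2D10M3I20M"
    """
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return pos + ref_len - 1


# Illustrative values only: 95M + 2D + 10M + 20M span 127 reference bases,
# so an alignment starting at 1000 ends at 1126.
print(alignment_end(1000, "5S95M2D10M3I20M"))
```

Soft clips (S), hard clips (H), insertions (I) and padding (P) are skipped because they do not advance along the reference. An unmapped record (CIGAR "*") has no meaningful end, so this sketch should only be applied to mapped records.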

Xref: https://bioinformatics.stackexchange.com/questions/10753/locating-positions-of-a-fasta-seq-in-bam-file


samtools view [options] in.sam|in.bam|in.cram [region...]

samtools view takes [region] coordinates, which I don't have for this sequence; that is the problem I am facing. Any tips?


You don't need the region. Just pipe the output to grep and search for the contig name.
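As a rough sketch of the same idea in Python rather than grep: the snippet below scans SAM-formatted text lines (such as those printed by samtools view), keeps the mapped records whose query name matches, and reports reference name, start and end. It assumes the FASTA sequence really was aligned and appears in the BAM under its FASTA header name; the function name locate, the name my_contig and the example records are hypothetical.

```python
import re


def locate(sam_lines, query_name):
    """Yield (reference, start, end) for each mapped record named query_name.

    sam_lines  -- iterable of tab-separated SAM text lines
    query_name -- the QNAME to look for (assumed to equal the FASTA header)
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines, if present
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if qname != query_name or flag & 0x4:  # 0x4 = segment unmapped
            continue
        # Sum the CIGAR operations that consume reference bases.
        ref_len = sum(
            int(n)
            for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
            if op in "MDN=X"
        )
        yield rname, pos, pos + ref_len - 1


# Made-up records standing in for real samtools view output.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "read1\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",
    "my_contig\t0\tchr1\t10001\t60\t1500M2D1500M\t*\t0\t0\t*\t*",
]
hits = list(locate(example, "my_contig"))
print(hits)
```

With real data, the lines could be read from standard input instead of the example list, for instance by piping the samtools view output into the script and passing sys.stdin as sam_lines. Matching on the exact QNAME avoids the partial-name hits that a plain grep can return.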
