Locating the position of a FASTA sequence in a BAM file

5.7 years ago

ahmedakhokhar ▴ 150

Hi, I am new to whole-genome alignment and sequencing (this naive question might have been asked and answered several times before; my humble apologies). I have a sequence in a .fasta file (length ~3 kb) without its genomic coordinates. All I want to know is:

What is the genomic location of this .fasta sequence in the .bam file? (That is, how can I retrieve its genome coordinates from the BAM file?)

Can someone suggest a tool, or the steps involved in such minor data processing? Thanks.

Assembly • 1.2k views

ADD COMMENT • link 5.7 years ago by ahmedakhokhar ▴ 150

Does the BAM file contain the alignment of your fasta file to something else (presumably another fasta file)?

ADD REPLY • link 5.7 years ago by Devon Ryan 105k

Yes, the BAM file was generated by the usual genome sequencing pipeline, and it should contain this .fasta sequence too. I just want to know where the .fasta sequence is located in the aligned sequence (the genomic coordinates of my .fasta sequence in the BAM alignment file).

ADD REPLY • link 5.7 years ago by ahmedakhokhar ▴ 150

Then use samtools view to get the starting position and calculate from the CIGAR string where the end position is.
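The end-position calculation can be sketched roughly as follows. This is a minimal illustration, not part of the original reply: it assumes you already have the 1-based start position (SAM column 4) and the CIGAR string (SAM column 6) from the `samtools view` output, and the function name `alignment_end` and the example values are hypothetical. Only the CIGAR operations that consume reference bases (M, D, N, = and X) advance the position.

```python
import re

# CIGAR operations that consume reference bases, per the SAM format.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_end(pos, cigar):
    """Return the 1-based inclusive end position of an alignment.

    pos   -- 1-based leftmost mapping position (SAM column 4)
    cigar -- CIGAR string (SAM column 6)
    """
    if cigar == "*":
        raise ValueError("no CIGAR available (record is probably unaligned)")
    # Sum the lengths of all operations that move along the reference.
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1


# Hypothetical record: starts at position 1000 with two match blocks,
# a 2 bp insertion (ignored) and a 10 bp deletion (counted).
print(alignment_end(1000, "1500M2I1498M10D"))  # prints 4007
```

Soft clips (S), hard clips (H) and insertions (I) do not consume reference bases, so they are left out of the sum.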

Xref: https://bioinformatics.stackexchange.com/questions/10753/locating-positions-of-a-fasta-seq-in-bam-file

ADD REPLY • link 5.7 years ago by Devon Ryan 105k

samtools view [options] in.sam|in.bam|in.cram [region...]

samtools view takes [region] coordinates, which I don't have for this sequence. That is the problem I am facing. Any tips?

ADD REPLY • link 5.7 years ago by ahmedakhokhar ▴ 150

You don't need the region; just pipe the output to grep and search for the contig name.
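The same filtering idea can be sketched in Python instead of grep. This is an illustration under stated assumptions rather than the method from the reply: it assumes the FASTA sequence was aligned as a query, so its name appears in the first SAM column (QNAME), and that the text comes from something like `samtools view in.bam`. The function name `locate` and the example records are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def locate(sam_lines, query_name):
    """Yield (reference, start, end) for SAM records whose QNAME equals query_name.

    Assumption: the sequence of interest is the aligned query, so its name
    is in column 1. Coordinates are 1-based and inclusive.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[0] != query_name:
            continue
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname == "*" or cigar == "*":
            continue  # unaligned record, no coordinates to report
        # Only M, D, N, = and X consume reference bases.
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
        yield rname, pos, pos + ref_len - 1


# Hypothetical SAM records; in practice iterate over sys.stdin instead.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "my_contig\t0\tchr1\t1000\t60\t3000M\t*\t0\t0\t*\t*",
    "other_read\t0\tchr2\t5\t60\t10M\t*\t0\t0\t*\t*",
]
for rname, start, end in locate(example, "my_contig"):
    print(f"{rname}:{start}-{end}")  # prints chr1:1000-3999
```

Unlike a plain grep, this matches the name exactly against the first column, so a name that happens to be a substring of another read name or of a tag value is not reported.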

ADD REPLY • link 5.7 years ago by Devon Ryan 105k
