5.1 years ago
ahmedakhokhar
Hi, I am new to whole-genome alignment and sequencing (this naive question may well have been asked and answered several times before; my apologies if so). I have a sequence in a .fasta file (length ~3 kb) without its genomic coordinates. All I want to know is:
What is the genomic location of this .fasta sequence in the .bam file? (That is, how do I retrieve its genomic coordinates from the BAM file?)
Can someone suggest a tool, or the steps involved in this kind of minor data processing? Thanks.
Does the BAM file contain the alignment of your fasta file to something else (presumably another fasta file)?
Yes, the BAM file was generated by the usual genome sequencing pipeline, and it should contain this .fasta sequence too. I just want to know where the .fasta sequence is located in the aligned data (the genomic coordinates of my .fasta sequence in the BAM alignment file).
Then use samtools view to get the starting position, and calculate the end position from the CIGAR string.
Xref: https://bioinformatics.stackexchange.com/questions/10753/locating-positions-of-a-fasta-seq-in-bam-file
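To make the CIGAR step concrete, here is a minimal Python sketch of that calculation. It assumes you have already read the 1-based POS (column 4) and the CIGAR string (column 6) from a line of samtools view output; the function name reference_end is hypothetical and not part of samtools. Only the operations that consume reference bases (M, D, N, = and X) count towards the end position.

```python
import re

# CIGAR operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive end coordinate of an alignment.

    pos   -- 1-based leftmost mapping position (SAM column 4)
    cigar -- CIGAR string (SAM column 6)
    """
    if cigar == "*":
        raise ValueError("CIGAR is unavailable for this record")
    ref_len = sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )
    return pos + ref_len - 1


# Soft clips (S) and insertions (I) do not consume the reference:
# 50 + 10 + 40 = 100 reference bases, so the alignment spans 100..199.
print(reference_end(100, "5S50M2I10D40M"))  # prints 199
```

If the sequence is split over several alignment records (for example, supplementary alignments), each record needs to be handled separately.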
samtools view [options] in.sam|in.bam|in.cram [region...]
samtools view takes [region] coordinates, which I don't have for this sequence. This is the problem I am facing. Any tips?
You don't need the region; just pipe the output to grep and grep for the contig name.
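The same filtering can be sketched in Python instead of grep. This is only an illustration under two assumptions: the sequence was aligned as its own record in the BAM, and its query name (SAM column 1) matches the FASTA header. The name my_contig, the function find_alignments, and the example records are all made up for the demonstration; in practice the lines would come from the text output of samtools view.

```python
def find_alignments(sam_lines, query_name):
    """Return (reference name, 1-based start, CIGAR) for each mapped
    record whose query name equals query_name."""
    hits = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line, present when the header is included
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        qname, flag = fields[0], int(fields[1])
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if qname != query_name or flag & 0x4:
            continue  # different sequence, or unmapped (flag bit 0x4)
        hits.append((rname, pos, cigar))
    return hits


# Made-up records standing in for samtools view output.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "my_contig\t0\tchr1\t10468\t60\t3000M\t*\t0\t0\t*\t*",
    "other_read\t0\tchr2\t500\t60\t100M\t*\t0\t0\t*\t*",
]
print(find_alignments(example, "my_contig"))
```

Matching on the exact query name avoids the partial matches grep can produce when one name is a prefix of another.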