Question

Analyzing a read-mapped file with Python

2.4 years ago

Manaswini • 0

Hi. I have a file containing reads aligned to the reference genome in FASTA format. I want to identify the aligned position of each read in the reference genome using Python. Is there a module that can read this kind of file and return the aligned positions?

alignment python reads • 2.9k views

updated 2.4 years ago by barslmn ★ 2.4k • written 2.4 years ago by Manaswini • 0

You probably have a BAM file. If you want a Python module, you can look at pysam.

2.4 years ago by barslmn ★ 2.4k

Thanks for the reply, but I do not have a BAM file. I have a FASTA file. I have attached a screenshot. I want to find the position where each read aligned to the reference genome.

2.4 years ago by Manaswini • 0

This is not in fasta format.

2.4 years ago by barslmn ★ 2.4k

If you can post an example of the file, we can help you better.

2.4 years ago by Ming Tommy Tang ★ 4.6k

Thanks for the reply. Actually, it's an aligned file. I am unable to upload the file here, so I am adding a screenshot of it.

2.4 years ago by Manaswini • 0

SeqIO or AlignIO from Biopython can help you parse FASTA files. In your case, it looks like all you need to do is count the number of '-' characters preceding each read's sequence to determine its mapping position.
I should note that this is a rather unusual format for storing such data, so you may want to reconsider the procedure that produced this file.

2.4 years ago by liorglic ★ 1.5k
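
The gap-counting idea in the reply above can be sketched in plain Python. This is only a minimal sketch, assuming the file is a gapped (aligned) FASTA in which each read is padded with leading '-' characters up to its alignment column. The function names and the sample records are hypothetical and not taken from the original file. The reported value is a 0-based alignment column, which matches the reference coordinate only if the reference row itself has no gaps before that column.

```python
import io


def parse_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def leading_gap_offset(aligned_seq, gap="-"):
    """Return the 0-based alignment column of the first non-gap character."""
    return len(aligned_seq) - len(aligned_seq.lstrip(gap))


# Hypothetical aligned records, made up for illustration only.
example = io.StringIO(
    ">ref\nACGTACGTAC\n"
    ">read1\n---TACG---\n"
    ">read2\n------GTAC\n"
)
offsets = {name: leading_gap_offset(seq) for name, seq in parse_fasta(example)}
print(offsets)  # {'ref': 0, 'read1': 3, 'read2': 6}
```

Biopython's SeqIO could replace the small parser here; the plain-Python version is used only to keep the sketch free of dependencies.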

OK, thank you for the help.

2.4 years ago by Manaswini • 0

score 0 · Answer 1 · 2023-01-02

2.4 years ago

colindaven 7.6k

I think you have gone the route of multiple sequence alignment, which is commonly used for gene or protein sequences. Did you use, e.g., MUSCLE or Clustal Omega for your (global) alignments?

If you are using a reference genome, you should be aiming for local alignments. Use well-known tools such as bwa mem to map against a reference genome, and aim for BAM files, which give you the alignment position. There are many good tutorials for this.

2.4 years ago by colindaven 7.6k
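
To illustrate where the alignment position lives in mapper output, here is a minimal sketch that reads it from plain-text SAM records, the text format that bwa mem writes to standard output. BAM is the compressed binary form of the same records and would normally be read with a library such as pysam, as suggested earlier in the thread. The function name and the sample records are hypothetical. In SAM, field 3 is the reference name, field 4 is the 1-based leftmost mapping position, and flag bit 0x4 marks an unmapped read.

```python
def sam_positions(lines):
    """Yield (read_name, reference_name, position) for mapped reads.

    The position is the 1-based leftmost mapping coordinate (SAM field 4).
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 marks an unmapped read
            continue
        yield fields[0], fields[2], int(fields[3])


# Hypothetical SAM records, made up for illustration only.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t101\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read3\t16\tchr1\t250\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
positions = list(sam_positions(example_sam))
print(positions)  # [('read1', 'chr1', 101), ('read3', 'chr1', 250)]
```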

Thank you for your reply. I am now trying BWA. Can I use BWA directly through Python, or do I have to use the locally installed one?

2.4 years ago by Manaswini • 0

You can run bwa from the command line: https://bio-bwa.sourceforge.net/bwa.shtml

2.4 years ago by barslmn ★ 2.4k
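
One way to combine the two is to call the locally installed bwa from a Python script with the standard subprocess module. The sketch below is a minimal example under stated assumptions: bwa is installed and on PATH, the reference has already been indexed with `bwa index`, and the file names and helper function names are hypothetical.

```python
import shutil
import subprocess


def bwa_mem_command(reference, reads):
    """Build the argument list for `bwa mem <reference> <reads>`."""
    return ["bwa", "mem", str(reference), str(reads)]


def run_bwa_mem(reference, reads, out_sam):
    """Run bwa mem and write its SAM output to out_sam.

    Assumes bwa is installed locally and `bwa index <reference>` has
    already been run.
    """
    if shutil.which("bwa") is None:
        raise FileNotFoundError("bwa executable not found on PATH")
    with open(out_sam, "w") as out:
        # bwa mem writes SAM to standard output, so redirect it to a file.
        subprocess.run(bwa_mem_command(reference, reads), stdout=out, check=True)
    return out_sam


# Hypothetical file names; this only shows the command that would be run.
print(bwa_mem_command("ref.fa", "reads.fq"))  # ['bwa', 'mem', 'ref.fa', 'reads.fq']
```

Calling `run_bwa_mem("ref.fa", "reads.fq", "aln.sam")` would then produce a SAM file whose records contain the mapping positions.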