How to find the longest sequence for each cluster of sequences in a FASTA file using Python?
8.8 years ago
grayapply2009

I have a FASTA file in which sequences are clustered and sorted by ID. I want to find the longest sequence in each cluster and write those sequences to a new file. How do I do this in Python?

Here is the format of my FASTA file:

>abc var1
kdfafaljflasjfalsjfaljfs
>abc var2
lasuowiejwaljflaj
>abc var3
lajflasjfowijflasjfopiefjjkfldfjqop
>dce var1
owiepqfpufaplddfpqoiwejlkdf
>dce var2
qopwelsmdfljfaldjfaopif
>red var1
alsdfowejfsladfjojflsdfjsdfjaslfjk
>red var2
lsdfjjqowjelsaflasflfnkdaflasfj
>red var3
kahfiqwuefkasdnkashdfiqfkasjdfh
>red var4
akhqioweadhauisydklsdfksdyiofjasldfhihladfni
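For anyone landing here, below is a minimal pure-Python sketch (standard library only) of what the question asks. It assumes the cluster ID is the header text before the first whitespace (`abc`, `dce`, `red` above), keeps the first record when two in a cluster tie for length, and does not need the file to be sorted. The function names are hypothetical, not taken from any library.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is returned without the leading '>'. Blank lines are skipped
    and multi-line sequences are joined into one string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_cluster(records):
    """Map each cluster ID to its longest (header, sequence) record.

    Assumption: the cluster ID is the header text before the first
    whitespace. On ties the record seen first is kept. Dicts preserve
    insertion order, so clusters come out in file order.
    """
    best = {}
    for header, seq in records:
        cluster = header.split()[0]
        if cluster not in best or len(seq) > len(best[cluster][1]):
            best[cluster] = (header, seq)
    return best


def format_fasta(records):
    """Render (header, sequence) pairs as FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

With `fin` and `fout` opened on the input and output files (paths of your choosing), `fout.write(format_fasta(longest_per_cluster(read_fasta(fin)).values()))` does the whole job. On the example above it should keep `abc var3`, `dce var1` and `red var4`.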

common fasta python longest
8.8 years ago
dbrowne.up

Check out the Python module pyfaidx: https://github.com/mdshw5/pyfaidx

It makes doing this sort of thing super easy. You may have to experiment a bit to figure out exactly how to do what you want, but pyfaidx gives you a nice interface for accessing each sequence in your file and getting information about it, e.g. its length and name.

It looks like a lot of work. I'm trying it. Thank you for your advice.

pyfaidx will not work on this type of FASTA because the indexing process splits each sequence name on whitespace, so you'd end up with non-unique identifiers. This was a design decision to match the samtools behavior.
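To check whether a given file would hit this, here is a small standard-library sketch (not pyfaidx code) that imitates the whitespace split described above and reports the short names that collide. The function name is hypothetical.

```python
from collections import Counter


def duplicate_short_names(headers):
    """Return the sorted short names that occur more than once.

    A short name is the header text up to the first whitespace, with any
    leading '>' removed, which mimics the splitting described above.
    """
    counts = Counter(h.lstrip(">").split()[0] for h in headers)
    return sorted(name for name, n in counts.items() if n > 1)
```

On the headers in the question this returns `["abc", "dce", "red"]`, i.e. every cluster collides.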

Thanks for pointing that out, Matt. I noticed it too. However, the integrated faidx command-line tool is really handy for doing other things with your FASTA file.

8.8 years ago
grayapply2009

Hey folks,

I found a solution in another post. Here is the link for anyone in the same boat as me.

How to extract the longest isoform from multi fasta file
