I am working with a protein sequence file in PHYLIP format using Python.
5 592
Homo_sapie MEMQDLTSPH SRLSGSSESP SGPKLGNSHI NSNSMTPNGT EVKTEPMSSS
Macaca_mul MEMQDLTSPH SRLSGSSESP SGPKLDNSHI NSNSMTPNGT EVKTEPMSSS
Mus_muscul MEMQDLTSPH SRLSGSSESP SGPKLDSSHI NSTSMTPNGT EVKTEPMSSS
Danio_reri ---------- ---------- ---------- ---------M SWILMWSLLS
Ciona_inte ---------- ---------- ---------- ------MLFS VYIVMMIVTS
ETASTTADGS LNNFSGSAIG SSSFSPRPTH QFSPPQIYPS NRPYPHILPT
ETASTTADGS LDNFSGSAIG SSNFSPRPTH QS-PPQIYAS NRPYPHILPT
EIASTAADGS LDSFSGSALG SSSFSPRPAH PFSPPQIYPS -KSYPHILPT
ACAPQIHSAS AQDSSNLLST EEPITPQPYN RSQYCQWPCK CPKTPPMCPP
QFYLSMATPN FDLRRSNQST EGDFYPARS- EARECQD-CT CPDTPGTCPP
PSSQTMAAYG QTQFTTGMQQ ATAYATYPQP GQPYGISSYG ALWAGIKTEG
PSSQTMAAYG QTQFTTGMQQ ATAYATYPQP GQPYGISSYG ALWAGIKTEG
PSSQTMAAYG QTQFTTGMQQ ATAYATYPQP GQPYGISSYG ALWAGIKTES
GVSLLMDG-- -----CDCCR ACAKQVREAC NEKENCDHHR GLYCDYSADK
GVSRIMDG-- -----CDCCK MCAKQLNEPC DVRMRCDHHK GLYCDMNT--
GLSQSQSPGQ TGFLSYGTSF STPQPGQAPY SYQMQGSSFT TSSGIYTGNN
GLSQSQSPGQ TGFLSYGTSF STPQPGQAPY SYQMQGLSFT TSSGLYTGNN
GLSQSQSPGQ TGFLSYGTSF GTPQPGQAPY SYQMQGSSFT TSSGLYSGNN
---------- ---------- ---------P RYEKGVCAFL PGTGCEHNGV
---------- ---------- ---------- ----GLCKAS PGVACYVGGS
I need to read the file into a list such that each index holds one column of the alignment. So far I have only managed to read the rows of the file, but I need the sequences extracted column by column. For example, for the file above, the first element of the list should be the first column:
M
M
M
-
-
Try a PHYLIP-to-FASTA converter, such as BBMap (standalone) or AlignIO (from the Biopython library). You can then access the sequences by index.
Actually, I need to work with the PHYLIP format itself.
You can try logic along these lines.
I have only tried it with the alignment you provided in your question.
Add conditional statements to handle other cases, such as blank spaces.
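For reference, here is a minimal sketch of this kind of logic, not necessarily the code originally posted. It assumes an interleaved PHYLIP file like the one in the question: a header line, then one block where each line starts with a 10-character name, then further blocks with sequence only. The function name `read_phylip_columns` and the `name_width` parameter are made up for illustration, and the sample is a shortened copy of the alignment above.

```python
def read_phylip_columns(text, name_width=10):
    """Return (names, columns) for an interleaved PHYLIP alignment held in a string.

    Assumes fixed-width names in the first block and that every block
    lists the sequences in the same order.
    """
    # Drop blank lines so blocks may or may not be separated by them.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs = int(lines[0].split()[0])  # first header field: number of sequences

    names = []
    parts = []  # parts[k] collects the chunks belonging to sequence k
    for i, line in enumerate(lines[1:]):
        if i < n_seqs:
            # First block: split off the name, keep the rest as sequence.
            names.append(line[:name_width].strip())
            parts.append([])
            chunk = line[name_width:]
        else:
            chunk = line
        # Remove the spaces that separate groups of 10 residues.
        parts[i % n_seqs].append(chunk.replace(" ", ""))

    sequences = ["".join(p) for p in parts]
    # zip(*sequences) walks the alignment column by column.
    columns = ["".join(col) for col in zip(*sequences)]
    return names, columns


sample = """\
5 20
Homo_sapie MEMQDLTSPH
Macaca_mul MEMQDLTSPH
Mus_muscul MEMQDLTSPH
Danio_reri ----------
Ciona_inte ----------
ETASTTADGS
ETASTTADGS
EIASTAADGS
ACAPQIHSAS
QFYLSMATPN
"""

names, columns = read_phylip_columns(sample)
print(columns[0])  # MMM--
```

Each element of `columns` is one alignment column as a string; use `list(columns[0])` if you need the residues as separate items. Note that `zip` stops at the shortest sequence, so rows of unequal length would silently lose their trailing columns.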
Thanks, I have tried using this logic and I ended up with this error:
Hello mdsiddra,
Have you checked indentation for that line?
Yes, I have checked; it still gives the same error. I need to access each column of the sequence file as one element of the list so that I can use each element in further calculations.
Here we have to handle the line length and the indexing; we can solve this by adding an if statement.
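A rough sketch of what such a guard could look like, assuming the problem is indexing past the end of a shorter line (the error itself is not shown in the thread). The function name `columns_with_guard` is hypothetical, and padding short rows with a gap character is an assumption; you may prefer to skip them instead.

```python
def columns_with_guard(sequences, gap="-"):
    """Build alignment columns, padding rows that are shorter than the longest one."""
    width = max((len(s) for s in sequences), default=0)
    columns = []
    for i in range(width):
        column = []
        for seq in sequences:
            # The if statement avoids indexing beyond the end of a short row.
            if i < len(seq):
                column.append(seq[i])
            else:
                column.append(gap)  # assumption: treat missing positions as gaps
        columns.append("".join(column))
    return columns


rows = ["MEMQ", "MEM", "ME"]
cols = columns_with_guard(rows)
print(cols)  # ['MMM', 'EEE', 'MM-', 'Q--']
```

`itertools.zip_longest(*rows, fillvalue="-")` from the standard library gives the same padding behaviour without the explicit loop.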
Thank you for the help!