I am using Python 3.6 and Biopython 1.72 to read sequence files. I have an aligned sequence file in FASTA format:
>Human
----------------------------MRLRVRLLKRTWPLEVPETEPTL-RSHLRQSLLCT-IPSSTDSEHSSLQN-NEQPSL
>Chimpanzee
----------------------------MRLRVRLLKRTWPLEVPETEPTL-RSRLRQSLLCT-IPSSTDSEHSSLQN-NEQPSL
>Dog
----------------------------MKLRVRLQKRTWPLDLPDAEPTL-RAHLSQALLPS-LPSSTDSEHSSLQN-NDPPSL
>Mouse
----------------------------MKLRVRLQKRTQPLEVPESEPTL-RAHLSQVLLPT-LPSSTDTEHSSLQD-NDQPSL
I need to remove the gap characters '-' from the file so that the result looks like this:
>Human
MRLRVRLLKRTWPLEVPETEPTLRSHLRQSLLCTIPSSTDSEHSSLQNNEQPSL
>Chimpanzee
MRLRVRLLKRTWPLEVPETEPTLRSRLRQSLLCTIPSSTDSEHSSLQNNEQPSL
>Dog
MKLRVRLQKRTWPLDLPDAEPTLRAHLSQALLPSLPSSTDSEHSSLQNNDPPSL
>Mouse
MKLRVRLQKRTQPLEVPESEPTLRAHLSQVLLPTLPSSTDTEHSSLQDNDQPSL
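The transformation shown above amounts to deleting every '-' from the sequence lines while copying the '>' header lines unchanged. Here is a minimal sketch of that in plain Python, without Biopython. It assumes gaps are written as '-', and the function name ungap_fasta_file is hypothetical:

```python
def ungap_fasta_file(in_path, out_path, gap="-"):
    """Copy a FASTA file, deleting every `gap` character from sequence lines.

    Header lines (starting with ">") are copied unchanged. The file is
    processed line by line, so wrapped (multi-line) sequences work too.
    """
    # Text mode uses universal newlines, so Windows "\r\n" files read fine.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)                   # header: keep as-is
            else:
                dst.write(line.replace(gap, ""))  # sequence: drop gap characters
```

Usage, with hypothetical file names: ungap_fasta_file("aligned.fasta", "ungapped.fasta"). With Biopython, the per-record equivalent is to build the new sequence from str(record.seq).replace("-", "") and write the records back out with SeqIO.write.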
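I have been trying this in Python:
from Bio import SeqIO

file_var = input("Enter your file name: ")
sequences = []
for seq_record in SeqIO.parse(file_var, "fasta"):
    sequences.append(seq_record.seq)
print(sequences)

list2 = []  # list for sequences containing "-"
list3 = []  # list for sequences without "-"
# alignment is not defined in this snippet; it presumably holds the parsed SeqRecord objects
for seq_record in alignment:
    # this compares "-" with each SeqRecord, which is what raises the error below
    if "-" in alignment:
        list2.append(seq_record)
    else:
        list3.append(seq_record)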
But this gives me the error:
raise NotImplementedError(_NO_SEQRECORD_COMPARISON)
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.
Any suggestions? (P.S.: I am working with the sequence files on Windows, not Linux.)
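The NotImplementedError comes from the membership test "-" in alignment. When alignment holds SeqRecord objects, "in" compares the string "-" with each record, and SeqRecord deliberately refuses such comparisons. The test has to look inside each record's sequence instead. Below is a minimal sketch of the sorting loop with that fix. The function name split_by_gaps is hypothetical, and the sketch assumes each record has a .seq attribute, as Biopython SeqRecord objects do:

```python
def split_by_gaps(records):
    """Split SeqRecord-like objects into (with_gaps, without_gaps) lists.

    The test is done on the sequence text of each record, not on the
    record itself, so no SeqRecord comparison is ever attempted.
    """
    with_gaps = []     # records whose sequence contains at least one "-"
    without_gaps = []  # records with no "-" at all
    for record in records:
        if "-" in str(record.seq):
            with_gaps.append(record)
        else:
            without_gaps.append(record)
    return with_gaps, without_gaps
```

Usage: with_gaps, without_gaps = split_by_gaps(SeqIO.parse(file_var, "fasta")). This only sorts the records. For the file above, all four records contain '-', so each sequence still has to be rewritten without its gaps to get the desired output.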
You should be using the ungap method of Biopython's Seq class (https://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html), something like seq.ungap("-"), assuming that seq is the Seq object holding the sequence.
Output (input is from the OP's FASTA):
Hello mdsiddra!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/4846/removing-sequences-gaps-from-fasta-file
This is typically not recommended as it runs the risk of annoying people in both communities.
Thank you for bringing it to my notice; I will be more careful next time.
Hi, you can also use the "Undo alignment" function of the SEDA software (https://www.sing-group.org/seda/manual/operations.html#undo-alignment). Regards.
Hello, I want to remove the columns that contain gaps in a multiple sequence alignment, for example:
human ATGC-GTGC---
campanz GT-A-TTGC---
Output:
human ATCGTGC
campanz GTATTGC
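Removing gap columns is a different operation from ungapping each sequence. A column is dropped if at least one sequence has a gap there, so the remaining sequences stay aligned and equal in length. Here is a minimal sketch in plain Python. It assumes the alignment is held as a dict mapping names to aligned strings, and the function name drop_gap_columns is hypothetical:

```python
def drop_gap_columns(alignment, gap="-"):
    """Remove every column that has `gap` in at least one aligned sequence.

    `alignment` is assumed to be a dict mapping sequence names to aligned
    strings of equal length; a new dict with the trimmed strings is returned.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) yields one tuple per alignment column; keep gap-free columns
    keep = [i for i, column in enumerate(zip(*seqs)) if gap not in column]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

For the example above, drop_gap_columns({"human": "ATGC-GTGC---", "campanz": "GT-A-TTGC---"}) returns {"human": "ATCGTGC", "campanz": "GTATTGC"}. Ungapping each sequence separately would instead give ATGCGTGC and GTATTGC. When every gap lies in a column shared by all sequences, the two approaches give the same result.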
OK, and in what way do the solutions provided here not resolve that for you?