2.3 years ago
ngarber
I'm fairly new to sequence analysis in Python, but what I want to do is:
- Take a string (15aa peptide sequence) and find the best alignment (no gaps) by aligning against another string (a protein sequence)
- Get the best-aligned matching 15aa sequence as a new string - must be 15aa with no gaps
I'm used to doing that with COBALT on the web interface, but I'm not sure how to do it from within Python. Is there a way to do it in Biopython or via the command line (e.g. with os)?
As a starting point for your last couple of clauses, you might want to look at the code and examples in a series of recent exchanges here and here using Biopython. Fortunately, your case is easier than those because you don't want gaps. So you can add a condition that filters the returned alignments, keeping only those whose length equals the length of the input string.
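Because no gaps are allowed, the search can also be done without an aligner at all: slide the peptide along the protein and score each same-length window. Here is a minimal sketch of that idea in plain Python. The function name `best_ungapped_window`, the example sequences, and the identity-count scoring are all my own assumptions for illustration, not something from COBALT or Biopython; if you want substitution-matrix scoring (e.g. BLOSUM62), you could pass a different `score_pair` function.

```python
def best_ungapped_window(peptide, protein, score_pair=None):
    """Return (score, start, window) for the best gap-free placement of
    `peptide` along `protein`. Ties go to the earliest window."""
    if score_pair is None:
        # Assumption: score by simple identity (1 per matching residue).
        score_pair = lambda a, b: 1 if a == b else 0

    k = len(peptide)
    if k == 0 or k > len(protein):
        raise ValueError("peptide must be non-empty and no longer than protein")

    best = None
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]  # always exactly k residues
        score = sum(score_pair(a, b) for a, b in zip(peptide, window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


# Made-up example sequences (hypothetical, for illustration only).
peptide = "ACDEFGHIKLMNPQR"            # 15 aa
protein = "MMSTACDEFGHVKLMNPQRWWYV"    # contains a one-mismatch copy

score, start, window = best_ungapped_window(peptide, protein)
print(score, start, window)  # 14 4 ACDEFGHVKLMNPQR
```

The returned `window` is always the same length as the peptide, so it satisfies the "must be 15aa with no gaps" requirement by construction.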
I personally haven't used COBALT, but the abstract of the paper describing it says there are files available that you can run, and the README at the listed FTP site describes how to run it on Linux. If you have your heart set on using it in conjunction with Python, you could check out how I did a similar thing with PatMatch here. Go there, click 'launch binder', and work through the notebooks to see examples of linking a command-line program to Python in various ways, using Python running in Jupyter. (Note this doesn't cover all the ways you can do this, such as subprocess or os.system(), but it may be a good start for considering options.) You'd probably want to check out the 'Advanced: Sending PatMatch output directly to Python' one under 'Additional topics' as well.
I don't think this is a comment. It should be a bona fide answer.