Question

Guidance for key amino acid analysis

5.7 years ago

skbrimer ▴ 740

Hello hive brain,

We are working on an analysis at work where we evaluate key amino acids at specific sites in a protein, and I'm trying to find a way to speed it up, as this is likely something we will be doing for a while. Right now we take the processed FASTA files and manually extract the regions of interest from an alignment. This is tiresome but not impossible; we are currently only working with a couple hundred FASTA files, so the analysis is going well.

Is it possible to take the output of an alignment (Clustal format) and extract the positions of interest for each sequence in the alignment? I would like to take this approach because not every PCR product is equal, and I think it would be the more robust method. The trouble is that I have no idea how to manipulate Clustal output to give me only columns x, y, z.

The other possible answer I can think of, and can code, is to pull positions x, y, z from each PCR product directly. That is straightforward but requires a lot of assumptions (e.g. are they all the same length, do they start in the right spot, etc.).

If anyone can point me in a direction I would be grateful.

Thank you, Sean

alignment • 1.0k views

score 1 · Accepted Answer · 2019-08-31

You can!!!!

I figured it out using Biopython. Seriously, I cannot thank them enough for making and maintaining this package.

You can take your file.aln, read it in using Biopython's AlignIO, and then slice out the columns you want like this:

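from Bio import AlignIO
# Slices are 0-based and end-exclusive, so 221:222 selects alignment column 222, and so on.
align = AlignIO.read("test.aln", "clustal")
print(align[:, 221:222] + align[:, 241:242] + align[:, 255:256] + align[:, 293:294] + align[:, 298:299])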

This gives me the output I wanted, and I think I can write the columns to individual files if needed, or all to one file.

TFYEI tr|Q5MJR2|Q5MJR2_IBDV
TFYEI BA7880-15
TFYEI LS19-3325-108
TFYEI LS19-3325-110
---EI LS19-3325-113
TFYEI LS19-3325-134
TFYEI LS19-3325-135
TFYEI LS19-3325-136
TFYEI LS19-3325-137
TFYEI LS19-3325-138
TFYEI LS19-3325-75
TFYEI LS19-3325-76
TFYEI LS19-3325-83
TFYEI LS19-3325-87
TFYEI LS19-3325-97
TFYEI LS19-3353-109
TFYEI LS19-3353-116
TFYEI LS19-3353-117
...
TFYEI LS19-3353-96
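
If the list of sites grows, chaining slices by hand gets unwieldy. Below is a minimal sketch of one way to generalize it, assuming each record exposes `.id` and `.seq` (as the records in a Biopython alignment do). The names `extract_columns` and `load_and_extract` are hypothetical helpers made up for this example, not part of Biopython.

```python
def extract_columns(records, columns):
    """Return {record id: residues at the given 1-based alignment columns}."""
    picked = {}
    for record in records:
        seq = str(record.seq)
        # Convert each 1-based column number to a 0-based string index.
        picked[record.id] = "".join(seq[col - 1] for col in columns)
    return picked


def load_and_extract(path, columns, fmt="clustal"):
    """Read an alignment file with Biopython and extract the chosen columns."""
    # Imported here so extract_columns stays usable without Biopython installed.
    from Bio import AlignIO

    return extract_columns(AlignIO.read(path, fmt), columns)
```

The slices in the answer above correspond to 1-based columns 222, 242, 256, 294 and 299, so something like `load_and_extract("test.aln", [222, 242, 256, 294, 299])` should give the same residues keyed by sequence ID, which may be easier to write to a table than the printed alignment.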

Hope this helps anyone else out there!
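
One caveat worth noting: the slice indices are alignment columns, which only match residue numbers in the reference protein as long as the reference row has no gaps before the site. The sketch below shows one possible way to translate reference residue numbers into alignment columns. It assumes you pass in the gapped reference row as a plain string and that gaps are written as "-"; `reference_to_alignment_columns` is a hypothetical helper name, not a Biopython function.

```python
def reference_to_alignment_columns(gapped_reference, positions, gap="-"):
    """Map 1-based residue numbers in the ungapped reference to 1-based alignment columns."""
    wanted = set(positions)
    mapping = {}
    residue_number = 0
    for column, char in enumerate(gapped_reference, start=1):
        if char == gap:
            continue  # gap columns do not advance the residue count
        residue_number += 1
        if residue_number in wanted:
            mapping[residue_number] = column
    missing = wanted - set(mapping)
    if missing:
        raise ValueError(f"positions beyond reference length: {sorted(missing)}")
    return mapping
```

With a Biopython alignment, the gapped reference string could be obtained with something like `str(align[0].seq)`, assuming the reference is the first row as it appears to be in the output above. The resulting column numbers can then be used to build the slices.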