Hello hive brain,
We are working on an analysis at work where we evaluate key amino acids at specific sites in a protein, and I'm trying to find a way to speed it up, as this is likely something we will be doing for a while. Right now we take the processed FASTA files and manually pull the regions of interest out of an alignment. This is tiresome but not impossible; we are currently only working with a couple hundred FASTA files, so the analysis is going well.
Is it possible to take the output of an alignment (Clustal format) and extract the positions of interest for each sequence in the alignment? I would like to take this approach because not every PCR product is equal, and I think it would be the more robust method. The trouble is that I have no idea how to manipulate Clustal output to give me only columns x, y, z.
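To make the goal concrete, here is a rough sketch of what I'm after on a toy example, assuming the alignment has somehow already been read into memory as plain strings. The sequence IDs, sequences, column numbers, and the `extract_columns` helper are all made up for illustration; getting from the Clustal file to this structure is the part I'm missing.

```python
# Toy alignment already in memory: sequence ID -> aligned sequence (gaps as '-').
# IDs, sequences, and column numbers are invented for illustration only.
alignment = {
    "sample_A": "MK-TAYIAKQR",
    "sample_B": "MKLTAYIAKQR",
    "sample_C": "MK-TSYIARQR",
}


def extract_columns(aligned, columns):
    """Return {seq_id: residues found at the given 1-based alignment columns}."""
    return {
        seq_id: "".join(seq[c - 1] for c in columns)
        for seq_id, seq in aligned.items()
    }


# Columns refer to positions in the alignment, so gaps are counted too.
picked = extract_columns(alignment, [3, 5, 9])
for seq_id, residues in picked.items():
    print(seq_id, residues)
```

Because the columns are alignment coordinates, a gap shows up as '-' in the result instead of silently shifting every later position.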
The other possible answer I can think of, and can code, is to pull positions x, y, z from each PCR product directly. That is straightforward but requires a lot of assumptions (i.e. are they all the same length, do they start in the right spot, etc.).
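For reference, a minimal sketch of that per-product approach, assuming each product is an unaligned string that starts at the same residue. The product names, sequences, positions, and the `extract_positions` helper are hypothetical.

```python
def extract_positions(sequence, positions):
    """Return residues at 1-based positions of an unaligned sequence.

    Returns None when the sequence is too short to contain every position.
    """
    if max(positions) > len(sequence):
        return None
    return "".join(sequence[p - 1] for p in positions)


# Hypothetical PCR products; product_2 is truncated and starts one residue late.
products = {
    "product_1": "MKLTAYIAKQR",
    "product_2": "KLTAY",
}

results = {
    name: extract_positions(seq, [2, 4, 9])
    for name, seq in products.items()
}
print(results)
```

The length check only catches products that are too short. A product that starts in the wrong spot but is long enough would return the wrong residues without any warning, which is exactly the assumption that worries me.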
If anyone can point me in a direction I would be grateful.
Thank you, Sean