I want to start by apologizing for the vague title and my overall ignorance. I have no idea what I'm doing, and the few scripts I've gotten to work were pieced together from scraps of code taken from this and other websites rather than genuinely written by me. I'm working in Python with some Biopython code, and that's the limit of my coding comfort zone so far.
I'd like to take a multiple sequence alignment (MSA) and an input list of positions, and create a new MSA (or other output file type) with all of the same rows but only the columns for the positions in the list. Ideally the identifiers for the sequences in the first MSA will carry over to the output. I've had problems getting this to work:
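from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment

positions = [18, 20, 26, 33, 83, 86, 87, 88, 133, 517]
outputMSA = MultipleSeqAlignment([])
inputMSA = AlignIO.read("input.fst", "fasta")
for y in inputMSA:
    for x in positions:
        # y is a SeqRecord here, not an integer row index
        outputMSA.append(inputMSA[y, x-1])
print(outputMSA)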
I get "TypeError: list indices must be integers or slices, not SeqRecord"
Assuming I could get this to work, the next step would be to take a reference sequence for those positions and the output MSA, and list all of the unique sequences in the output MSA, the frequency of each one, and the frequency of the reference sequence.
Example inputs/outputs:
positions = [3, 8, 9, 11]
input MSA:
IAMAWFLATTHIS
IAMAWFLATTHIS
IAMAWFLATTHIS
IA-AWFLATTHIS
output MSA:
MATH
MATH
MATH
-ATH
referenceseq = MASH
output analysis:
MASH 0%,
MATH 75%,
-ATH 25%
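The analysis in the example above could be sketched roughly as follows. This is a minimal sketch, assuming the selected columns are already available as plain strings, one per row. `summarize_rows` is a hypothetical helper name, not a Biopython function.

```python
from collections import Counter


def summarize_rows(rows, reference):
    """Return {sequence: percent of rows}, always including the reference."""
    if not rows:
        raise ValueError("no rows to summarize")
    counts = Counter(rows)
    total = len(rows)
    # most_common() orders the unique sequences from most to least frequent
    freqs = {seq: 100.0 * n / total for seq, n in counts.most_common()}
    # Report the reference even when it never occurs in the alignment
    freqs.setdefault(reference, 0.0)
    return freqs


rows = ["MATH", "MATH", "MATH", "-ATH"]  # rows of the example output MSA
result = summarize_rows(rows, "MASH")
for seq, pct in result.items():
    print(f"{seq} {pct:.0f}%")
```

With a Biopython alignment, the rows could be collected with `[str(rec.seq) for rec in outputMSA]` before calling the helper.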
UPDATE: My probably inelegant solution:
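from Bio import AlignIO

positions = [18, 20, 26, 33, 83, 86, 87, 88, 133, 517]
inputMSA = AlignIO.read("input.fst", "fasta")
# Start from an alignment with the same rows and identifiers but zero columns
outputMSA = inputMSA[:, 0:0]
for x in positions:
    # Positions are 1-based; slice out one column and append it
    outputMSA = outputMSA + inputMSA[:, (x-1):x]
print(outputMSA)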
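For comparison, the same column selection can be sketched without Biopython. This is only an illustration, assuming the alignment is held as (identifier, sequence) pairs. `select_columns` and the identifiers `seq1` to `seq4` are hypothetical names made up for the example.

```python
def select_columns(records, positions):
    """Keep only the given 1-based columns from each (identifier, sequence) pair."""
    selected = []
    for identifier, sequence in records:
        # Convert each 1-based position to a 0-based string index
        columns = "".join(sequence[p - 1] for p in positions)
        selected.append((identifier, columns))
    return selected


# Hypothetical identifiers paired with the example input MSA from the post
records = [
    ("seq1", "IAMAWFLATTHIS"),
    ("seq2", "IAMAWFLATTHIS"),
    ("seq3", "IAMAWFLATTHIS"),
    ("seq4", "IA-AWFLATTHIS"),
]
selected = select_columns(records, [3, 8, 9, 11])
for identifier, columns in selected:
    print(identifier, columns)
```

The identifiers carry over unchanged, which matches what slicing the Biopython alignment by column does for the records.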
Still working on the second part. I thought this would be the easy part, but I'm stuck again.
Please add updates as comments instead of editing the post.