I am trying to run a model in PAML that requires me to combine records from multiple FASTA files. I have six FASTA files, each with the same number of records. I want to interleave the records into a single file, so that the result looks like this:
>record1_fileOne
AAA
>record1_fileTwo
AAA
>record1_fileThree
AAA
>record1_fileFour
AAA
>record1_fileFive
AAA
>record1_fileSix
AAA
>record2_fileOne
GGG
I wrote the code below, which just concatenates the FASTA records instead of interleaving them. There is probably some trick with Python's itertools that I'm just not seeing. Can anyone point me in the right direction?
I found this script that interleaves two FASTA files, but I need to extend it to N FASTA files:
import sys

def read_fasta(fh):
    """Generator yielding (header, sequence) for each FASTA record.

    Taken from http://stackoverflow.com/a/7655072/1735942
    """
    name, seq = None, []
    for line in fh:
        line = line.rstrip()
        if line.startswith(">"):
            if name:
                yield (name, ''.join(seq))
            name, seq = line, []
        else:
            seq.append(line)
    if name:
        yield (name, ''.join(seq))

# Assumption: args holds the FASTA file paths given on the command line.
args = sys.argv[1:]
fastafiles = args[0:]
filehandles = [open(f) for f in fastafiles]  # list of filehandles for the FASTA files
# Prints every record of the first file, then every record of the next, and so on.
for fh in filehandles:
    for name, seq in read_fasta(fh):
        print(name)
        print(seq)
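One possible way to get the interleaving asked about above is to run one record generator per file and step through all of them in lockstep with `zip`, then flatten the groups with `itertools.chain.from_iterable`. This is only a minimal sketch, not a tested PAML pipeline: the names `interleave_fasta` and `write_interleaved` are made up for illustration, the code is written for Python 3, and it assumes every file really has the same number of records (`zip` silently stops at the shortest file).

```python
import sys
from itertools import chain


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    name, seq = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if name:
                yield name, "".join(seq)
            name, seq = line, []
        else:
            seq.append(line)
    if name:
        yield name, "".join(seq)


def interleave_fasta(handles):
    """Yield record 1 of every file, then record 2 of every file, and so on.

    Hypothetical helper. Assumes equal record counts in all files;
    zip() stops as soon as the shortest file is exhausted.
    """
    readers = [read_fasta(h) for h in handles]
    # zip(*readers) groups the i-th record of each file into one tuple;
    # chain.from_iterable flattens those tuples back into a stream of records.
    return chain.from_iterable(zip(*readers))


def write_interleaved(paths, out=sys.stdout):
    """Open every path, write the interleaved records to out, then close the files."""
    handles = [open(p) for p in paths]
    try:
        for name, seq in interleave_fasta(handles):
            out.write(name + "\n")
            out.write(seq + "\n")
    finally:
        for h in handles:
            h.close()
```

Calling `write_interleaved(sys.argv[1:])` from a script would print the interleaved records for however many files are passed on the command line. If the files might differ in length, `itertools.zip_longest` could replace `zip`, but the padding values would then need to be filtered out before writing.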
There's no need to write your own code to group FASTA records; Biopython already implements that well.
Well, if you need Biopython anyway, then go for it. But if you just want to parse these files without introducing an additional dependency, then that's a great solution!
Finally, somebody who understands Python concepts and is not just offering 'same-in-every-language' solutions. +1 on this.