7.3 years ago
bio90029
Hi, I am running out of ideas for how to do this, and I would appreciate some help, please. I have two FASTA files from two different bacterial strains, with 1000 genes each. Here is an example of the file layout:
File A:
query seq.id
query seq
objA seq.id
objA seq
query_1 seq.id
query_1 seq
obj_1A seq.id
obj_1A seq

File B:
query seq.id
query seq
objB seq.id
objB seq
query_1 seq.id
query_1 seq
obj_1B seq.id
obj_1B seq
What I would like to get is this:

file_1:
query seq.id
query seq
objA seq.id
objA seq
objB seq.id
objB seq

file_2:
query_1 seq.id
query_1 seq
obj_1A seq.id
obj_1A seq
obj_1B seq.id
obj_1B seq
But I just don't know how to split the FASTA files. I was trying to do this with Biopython's SeqIO, but I am quite lost.
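Here is a minimal sketch of the regrouping. It assumes, as in the example, that each file lists consecutive (query, match) record pairs and that the query ID is the first word of its '>' header. The `read_fasta` and `group_by_query` helpers are hypothetical. The small stdlib parser stands in for Biopython so the sketch has no dependencies. `SeqIO.parse(path, "fasta")` yields `SeqRecord` objects, and their `.id` could be grouped the same way.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    The header is the '>' line without the '>'; wrapped sequence lines are joined.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


def group_by_query(fasta_texts):
    """Map each query ID to [query record, match from file 1, match from file 2, ...].

    Assumes every file is a run of consecutive (query, match) record pairs.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for text in fasta_texts:
        records = read_fasta(text)
        if len(records) % 2:
            raise ValueError("expected an even number of records (query/match pairs)")
        for query, match in zip(records[0::2], records[1::2]):
            query_id = query[0].split()[0]  # ID = first word of the header
            # The first file seeds the group with the query; later files only add matches.
            groups.setdefault(query_id, [query]).append(match)
    return groups


if __name__ == "__main__":
    # Toy input mirroring the example layout; the sequences are made up.
    file_a = ">query\nACGT\n>objA\nACGA\n>query_1\nTTGA\n>obj_1A\nTTGG\n"
    file_b = ">query\nACGT\n>objB\nACGC\n>query_1\nTTGA\n>obj_1B\nTTGC\n"
    for query_id, records in group_by_query([file_a, file_b]).items():
        print(query_id, [header for header, _ in records])
```

With the toy input, the script prints `query ['query', 'objA', 'objB']` and `query_1 ['query_1', 'obj_1A', 'obj_1B']`. These are the record groups described for file_1 and file_2.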
If I understand correctly, you have two large .fa files, and you would like to split their individual entries into separate folders?
Actually, I have 100 FASTA files, each containing about 1000 genes, but if I can manage it for two, I will manage it for all of them. Each file contains the query (reference) genes and the object genes they match. What I would like is to split or sort the FASTA files so that I end up with one file per query gene, containing all of its matching object genes.
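The real data is about 100 files of roughly 1000 genes each, so the same grouping can run over a whole directory and write one FASTA file per query gene. This is a minimal sketch. The `split_by_query` name, the default `*.fasta` pattern, and the `<query_id>.fasta` output naming are hypothetical. It still assumes each input file is a run of (query, match) pairs. All records are held in memory, which should be manageable at around 100 × 1000 sequences.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into (header, sequence) tuples; the header excludes the '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_by_query(in_dir, out_dir, pattern="*.fasta"):
    """Write out_dir/<query_id>.fasta holding each query plus all of its matches.

    Hypothetical assumptions: inputs match `pattern`, every input file is a run of
    (query, match) pairs, and each query ID is usable as a file name.
    Returns the sorted list of query IDs written.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    groups = {}
    # Pass 1: collect every match for every query across all input files.
    for path in sorted(in_dir.glob(pattern)):
        records = read_fasta(path.read_text())
        if len(records) % 2:
            raise ValueError(f"{path}: expected query/match pairs")
        for query, match in zip(records[0::2], records[1::2]):
            query_id = query[0].split()[0]
            groups.setdefault(query_id, [query]).append(match)
    # Pass 2: one output file per query, opened fresh for each group.
    for query_id, records in groups.items():
        with open(out_dir / f"{query_id}.fasta", "w") as handle:
            for header, seq in records:
                handle.write(f">{header}\n{seq}\n")
    return sorted(groups)
```

For example, `split_by_query("strains", "per_query", pattern="*.fa")` uses hypothetical directory names and a `.fa` pattern. Collecting everything before writing means each output file is written exactly once, so matches from later input files are never lost to an overwrite.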
Please show the Biopython code you are trying, along with any errors, and we can help correct them. That would be the most beneficial for you as a learning experience, and it also keeps someone from simply writing the code for you, since Biostars is not a coding service.
Post what you have tried. Also, your use of terms here is a little confusing. Are there '>' characters in your FASTA headers that just aren't shown here? Perhaps post a couple of examples from your files, if you can share them.
Yes, all the gene IDs have the '>'.
But I don't get the right output: it puts all the genes into the new FASTA file, when I only want two genes per file.
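The code that produced this isn't shown, so this is only a guess at the cause. The symptom usually comes from opening a single output handle before the loop, or from passing the whole record list to `SeqIO.write`, so every record ends up in the same file. The fix is to open a new output file inside the loop. Here is a minimal sketch with a hypothetical `write_pairs` helper, under the same (query, match) pair assumption. In Biopython the equivalent is calling `SeqIO.write([query_rec, match_rec], out_path, "fasta")` inside the loop.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into (header, sequence) tuples; the header excludes the '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_pairs(fasta_path, out_dir):
    """Write each (query, match) pair of one FASTA file to its own two-record file.

    Assumes the file is a run of consecutive (query, match) pairs; output files are
    named after the query ID (a hypothetical naming scheme). Returns the paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta(Path(fasta_path).read_text())
    written = []
    for query, match in zip(records[0::2], records[1::2]):
        out_path = out_dir / f"{query[0].split()[0]}.fasta"
        # Opening the file INSIDE the loop gives every pair a fresh file; a single
        # handle opened before the loop would collect all records in one file.
        with open(out_path, "w") as handle:
            for header, seq in (query, match):
                handle.write(f">{header}\n{seq}\n")
        written.append(out_path)
    return written
```

Each output file is opened with `"w"`. Running this on several input files into the same folder therefore overwrites earlier results. To combine matches from many files, collect them per query first and write each query's file once.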