Need Coding To Modify Multifasta File
10.7 years ago
HG ★ 1.2k

Hi all, I have a multi-FASTA file containing three different sequences (around 3000 sequences in total). I want to separate them according to name: I need three different files, each containing one of the three different sequences. Can anyone help me out?

    >pseudogenome_H131_84   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCCAATTATTACTATGTTCTTACCGTTGCTGGTCAGTCTAAT
    >pseudogenome_H134_4332   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCCGATTATTACTATGTTCTTACCGTTGCTGGTCAGTCTAAT
    GCCATGGCGTATGGCGAAGGACTGCCATTACCGGACAGGGAAGATGCGCCTCATCCCAGA
    >pseudogenome_H24a_4333   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCTGATTATTACTATATTCTTACCGTTGCTGGTCAGTCTAAT
    GCCATGGCGTATGGCGAAGGACTGCCATTACCGGACAGGGAAGATGCGCCTCATCCCAGA
perl awk • 2.9k views

What have you tried? While many of us could simply write a script for you, you won't learn that way.


I can split the file into one file per record like this: awk '/^>/{file=++n".fasta"} {print > file}' three.fas. Now I need to separate the sequences according to name.


It would be great if you could correct mistakes in your post.

10.7 years ago
PoGibas 5.1k

Always search on the internet before asking...

Alternative methods to split a FASTA file by Paulo Nuin

The Perl example there works for me.

10.7 years ago
User000 ▴ 710

Not sure I understood your question, but in Python (requires Biopython):

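from Bio import SeqIO

# Names of the records to extract ('blablabla' is a placeholder)
wanted = ['blablabla']

# Append matching records to result.fasta; both files are closed automatically
with open('file.fasta') as fh, open('result.fasta', 'a') as oh:
    for record in SeqIO.parse(fh, 'fasta'):
        if record.name in wanted:
            oh.write('>' + record.id + '\n')
            oh.write(str(record.seq) + '\n')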
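
If the goal is one output file per name, a rough sketch along these lines might work. It uses only the standard library (no Biopython). It assumes, and this is only a guess from the example headers, that the "name" is the second underscore-separated field of the ID (H131 in pseudogenome_H131_84). The function names read_fasta, name_key and split_by_name are made up for illustration; swap in your own key function if the name is defined differently.

```python
import os
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def name_key(header):
    """ASSUMPTION: the name is the 2nd underscore-separated field of the ID."""
    seq_id = header.split()[0]  # e.g. "pseudogenome_H131_84"
    parts = seq_id.split("_")
    return parts[1] if len(parts) > 1 else seq_id


def split_by_name(lines, out_dir, key=name_key):
    """Write one FASTA file per name; return {name: number of records}."""
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        groups[key(header)].append((header, seq))
    os.makedirs(out_dir, exist_ok=True)
    for name, records in groups.items():
        with open(os.path.join(out_dir, name + ".fasta"), "w") as out:
            for header, seq in records:
                # Each sequence is written on a single, unwrapped line
                out.write(">" + header + "\n" + seq + "\n")
    return {name: len(records) for name, records in groups.items()}
```

Usage would be something like `with open("three.fas") as fh: print(split_by_name(fh, "by_name"))`. Note that all records are held in memory before writing, which should be fine for a few thousand sequences.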
10.7 years ago
paulr ▴ 80

Okay, here's a clean, straightforward way in Python. The output files are numbered in the order the sequences appear in your multi-FASTA file.

This also requires that you have Biopython installed.

from Bio import SeqIO
fasta = SeqIO.parse("/tmp/fasta.fa", "fasta")  # Specify the format; returns an iterator over the multi-FASTA records
output_base = "/tmp/extracted_"  # Base of your output: e.g. /tmp/extracted_0.fa, /tmp/extracted_1.fa, ..., /tmp/extracted_N.fa
for index, record in enumerate(fasta):
    # You can of course decide on the appropriate file extension
    with open(output_base + str(index) + ".fa", "w") as output_file:  # The with block closes and flushes the file
        SeqIO.write(record, output_file, "fasta")  # See the docs at http://biopython.org/wiki/SeqIO#Sequence_Output
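
Numbered files can be hard to trace back to a record, so here is a possible variation that names each file after the record ID instead. This is only a sketch using the standard library, not Biopython. It assumes IDs are unique (a repeated ID would overwrite the earlier file), and split_per_record is a made-up name.

```python
import os
import re


def split_per_record(fasta_path, out_dir):
    """Write each FASTA record to its own file named after the record ID."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                fields = line[1:].split()
                seq_id = fields[0] if fields else "unnamed"
                # Replace characters that are awkward in filenames
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                path = os.path.join(out_dir, safe_id + ".fa")
                out = open(path, "w")
                written.append(path)
            if out is not None:
                out.write(line)  # Lines are copied unchanged, wrapping included
    if out is not None:
        out.close()
    return written
```

Calling `split_per_record("/tmp/fasta.fa", "/tmp/extracted")` would return the list of files it wrote.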

Best of luck! Paul

10.7 years ago
Prakki Rama ★ 2.7k

Found a script similar to your case here while Googling. Might be useful to you!

~Prakki Rama
