Question

Need Coding To Modify Multifasta File

1

11.4 years ago

HG ★ 1.2k

Hi all, I have a multi-FASTA file containing three different sequences (around 3000 sequences in total). I want to separate them according to name: I need three different files, each containing one of the three different sequences. Can anyone help me out?

    >pseudogenome_H131_84   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCCAATTATTACTATGTTCTTACCGTTGCTGGTCAGTCTAAT
    >pseudogenome_H134_4332   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCCGATTATTACTATGTTCTTACCGTTGCTGGTCAGTCTAAT
    GCCATGGCGTATGGCGAAGGACTGCCATTACCGGACAGGGAAGATGCGCCTCATCCCAGA
    >pseudogenome_H24a_4333   Probable 9-O-acetyl-N-acetylneuraminic acid deacetylase
    ATGAACGCAATAATATCGCCTGATTATTACTATATTCTTACCGTTGCTGGTCAGTCTAAT
    GCCATGGCGTATGGCGAAGGACTGCCATTACCGGACAGGGAAGATGCGCCTCATCCCAGA

perl awk • 3.3k views

ADD COMMENT • link updated 11.4 years ago by paulr ▴ 80 • written 11.4 years ago by HG ★ 1.2k

3

What have you tried? While many of us could simply write a script for you, you won't learn that way.

ADD REPLY • link 11.4 years ago by Devon Ryan 105k

0

I can split the file into one file per sequence like this: awk '/^>/{file=++n".fasta"} {print > file}' three.fas, but now I need to separate the sequences according to name.

ADD REPLY • link updated 11.4 years ago by Devon Ryan 105k • written 11.4 years ago by HG ★ 1.2k

2

It would be great if you could correct mistakes in your post.

ADD REPLY • link 11.4 years ago by PoGibas 5.1k

score 4 · Answer 1 · 2014-03-31

4

11.4 years ago

PoGibas 5.1k

Always search on the internet before asking...

Alternative methods to split a FASTA file by Paulo Nuin

The Perl example there works for me.

ADD COMMENT • link 11.4 years ago by PoGibas 5.1k

score 2 · Answer 2 · 2014-03-31

Not sure I understood your question, but in Python:

from Bio import SeqIO

# Names of the records to keep; 'blablabla' is a placeholder
wanted = ['blablabla']

# The with-block closes both files, even if parsing fails
with open('file.fasta') as fh, open('result.fasta', 'a') as oh:
    for record in SeqIO.parse(fh, 'fasta'):
        if record.name in wanted:
            oh.write('>' + record.id + '\n')
            oh.write(str(record.seq) + '\n')
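The snippet above keeps only the records whose name is in a list and writes them to a single file. If the goal is instead one output file per distinct name, the records can be grouped by a key taken from the header line. Below is a minimal sketch using only the standard library (no Biopython). It assumes that the "name" is the description text after the first whitespace-separated token of the header; that is a guess based on the example records, not something the question confirms. The function names `read_fasta`, `description_key` and `split_by_key` are made up for this illustration.

```python
import re
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence_lines) per record; header excludes the '>'."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line[1:], []
            elif header is not None and line:
                lines.append(line)
    if header is not None:
        yield header, lines


def description_key(header):
    """ASSUMPTION: group by the text after the first token of the header."""
    parts = header.split(None, 1)
    if not parts:
        return "unnamed"
    return parts[1].strip() if len(parts) > 1 else parts[0]


def split_by_key(fasta_path, out_dir, key=description_key):
    """Write one FASTA file per distinct key; return {key: output path}."""
    groups = defaultdict(list)
    for header, lines in read_fasta(fasta_path):
        groups[key(header)].append((header, lines))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, records in groups.items():
        # Replace characters that are awkward in file names
        safe = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
        out_path = out_dir / (safe + ".fasta")
        with open(out_path, "w") as out:
            for header, lines in records:
                out.write(">" + header + "\n")
                for seq_line in lines:
                    out.write(seq_line + "\n")
        written[name] = out_path
    return written
```

A call such as `split_by_key("three.fas", "by_name")` would write one file per distinct description into the `by_name` directory. To group on part of the ID instead, pass a different key, for example `key=lambda h: h.split()[0].split("_")[1]`, which turns `pseudogenome_H131_84` into `H131`. Note that two different keys that sanitize to the same file name would overwrite each other, so check the returned mapping if your names contain unusual characters.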

score 2 · Answer 3 · 2014-04-01

Okay, here's a clean, straightforward way in Python. The output filenames are numbered by each sequence's position in your multi-FASTA file.

This also requires that you have Biopython installed.

from Bio import SeqIO

fasta = SeqIO.parse("/tmp/fasta.fa", "fasta")  # Specify the format; returns an iterator over the multi-FASTA entries
output_base = "/tmp/extracted_"  # Base of your output: e.g. /tmp/extracted_0.fa, /tmp/extracted_1.fa, ..., /tmp/extracted_N.fa
for seqid, seq in enumerate(fasta):
    # open() replaces file(), which only exists in Python 2; you can of course choose another file extension
    with open(output_base + str(seqid) + ".fa", "w") as output_file:
        SeqIO.write(seq, output_file, "fasta")  # See the docs at http://biopython.org/wiki/SeqIO#Sequence_Output
    # Leaving the with-block flushes and closes the file, so no explicit close() is needed

Best of luck! Paul

score 1 · Answer 4 · 2014-03-31

1

11.4 years ago

Prakki Rama ★ 2.7k

Found a script similar to your case here while Googling. Might be useful to you!

~Prakki Rama

ADD COMMENT • link 11.4 years ago by Prakki Rama ★ 2.7k