Question

Perl Code for Sequence Extraction

0

9.9 years ago

csmpresent ▴ 20

Hi,

Please let me know how to write code that extracts each sequence into a separate file, starting from a single file (.txt or .fa) that contains multiple sequences in FASTA format.

Let me elaborate:

Suppose I have a file MultiFasta.txt which contains these three sequences in FASTA format:

> Species1
GTTGATGTAGCTTAAACTTAAAGCAAGGCA... ...AACAGACTTACACATGCAAGCATCCACGCCCCGGTGAG
> Species2
CGCTTAACCACACCC... ... ...CCATAA
> Species3
ATTAGATACCCC... ...TATATACCGCCATCTTCAGCAAACCC

Now I want these sequences to be extracted into separate files (either text format or FASTA format). Please let me know what the code for this should be. I have been trying, but all in vain.

Thanks in advance

Perl Sequence-Extraction Bioperl Coding Fasta • 3.1k views

updated 2.7 years ago by Ram 44k • written 9.9 years ago by csmpresent ▴ 20

3

There are about 100 questions dealing with something similar. See: similar posts. Do any of those help you?

For using BioPerl see the SeqIO documentation: http://www.bioperl.org/wiki/HOWTO:SeqIO

What specifically have you tried?

updated 2.7 years ago by Ram 44k • written 9.9 years ago by Michael 55k

1

As a side note, in contrast to the code presented in the tutorial, your program should always start like this:

#!/usr/bin/env perl

use strict;
use warnings;
use diagnostics; # mandatory for a beginner

If you get any error message from such a script, you may post it here, otherwise not ;) (because that means you already know what you are doing)

updated 2.7 years ago by Ram 44k • written 9.9 years ago by Michael 55k

0

I am not an expert in Perl. I have Googled for code. A number of scripts are available, but I could not work out where it is stumbling.

Thanks for the help.

updated 2.7 years ago by Ram 44k • written 9.9 years ago by csmpresent ▴ 20

0

One becomes good at something only by trying repeatedly and reducing failure at each step. Saying "I'm not good at XYZ" is no good if said skill is crucial to one's profession or passion.

Also, you could always use a different programming language. And when people give you suggestions that they know will help beginners, it is to your benefit to accept and try them.

2.7 years ago by Ram 44k

Ram · Answer 1 · 2014-12-29

I know that you're looking for a Perl solution, but here is a possible awk one-liner.

awk '/^>/{f=($1==">")?$2:substr($1,2); s=f".fasta"} {print > s}' yourfile.fasta

This awk command produces one FASTA file for each sequence stored in your file. Each output file is named after the first word of the record's header. The check on $1 handles headers written with a space after the > (as in "> Species1" in your example) as well as the usual ">Species1". I don't know if you are familiar with awk, but just in case:

/^>/ ---> if the line starts with > (header line of a FASTA record).
f=($1==">")?$2:substr($1,2) ---> take the first word of the header without the > and save it in the variable f (this becomes the base of the output filename).
s=f".fasta" ---> the output filename is the content of the variable f concatenated with the extension ".fasta".
print > s ---> write the current line to the file named in s. This runs for every line, so the sequence lines follow their header into the same file.
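
Since the question asked for Perl, the same idea (start a new output file at every header line) can be sketched in plain Perl without BioPerl. This is only a minimal sketch under some assumptions: the sub name split_fasta, the optional output directory argument, and the choice to name each file after the first word of the header are all made up for illustration, and records that share a header would overwrite each other.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split a multi-FASTA file into one file per record.
# Returns the list of files written, in input order.
# (split_fasta is a hypothetical helper name, not part of any library.)
sub split_fasta {
    my ($in_path, $out_dir) = @_;
    $out_dir = '.' unless defined $out_dir;

    open(my $in, '<', $in_path) or die "Cannot open $in_path: $!";
    my ($out, @written);
    while (my $line = <$in>) {
        if ($line =~ /^>\s*(\S+)/) {
            # New record: the first word after '>' becomes the file name.
            # Assumption: replace anything unusual so the name is filesystem-safe.
            (my $name = $1) =~ s/[^\w.-]/_/g;
            my $path = "$out_dir/$name.fasta";
            close($out) if $out;
            open($out, '>', $path) or die "Cannot write $path: $!";
            push @written, $path;
        }
        # Lines before the first header belong to no record, so skip them.
        print {$out} $line if $out;
    }
    close($out) if $out;
    close($in);
    return @written;
}

# Hypothetical command line: perl split_fasta.pl MultiFasta.txt [output_dir]
if (@ARGV) {
    print "$_\n" for split_fasta(@ARGV);
}
```

Run on the example from the question, this should write Species1.fasta, Species2.fasta and Species3.fasta. Closing each output file before opening the next keeps only one file open at a time, which matters when the input holds thousands of sequences.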