Perl Code for Sequence Extraction
9.9 years ago
csmpresent ▴ 20

Hi,

How can I write code to extract each sequence into its own file from a single file (.txt or .fa) that contains multiple sequences in FASTA format?

To elaborate:

Suppose I have a file MultiFasta.txt which contains these three sequences in FASTA format:

> Species1
GTTGATGTAGCTTAAACTTAAAGCAAGGCA... ...AACAGACTTACACATGCAAGCATCCACGCCCCGGTGAG
> Species2
CGCTTAACCACACCC... ... ...CCATAA
> Species3
ATTAGATACCCC... ...TATATACCGCCATCTTCAGCAAACCC

I want each of these sequences to be written to a separate file (either text or FASTA format). Please let me know what the code for this should be. I have been trying, but all in vain.

Thanks in advance

Perl Sequence-Extraction Bioperl Coding Fasta • 3.1k views

There are about 100 questions dealing with something similar. See: similar posts. Do any of those help you?

For using BioPerl see the SeqIO documentation: http://www.bioperl.org/wiki/HOWTO:SeqIO

What specifically have you tried?


As a side note, in contrast to the code presented in the tutorial, your program should always start like this:

#!/usr/bin/env perl

use strict;
use warnings;
use diagnostics; # mandatory for a beginner

If you get any error message from such a script, you may post it here, otherwise not ;) (because that means you already know what you are doing)


I am not an expert in Perl. I have Googled for code. A number of scripts are available, but I could not work out where they are failing.

Thanks for the help.


One becomes good at something only by trying repeatedly and reducing failure at each step. Saying "I'm not good at XYZ" is no good if said skill is crucial to one's profession or passion.

Also, you could always use a different programming language. And when people give you suggestions that they know will help beginners, it is to your benefit to accept and try them.

9.9 years ago
iraun 6.2k

I know that you're looking for a Perl solution, but here is a possible awk one-liner.

awk '/^>/{f=substr($1,2);s=f".fasta"} {print > s}' yourfile.fasta

This awk command produces one FASTA file per sequence in your file. Each output file is named after the header of the corresponding record in your multi-FASTA file. I don't know if you are familiar with awk, but just in case:

  • /^>/ ---> If the line starts with > (the header line of a FASTA record), run the block that follows.
  • f=substr($1,2) ---> Take the first whitespace-delimited field of the header, drop the leading >, and store the result in the variable f (the base of the output filename).
  • s=f".fasta" ---> Build the output filename by concatenating f with the extension ".fasta".
  • print > s ---> Runs for every line: write the current line to the file named by s, so each record ends up in its own file.
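
One caveat, assuming the headers look exactly like the example in the question (with a space after the >): in that case $1 is just ">", so f comes out empty and every record is written to the same file, .fasta. Below is a minimal sketch of a more defensive variant, not a definitive implementation. It strips the > together with any spaces after it, keeps only the first word of the header as the ID, replaces characters that are awkward in file names, and closes each output file before opening the next one (some awk implementations limit how many files can be open at once). The sample input, the variable names (name, out) and the record_N fallback name are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative input only: file name and sequences are made up for this sketch.
cat > MultiFasta.txt <<'EOF'
> Species1
GTTGATGTAGCTTAAACTT
AAGCAAGGCA
> Species2
CGCTTAACCACACCC
> Species3
ATTAGATACCCC
EOF

awk '
/^>/ {
    if (out != "") close(out)            # release the previous output file
    name = $0
    sub(/^>[ \t]*/, "", name)            # drop ">" and any spaces after it
    sub(/[ \t].*$/, "", name)            # keep only the first word as the ID
    gsub(/[^A-Za-z0-9._-]/, "_", name)   # make the ID safe as a file name
    if (name == "") name = "record_" NR  # hypothetical fallback for empty IDs
    out = name ".fasta"
}
out != "" { print > out }                # header and sequence lines alike
' MultiFasta.txt
```

With the sample input this should leave Species1.fasta, Species2.fasta and Species3.fasta in the current directory, each holding the unmodified header line followed by its sequence lines. If two records share the same ID, the later one overwrites the earlier one, so check that your IDs are unique first.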

Try Heng Li's bioawk. It is awk extended for biological data formats, which makes parsing easier :)
