How To Write Sequences To Fasta Format With No Linebreaks (Wide Format) Using Bioperl
11.8 years ago
angel.roey • 0

I'm trying to export aligned sequences to a fasta file one by one using Bio::SeqIO. The result is that the sequences are broken by a new line every 60 columns. How do I avoid that?
I'd like to have the sequences exported in a 'wide' format, i.e. no line breaks in the sequence.

My code is roughly:

use Bio::SeqIO;
use File::Basename;  # provides fileparse()

my $seqin = Bio::SeqIO->new(-file => "<$fastaFile", -format => 'Fasta');
my $outname = fileparse($fastaFile, qr/\.[^\.]*$/) . "_sub.fasta";
my $seqout = Bio::SeqIO->new(-file => ">$outname", -format => 'Fasta');

while (my $seq = $seqin->next_seq) {
    # do something with $seq
    $seqout->write_seq($seq);
}
perl bioperl fasta • 7.4k views

"...the sequences are broken by a new line every 60 columns. How do I avoid that?"

Why do you want to avoid that?

11.8 years ago
kmcarr00 ▴ 290

Yes, there is a simple parameter for controlling the width of the output, called, unsurprisingly, -width.

my $seqout = Bio::SeqIO->new(-file => ">$outFile", -format => 'Fasta', -width => $lineWidthValue);

The value of -width appears to be limited to 32,766 (i.e. just under 2^15), so it isn't possible to write arbitrarily long lines with this method.

If $lineWidthValue is undefined or 0, the output defaults to 60 characters per line.


Commenting on my own answer (bad form, yes), but it appears that -width can produce arbitrarily long lines after all. If $lineWidthValue == -1, the sequence is output as a single line. There is a caveat, however: if you pass -1 to -width, write_seq does not print a newline at the end of the sequence as it normally would, so you need to print a newline to the output stream yourself.

11.8 years ago
fo3c ▴ 450

I don't know BioPerl, but I would circumvent the problem by opening $seqout as a plain file handle

open($seqout,'>',$outname);
and outputting as

print $seqout ">",$seq->header,"\n",$seq->sequence,"\n";

Do replace $seq->header and $seq->sequence with the appropriate variables/methods.

(don't forget to close $seqout)
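
A minimal sketch of this plain-print approach, for illustration only. The helper name wide_fasta_record and the sample record are made up here, and a temporary file stands in for $outname. In BioPerl, the methods corresponding to the header and sequence placeholders above would be $seq->display_id, $seq->desc and $seq->seq.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build one FASTA record with the sequence on a single line.
sub wide_fasta_record {
    my ($id, $desc, $sequence) = @_;
    my $header = $id;
    $header .= " $desc" if defined $desc && length $desc;
    $sequence =~ s/\s+//g;    # drop any embedded whitespace or line breaks
    return ">$header\n$sequence\n";
}

# With a Bio::Seq object the call would look like:
#   wide_fasta_record($seq->display_id, $seq->desc, $seq->seq)

# A temporary file stands in for the real output file ($outname in the question).
my ($out, $outname) = tempfile(SUFFIX => '_sub.fasta');
print {$out} wide_fasta_record('seq1', 'example record', 'ACGT' x 30);
close($out) or die "Cannot close $outname: $!";
```

Because the record is built as a string first, the same helper can be used with any file handle, including STDOUT.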


Bio::SeqIO handles all file I/O; there's no need to explicitly open or close the fasta files. The notation open($seqout,'>',$outname); attempts to use a Bio::SeqIO object as a file handle. This shouldn't be done, since that object has its own methods.


That's not what I meant. I suggested replacing the SeqIO output with plain Perl file output to avoid the problem described in the question.


Yep, my way of circumventing it was exactly that -- open a file handle and print $seq->header and $seq->sequence to that file. But I thought there must be a more elegant way.
@Kenosis what would be the solution then? Is there a method for a wide fasta format in Bio::SeqIO?


"Is there a method for a wide fasta format in Bio::SeqIO?"

I haven't found one yet, either by googling or by examining the Bio::SeqIO and Bio::Seq modules. The OP has also posted the question on Stack Overflow, with no responses as of the time of this reply.


This is not true. Sometimes you want to pass a filehandle rather than a filename; this is perfectly acceptable, and there's a parameter for that, too :). For example, if you have the lexical filehandle $in, you'd pass the key/value pair "-fh => \*$in" to the method new (instead of the familiar "-file => $filename").
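
A small sketch of the -fh route, under some assumptions: the FASTA text is made up, in-memory handles stand in for real files, and the width of 1000 is simply assumed to be longer than any sequence in the input. The lexical handles are passed directly as -fh => $in, which should behave the same as the glob-reference form above.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up FASTA text with the sequence wrapped over two lines.
my $fasta_text = ">seq1 example record\n"
               . ('ACGT' x 15) . "\n"
               . ('ACGT' x 15) . "\n";

# In-memory handles stand in for real files opened with open().
open(my $in, '<', \$fasta_text) or die "Cannot open input: $!";
my $wide = '';
open(my $out, '>', \$wide) or die "Cannot open output: $!";

my $width = 1000;    # assumed to be longer than any sequence in the input

# -fh takes an already opened handle where -file would take a file name.
my $seqin  = Bio::SeqIO->new(-fh => $in,  -format => 'fasta');
my $seqout = Bio::SeqIO->new(-fh => $out, -format => 'fasta', -width => $width);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}
$seqout->close;    # flush and close the output handle

print $wide;
```

Combining -fh with -width keeps Bio::SeqIO in charge of the formatting while still letting the caller decide where the output goes.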


Good correction! Appreciate it...
