Hi, I have a FASTA-format sequence file, but the header line is not like an NCBI description line (please see below).
>contig07303 length=480 numreads=24 gene=isogroup00008 status=isotig
CACTATCCTcTTCATTTAGGTTTTTGATAtCTTGAATTACTTTCTtCATTTTTCTAGTAG...................
I used a BioPerl program based on Bio::SeqIO to extract sequences with length >500 from "normal" FASTA sequences (from NCBI), and it worked. When I modified the "keyword" in the same program to retrieve sequences with numreads >500, it did not work. Is it possible to do this with the Bio::SeqIO module for sequences that have the above FASTA descriptions, or do I have to write different code without Bio::SeqIO?
Thank you very much for replying. raghul
Previous code given by neilfws
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# Read FASTA records from myfile.fa and write the selected ones to myfile_100.fa
my $seqin  = Bio::SeqIO->new(-file => "myfile.fa",      -format => "fasta");
my $seqout = Bio::SeqIO->new(-file => ">myfile_100.fa", -format => "fasta");

while (my $seq = $seqin->next_seq) {
    # Keep only sequences of 100 bases or fewer
    if ($seq->length <= 100) {
        $seqout->write_seq($seq);
    }
}
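For the numreads question, a minimal sketch along the same lines as the script above might look like the following. It assumes that Bio::SeqIO's FASTA parser puts everything after the first word of the header into the description, which is then available through $seq->desc, and that the header always carries a tag of the form numreads=N. The names numreads_from_desc and filter_by_numreads, the output file name, and the cutoff of 500 are illustrative choices, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the numreads value out of a description such as
# "length=480 numreads=24 gene=isogroup00008 status=isotig".
# Returns undef when the description or the tag is missing.
sub numreads_from_desc {
    my ($desc) = @_;
    return unless defined $desc;
    return $desc =~ /\bnumreads=(\d+)/ ? $1 : undef;
}

# Hypothetical wrapper: copy records whose numreads exceeds $min_reads.
sub filter_by_numreads {
    my ($infile, $outfile, $min_reads) = @_;
    require Bio::SeqIO;    # loaded at run time; only this sub needs BioPerl
    my $seqin  = Bio::SeqIO->new(-file => $infile,     -format => "fasta");
    my $seqout = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");
    while (my $seq = $seqin->next_seq) {
        my $n = numreads_from_desc($seq->desc);
        # Records without a numreads tag are skipped rather than guessed at
        $seqout->write_seq($seq) if defined $n && $n > $min_reads;
    }
}

# Runs only when an input and an output file are given, e.g.
#   perl filter_numreads.pl myfile.fa myfile_numreads500.fa
filter_by_numreads($ARGV[0], $ARGV[1], 500) if @ARGV == 2;
```

The point of the sketch is that $seq->length measures the sequence itself, while numreads lives only in the header text, so it has to be parsed out of the description rather than read from a sequence property.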
Assuming that the lack of ">" in your example FASTA file is just a typo, can you clarify this somewhat? Your pasted code example doesn't match what you are trying to do when you say you modified the "keyword" to retrieve sequences with numreads > 500. What exactly are you trying to do, and what is the code? The question just isn't very clear.
Yes, I misread this question too; it's not clear what the header looks like because the sequence may not be formatted properly as shown here (that is, does it start with ">"?).
Yes, the sequence starts with ">". By "keyword" I mean the description in the FASTA header line (sorry for the confusion). The above program detects sequences of 100 bases or fewer (<= 100). When I changed the symbol to > to identify sequences longer than 100 nt, I got the wrong answer: the output also included sequences shorter than 100 nt. Thank you for replying. raghul
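One way to narrow this down is to re-read the output file and list any record that should not have passed a "longer than 100" filter. The sketch below is only a diagnostic under assumptions: longer_than and report_short_records are hypothetical names, and the cutoff of 100 is taken from the script in the thread. It prints the ID, the length reported by $seq->length, and the header description so that the two can be compared by eye.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 when $len passes a "longer than $min" filter, else 0.
sub longer_than {
    my ($len, $min) = @_;
    return $len > $min ? 1 : 0;
}

# Hypothetical diagnostic: print every record in $file that fails the filter.
sub report_short_records {
    my ($file, $min) = @_;
    require Bio::SeqIO;    # loaded at run time; only this sub needs BioPerl
    my $seqin = Bio::SeqIO->new(-file => $file, -format => "fasta");
    while (my $seq = $seqin->next_seq) {
        next if longer_than($seq->length, $min);
        my $desc = defined $seq->desc ? $seq->desc : "";
        print join("\t", $seq->display_id, $seq->length, $desc), "\n";
    }
}

# Runs only when a file is given, e.g.  perl check_lengths.pl myfile_100.fa
report_short_records($ARGV[0], 100) if @ARGV == 1;
```

If this prints nothing for a freshly written output file, the filter itself is behaving as intended, and it may be worth checking whether an older output file is being inspected.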