Hi, I have the following code to convert from FASTQ to FASTA, but it has a problem that I need to fix.
I want to go from:
@YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
+
929994988483448887444665///,/222*//0//,//,/8,//754444,4:492/;<;11,1<<?29449144;:44444+4;-4249;;-424//7775/-,0,,/65684444489;A@B>884444244448994444442449999=967:?A<@BBBBCA?@A11,1-+-,2,,*/8885;;C>B>=8884424333
@YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
+
;;111,/18137>>AA?:?9959959>?AA@99959A?F?@AACCCC3AA=A>@=@B;;8<?@AAA@@<@>AAAA>>9;599B;<<BCA<<;=@<9<=@>>A;;<@@BFCCEEC;<;>A@>@@@BC>CBCD=?@8;:/8=?<;??2<<;??:?BB>;=@=:000>;<:A00,0;;;AC>CBB>@?;;6;>;B5::>??????A@888<;;9=@@@BB>A
@YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
+
//*/8851:>DDCBBBBCBBBCDDBCC@CACCDC@B?CC99;BB>@A993319849999-9>;><@BBCFFCCDA@B8::BBBCBBB??9;;BBC@CC?>=@B?AA@CBAB@?;4429@BB:@@8<<8<:>DCBBBBBBCCBCBH?:<<B==BF133@@>>B<@>5;89DBB:=D=CACCCCBA@ACB@@<9999/9@<A@BBBB9BDCBBB2CBCCBBB
@YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
+
88//13<B@D?B?BKK11,/,1,/777:AA@C?AB?@?:@>@@@@@BB>C<99919A699C>?<?<9969?@9999/95991<AAA?CBB@=@8;<CCC@C@DC;<<AAB@;;;;<;<<BCC>BAA?AB<998959@B:AABB=<<<:99AB1111111,1:<<@9959999?99909@@6969>>?>>>99599>99919@BCDCCC@EAB<99919@<??<9969999969@AD
@YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
+
:99499C@@@A>A64/<8,,,*,,656>?BB?ABCC@A=??9::?:==?@:::CC@CCCACCB499;998?@@@CDBBCC<==ADBC@CBB;;9>>>>:>5<<CCDD@CICCD<<<ABBC8::B>A=@>><9948?>>8888.8B699:::8:;@B>@>@@@;;;@A=??@B>AAA4;;80::.00>:<9>.70;:0./=:==B?;:7+,,**0*,,./5765:::<633@7:76,/
@YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA
+
999A999@899;?:99:@9944444992444-497733184737177777777=8=489<>88133788.333178,674:001//6;,.333,377>???BB::;<<
to:
>YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
>YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
>YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
>YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
>YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
>YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA
However, some records in the output still contain quality lines, for example: >CB>CBC@@@@@@9999/9@A6<9@248988-?
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Term::ANSIColor;
my ($imput, $output, $line, $usage);
GetOptions (
    'i=s' => \$imput,
    'o=s' => \$output,
);
$usage = (qq(
Error:
Wrong Arguments
Usage:
fastxQA -i infile.fastq -o utfile.fasta
));
if (!$imput or !$output) {
    print color("red"), "$usage", color("reset"),"\n\n";
    exit;
}
open FASTQIN, '<', "$imput" or die (color("red"), "\nCan't open $imput file", color("reset"),"\n\n");
open FASTAOUT, '>', "$output" or die (color("red"), "Can't genenate $output file", color("reset"),"\n\n");
while ($line= <FASTQIN>) {
    chomp $line;
    if ($line=~ s/^@/>/g) {
        my $id= $line;
        print FASTAOUT "$id\n";
    }
    elsif ($line=~ s/^[+]//g){
        next;
    }
    elsif ($line=~ s/[^a+|^c+|^g+|^t+|^n+]//gi){ # Here is the problem I tried this to : [\d*|\*|@*|?*|;*|<*|>*|,*]
        next; # how to modified this to fix it ???
    }
    else {
        my $fastaseq = $line;
        chomp $fastaseq;
        print FASTAOUT "$fastaseq\n";
    }
}
close FASTQIN;
close FASTAOUT;
exit;
I think the problem occurs when the quality line starts with the @ symbol, like: @A<@BB8<>AA?A>
Thanks so much if you can help me (sorry, I have just started learning Perl by myself)!
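The diagnosis above looks right: the first branch, s/^@/>/, fires on any line that begins with @, so a quality line starting with @ is printed as if it were a header. One way around it, shown here only as a minimal sketch, is to decide what a line is by its position in the record rather than by its content. This assumes every record is exactly four lines (no wrapped sequence or quality lines). The sub name fastq_to_fasta and the tiny in-memory input are made up for illustration; this is not necessarily the fix suggested elsewhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert FASTQ to FASTA by line position instead of line content.
# Assumption: strict four-line records (header, sequence, '+', quality).
sub fastq_to_fasta {
    my ($in, $out) = @_;                   # input and output filehandles
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;        # tolerate blank lines between records
        my $seq  = <$in>;                  # line 2: the sequence
        my $plus = <$in>;                  # line 3: separator, ignored
        my $qual = <$in>;                  # line 4: quality, ignored even if it starts with '@'
        die "Truncated FASTQ record\n" unless defined $qual;
        $header =~ s/^\@/>/
            or die "Header does not start with '\@': $header";
        print {$out} $header, $seq;        # both still carry their newlines
    }
}

# Hypothetical two-record input; the first quality line starts with '@',
# the second with '+', which are the two cases that confuse content-based tests.
my $fastq = "\@r1\nACGT\n+\n\@II;\n\@r2\nGGCC\n+\n+<<A\n";
open my $in, '<', \$fastq or die "Can't read in-memory FASTQ: $!";
my $fasta = '';
open my $out, '>', \$fasta or die "Can't write in-memory FASTA: $!";
fastq_to_fasta($in, $out);
close $in;
close $out;
print $fasta;
```

To use it on real files, open FASTQIN and FASTAOUT as in the script above and pass \*FASTQIN and \*FASTAOUT to the sub in place of the in-memory handles.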
You could try seqtk. FASTQ-to-FASTA conversion is the first example in its README:
seqtk seq -a in.fq.gz > out.fa
Following the seqtk advice, which I would definitely recommend: if you want to play a bit more with the FASTQ lines and still want to do it in Perl, you could always use the universal implementation of seqtk, named readfq. It simply requires embedding a small, roughly 40-line subroutine in your code, and you'll be able to handle FASTQ files quickly and easily.
Thanks so much, all of you, it was really helpful!
It works well with the modifications suggested by thackl.
THANKS SO MUCH TO ALL!!!