fastq to fasta perl
9.4 years ago
cabraham03 ▴ 30

Hi, I have the following code to convert FASTQ to FASTA, but it has a problem that I need to fix.

I want to go from:

@YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
+
929994988483448887444665///,/222*//0//,//,/8,//754444,4:492/;<;11,1<<?29449144;:44444+4;-4249;;-424//7775/-,0,,/65684444489;A@B>884444244448994444442449999=967:?A<@BBBBCA?@A11,1-+-,2,,*/8885;;C>B>=8884424333
@YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
+
;;111,/18137>>AA?:?9959959>?AA@99959A?F?@AACCCC3AA=A>@=@B;;8<?@AAA@@<@>AAAA>>9;599B;<<BCA<<;=@<9<=@>>A;;<@@BFCCEEC;<;>A@>@@@BC>CBCD=?@8;:/8=?<;??2<<;??:?BB>;=@=:000>;<:A00,0;;;AC>CBB>@?;;6;>;B5::>??????A@888<;;9=@@@BB>A
@YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
+
//*/8851:>DDCBBBBCBBBCDDBCC@CACCDC@B?CC99;BB>@A993319849999-9>;><@BBCFFCCDA@B8::BBBCBBB??9;;BBC@CC?>=@B?AA@CBAB@?;4429@BB:@@8<<8<:>DCBBBBBBCCBCBH?:<<B==BF133@@>>B<@>5;89DBB:=D=CACCCCBA@ACB@@<9999/9@<A@BBBB9BDCBBB2CBCCBBB
@YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
+
88//13<B@D?B?BKK11,/,1,/777:AA@C?AB?@?:@>@@@@@BB>C<99919A699C>?<?<9969?@9999/95991<AAA?CBB@=@8;<CCC@C@DC;<<AAB@;;;;<;<<BCC>BAA?AB<998959@B:AABB=<<<:99AB1111111,1:<<@9959999?99909@@6969>>?>>>99599>99919@BCDCCC@EAB<99919@<??<9969999969@AD
@YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
+
:99499C@@@A>A64/<8,,,*,,656>?BB?ABCC@A=??9::?:==?@:::CC@CCCACCB499;998?@@@CDBBCC<==ADBC@CBB;;9>>>>:>5<<CCDD@CICCD<<<ABBC8::B>A=@>><9948?>>8888.8B699:::8:;@B>@>@@@;;;@A=??@B>AAA4;;80::.00>:<9>.70;:0./=:==B?;:7+,,**0*,,./5765:::<633@7:76,/
@YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA
+
999A999@899;?:99:@9944444992444-497733184737177777777=8=489<>88133788.333178,674:001//6;,.333,377>???BB::;<<

to:

>YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
>YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
>YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
>YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
>YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
>YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA

However, some lines in the output still contain quality symbols (for example: >CB>CBC@@@@@@9999/9@A6<9@248988-?).

#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Term::ANSIColor;

my ($imput, $output, $line, $usage);

GetOptions (
            'i=s' => \$imput,
            'o=s' => \$output,
            );

$usage = (qq(
          Error:
             Wrong Arguments

             Usage:
             fastxQA -i infile.fastq -o outfile.fasta

));

if (!$imput or !$output) {
    print color("red"), "$usage", color("reset"),"\n\n";
    exit;
}

open FASTQIN, '<', "$imput" or die (color("red"), "\nCan't open $imput file", color("reset"),"\n\n");
open FASTAOUT, '>', "$output" or die (color("red"), "Can't generate $output file", color("reset"),"\n\n");

while ($line= <FASTQIN>) {
    chomp $line;
    if ($line=~ s/^@/>/g) {
        my $id= $line;
        print FASTAOUT "$id\n";

    }
    elsif ($line=~ s/^[+]//g){
       next;
    }
    elsif ($line=~ s/[^a+|^c+|^g+|^t+|^n+]//gi){    # Here is the problem. I also tried: [\d*|\*|@*|?*|;*|<*|>*|,*]
        next;                                                          # How can I modify this to fix it?
    }
    else {
         my $fastaseq = $line;
         chomp $fastaseq;
         print FASTAOUT "$fastaseq\n";
    }

}
close FASTQIN;
close FASTAOUT;
exit;

I think the problem occurs when the quality line starts with the @ symbol, like: @A<@BB8<>AA?A>;

Thank you so much if you can help me (sorry, I have only just started teaching myself Perl)!

perl fasta fastq • 4.6k views

You could try seqtk. Fastq to fasta conversion is the first example:

seqtk seq -a in.fq > out.fa

I would definitely recommend the seqtk advice. If you want to work with the FASTQ lines a little more and still want to do it in Perl, you could use readfq, the parser from the author of seqtk that is available in several languages. You only need to embed a small subroutine of about 40 lines in your code, and you will be able to parse FASTQ files quickly and easily.


Thanks so much, all of you. It was really helpful!

It works well with the modifications suggested by thackl.

Thanks so much to all!

9.4 years ago
thackl ★ 3.0k

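An easy way to read FASTQ is to read 4 lines at a time. It is faster and you don't have to worry about regexps. This assumes every record occupies exactly four lines, i.e. sequences and qualities are not wrapped.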

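# Read one four-line FASTQ record per iteration; stop at end of file.
while(
  defined(my $shead = <FASTQIN>) &&   # '@' header line
  defined(my $sseq  = <FASTQIN>) &&   # sequence line
  defined(my $qhead = <FASTQIN>) &&   # '+' separator line
  defined(my $qseq  = <FASTQIN>)      # quality line (may itself start with '@')
){
  substr($shead, 0, 1, '>');          # replace the leading '@' with '>'
  print $shead, $sseq;                # STDOUT; use "print FASTAOUT ..." to write to the output file
}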
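The four-lines-at-a-time idea above can be sketched as a self-contained subroutine. This is only an illustration, not the exact code used in the thread: the name fastq_to_fasta, the lexical filehandles, and the in-memory demo data are all assumptions made for the example. The second demo record has a quality line that starts with '@', which is the case that confuses a line-by-line regex approach.

```perl
use strict;
use warnings;

# Hypothetical helper: copy FASTQ records from $in to $out as FASTA.
# Assumes every record is exactly four lines (no wrapped sequences).
sub fastq_to_fasta {
    my ($in, $out) = @_;
    my $count = 0;
    while (defined(my $shead = <$in>)) {
        next if $shead =~ /^\s*$/;          # tolerate blank lines between records
        my $sseq  = <$in>;                  # sequence line
        my $qhead = <$in>;                  # '+' separator line
        my $qseq  = <$in>;                  # quality line, never inspected
        die "Truncated FASTQ record after $count complete records\n"
            unless defined $sseq && defined $qhead && defined $qseq;
        die "Record header does not start with '\@': $shead"
            unless $shead =~ /^\@/;
        substr($shead, 0, 1, '>');          # '@' becomes '>'
        print {$out} $shead, $sseq;         # both lines still end in "\n"
        $count++;
    }
    return $count;
}

# Demo on in-memory data; the quality line of r2 starts with '@' on purpose.
my $fastq = "\@r1\nACGT\n+\n9999\n\@r2\nGGCC\n+\n\@A<\@\n";
my $fasta = '';
open my $in,  '<', \$fastq or die "Cannot open in-memory input: $!";
open my $out, '>', \$fasta or die "Cannot open in-memory output: $!";
my $n = fastq_to_fasta($in, $out);
close $in;
close $out;
print $fasta;
```

Because the quality line is consumed by position and never pattern-matched, its first character does not matter. With real files, the two handles would come from open my $in, '<', $input and open my $out, '>', $output instead of the in-memory strings.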

And if you want to have your FASTA sequence with a fixed line width:

my $line_width = 80;
print $shead;
chomp($sseq);
print $_,"\n" for unpack "(A$line_width)*", $sseq;
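To see what the unpack template does, here is a small sketch that wraps a sequence into fixed-width lines. The helper name wrap_sequence and the default width of 80 are assumptions for illustration; the unpack "(A$width)*" call is the same technique as in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into lines of at most $width characters.
sub wrap_sequence {
    my ($seq, $width) = @_;
    $width = 80 unless defined $width;
    die "Line width must be a positive integer\n"
        unless $width =~ /^[1-9]\d*$/;
    chomp $seq;                                   # works on a copy of the caller's string
    # "(A4)*" means: repeat "take up to 4 characters" until the string is used up.
    return map { "$_\n" } unpack "(A$width)*", $seq;
}

# A 10-base sequence wrapped at 4 columns gives lines of 4, 4 and 2 bases.
my @lines = wrap_sequence("ACGTACGTAC\n", 4);
print @lines;
```

The last line is simply shorter when the sequence length is not a multiple of the width, and an empty sequence yields no lines at all.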
9.4 years ago

Or just do it on the command line:

zcat file.fq.gz | paste - - - - | perl -F'\t' -ane '$F[0]=~s/^@/>/;print "$F[0]\n$F[1]\n";' | gzip -c > file.fa.gz
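If you would rather keep the gzip handling inside a Perl script instead of a shell pipeline, a rough equivalent can be sketched with the IO::Uncompress::Gunzip and IO::Compress::Gzip modules that ship with recent Perl distributions. The subroutine name fastq_gz_to_fasta_gz is made up for this example, and the sketch assumes strict four-line records just like the pipeline above.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Gzip     qw($GzipError);

# Hypothetical helper: read a gzipped FASTQ file, write a gzipped FASTA file.
# Assumes every record is exactly four lines.
sub fastq_gz_to_fasta_gz {
    my ($in_path, $out_path) = @_;

    # MultiStream lets the reader continue across concatenated gzip members.
    my $in = IO::Uncompress::Gunzip->new($in_path, MultiStream => 1)
        or die "Cannot read $in_path: $GunzipError\n";
    my $out = IO::Compress::Gzip->new($out_path)
        or die "Cannot write $out_path: $GzipError\n";

    my $count = 0;
    while (defined(my $shead = $in->getline)) {
        my $sseq  = $in->getline;           # sequence line
        my $qhead = $in->getline;           # '+' separator line
        my $qseq  = $in->getline;           # quality line, ignored
        die "Truncated FASTQ record in $in_path\n" unless defined $qseq;
        $shead =~ s/^\@/>/;                 # only the header line is rewritten
        $out->print($shead . $sseq);
        $count++;
    }
    $in->close;
    $out->close;
    return $count;                          # number of records converted
}
```

It would be called as fastq_gz_to_fasta_gz('file.fq.gz', 'file.fa.gz'). The shell pipeline is shorter; this version is mainly useful when the conversion is one step in a larger Perl script.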
9.4 years ago
venu 7.1k

And if you need this only in Perl, check this


I've always found readfq to be the fastest and simplest Perl implementation for handling FASTQ files.

4.5 years ago

See a Python script which I wrote some time back
