Forum:Fastq To Fasta For Human Whole Genome
3
4
Entering edit mode
11.8 years ago
Vova Naumov ▴ 220

Samtools can generate genome sequencing concensus FASTQ from bam file. Sometimes there is need to convert this file to fasta for example to get genotypes in bed file positions using bedtools FastaFromBED command. There is a lot of one liners converting FASTQ to fasta, but they can't work with such big files.

So here is a php code from my collegue Nickolay Kulemin doing this conversion just great:

/* This script can convert FASTQ to FASTA for human genomes. It looks for lines, started with "@chr" and with "+" and prints them into FASTA format. Usage: php fastq_fasta.php input_file output_file. */
<?php
if ($argc != 3) die ("File_please (input output)!\n");
$fp = fopen("{$argv["1"]}", "r");
$fw = fopen("{$argv["2"]}", "w");
$type = 0;
$first = 0;
while(!feof($fp)) {
    $type_old = $type;
    $line = fgets($fp, 4000);
    if (feof($fp)) break;
    if (substr($line,0,4) == "@chr") {
        $type = 1;
        $lline = explode("\n", $line);
        $gline = substr($lline["0"],1);
        if ($first == 0) {
            $ggline = ">$gline\n";
            $first = 1;
        } else $ggline = "\n>$gline\n";
        fwrite ($fw, $ggline);
        echo("$ggline");
    }
    if (substr($line,0,1) == '+') $type = 2;
    if ((substr($line,0,1) == ' ')||(substr($line,0,1) == "\n")) $type = 0;
    if (($type != $type_old)||($type != 1)) continue;
    $line = trim($line);
    $line = strtoupper($line);
    fwrite($fw, $line);
}
fclose($fp);
fclose($fw);
?>
fastq fasta genome • 4.4k views
ADD COMMENT
7
Entering edit mode
11.8 years ago

You can also do something like that (the size of the fastq-file does not matter):

cat file.fastq | paste - - - - | perl -F'\t' -ane 'print ">$F[0]\n$F[1]\n";' > file.fasta
ADD COMMENT
5
Entering edit mode
11.8 years ago
matted 7.8k

I don't know why you'd roll your own (and in PHP!) when there are good, validated options available. Heng Li's seqtk will do this:

seqtk seq -a in.fq.gz > out.fa
ADD COMMENT
0
Entering edit mode

just didn't found it by googling )

ADD REPLY
0
Entering edit mode

and why not PHP ?

ADD REPLY
2
Entering edit mode
11.8 years ago

You can also alter the samtools perl script so that it generates fastas, and not fastqs. Change the p2q_post_process subroutine as below:

sub p2q_post_process {
  my ($chr, $seq, $qual, $gaps, $l) = @_;
  &p2q_filter_gaps($seq, $gaps, $l);
  print "\>$chr\n"; &p2q_print_str($seq);
#  print "+\n"; &p2q_print_str($qual)

Basically, you are changing the "@" to a ">", and you are commenting out the line that would normally print the quality string.

ADD COMMENT
0
Entering edit mode

that's cool - thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1815 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6