conservative position extraction
I have FASTQ-like files with multiple records:
>ID some information
--A-TGTGAC
0100111100
etc.
Here the 2nd line is a consensus sequence (gaps or nucleotides), and the 3rd line is a per-position conservation score (currently binary).
How can I parse this file and extract the positions with a score of "1"?
Pure Python code seems too complicated, and Biopython only works with Phred scores.
Tags: sequence, consensus, conservative, fastq, parsing
715 views, written 4.6 years ago by gatiyatov, updated 4.6 years ago by JC
Perl (because Python seems complicated):
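#!/usr/bin/perl
use strict;
use warnings;

my $nl = 0;   # line number within the current 3-line record
my @sq;       # consensus characters of the current record
while (<>) {
    $nl++;
    if ($nl == 1) {
        # header line: print unchanged
        print;
    }
    elsif ($nl == 2) {
        # consensus line: split into single characters
        chomp;
        @sq = split(//, $_);
    }
    elsif ($nl == 3) {
        # score line: keep the consensus characters scored "1"
        chomp;
        my @cn = split(//, $_);
        if ($#sq == $#cn) {
            for (my $i = 0; $i <= $#cn; $i++) {
                print $sq[$i] if ($cn[$i] eq '1');
            }
            print "\n";
        }
        else { die "line2 and line3 have different length\n"; }
        $nl = 0;
    }
}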
run as:
perl getCons.pl < FASTA_IN > FASTA_OUT
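For comparison, the same idea can be sketched in plain Python without Biopython. This is only a minimal sketch: it assumes every record is exactly three lines (header, consensus, score) as in the example above, and it skips blank lines. The function names `read_records`, `conserved` and `main` are made up for illustration. Besides the conserved characters, it also reports their 1-based positions, in case "position" in the question means the column index.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, consensus, scores) from consecutive 3-line records.

    Assumption: blank lines carry no information and are skipped.
    """
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        record.append(line)
        if len(record) == 3:
            yield record[0], record[1], record[2]
            record = []
    if record:
        raise ValueError("input ends with an incomplete record")


def conserved(consensus: str, scores: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every column scored '1'."""
    if len(consensus) != len(scores):
        raise ValueError("consensus and score lines have different lengths")
    return [
        (pos, char)
        for pos, (char, score) in enumerate(zip(consensus, scores), start=1)
        if score == "1"
    ]


def main() -> None:
    """Read records from standard input; print header, characters, positions."""
    for header, consensus, scores in read_records(sys.stdin):
        hits = conserved(consensus, scores)
        print(header)
        print("".join(char for _, char in hits))
        print(",".join(str(pos) for pos, _ in hits))
```

To use it as a script, add a call to `main()` at the bottom and feed the file on standard input, the same way as the Perl version. Gap characters are kept when their column is scored "1", which matches what the Perl code does.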
4.6 years ago by JC