Dear colleagues, I am asking for your help once more. Here is my Perl script for measuring the distance between binding sites and the start points of exons. The genes are saved in a file called sequence.txt and the binding sites in motif.txt. What I want to do is compute the length of the fragment for every gene that contains any of the binding sites from motif.txt. The current script does not work the way I would like. How should it be changed? Should I add another while loop for $motif?
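```perl
use strict;
use warnings;

my $string_filename = 'sequence.txt';
open(my $file, '<', $string_filename) || die("Couldn't read file $string_filename\n");
my $motif_filename = 'motif.txt';
open(my $motif_fh, '<', $motif_filename) || die("Couldn't read file $motif_filename\n");

local $/ = "\n>";    # read one FASTA record at a time (applies to BOTH files)
while (my $seq = <$file>) {
    chomp $seq;
    $seq =~ s/^>*.+\n//;    # drop the header line
    $seq =~ s/\n//g;        # join the sequence lines
    my $R = length $seq;

    # Only ONE motif record is read per gene, so gene n is compared with motif n only
    my $motif = <$motif_fh>;
    last unless defined $motif;    # stop when motif.txt runs out of records
    chomp $motif;
    $motif =~ s/^>*.+\n//;
    $motif =~ s/\n//g;

    if ( $seq =~ /$motif/ ) {    ## insert actual binding site
        my $M = $';              # everything after the binding site
        my $W = length $M;
        if ( $seq =~ /[A-Z]/ ) {    ## exon start (first upper-case base)
            my $K = $`;             # everything before the exon start
            my $Z = length $K;
            my $x = $W + $Z - $R;   # exon start minus binding-site end
            print "\nThe distance is: $x\n\n";
        } else {
            print "\nI couldn't find the start codon.\n\n";
        }
    } else {
        print "\nI couldn't find the binding site.\n\n";
    }
}
close $motif_fh;
close $file;
exit;
```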
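On the question itself: yes, a second loop is needed, but rather than a second while loop that keeps reading MOTIF, it is simpler to read all binding sites into an array once and loop over that array for every gene. Below is a minimal sketch, assuming both files are plain FASTA (so a FASTQ motif file would have to be converted first) and, as in the original script, that the exon starts at the first upper-case base. It also swaps $' and $` for the match-offset arrays @+ and @-: since $W = $R minus the end of the motif and $Z is the start of the exon, the original $x = $W + $Z - $R is simply the exon start minus the motif end. The subroutine names read_fasta and motif_to_exon_distance are hypothetical, and the filenames are taken from the command line.

```perl
use strict;
use warnings;

# Read a FASTA file into a list of [header, sequence] pairs,
# joining sequence lines that are wrapped over several lines.
sub read_fasta {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Couldn't read file $filename: $!\n";
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;             # tolerate Windows line endings
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];   # start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;  # append to the current sequence
        }
    }
    close $fh;
    return @records;
}

# Offset of the exon start minus the offset just past the first match of
# $motif, i.e. the same value as $W + $Z - $R in the original script.
# Assumes (as the original does) that the exon begins at the first
# upper-case base. Returns nothing (undef in scalar context) if the
# motif or the exon start is missing.
sub motif_to_exon_distance {
    my ($seq, $motif) = @_;
    return unless $seq =~ /\Q$motif\E/;   # motif treated as a literal string
    my $motif_end = $+[0];                # offset just after the motif
    return unless $seq =~ /[A-Z]/;
    my $exon_start = $-[0];               # offset of the first upper-case base
    return $exon_start - $motif_end;
}

# Hypothetical usage: perl motif_distance.pl sequence.txt motif.txt
if (@ARGV == 2) {
    my ($seq_file, $motif_file) = @ARGV;
    my @motifs = read_fasta($motif_file);     # read ALL binding sites once
    for my $gene (read_fasta($seq_file)) {    # outer loop: genes
        my ($gene_id, $seq) = @$gene;
        for my $m (@motifs) {                 # inner loop: binding sites
            my ($motif_id, $motif) = @$m;
            my $d = motif_to_exon_distance($seq, $motif);
            next unless defined $d;           # skip pairs with no motif hit or no exon start
            print "$gene_id\t$motif_id\t$d\n";
        }
    }
}
```

It prints one tab-separated line (gene, motif, distance) per pair that matched. Like the original, only the first occurrence of each motif is used, and the match is case-sensitive, so a motif stored in upper case will not match a lower-case upstream region unless you add the /i modifier. If motif.txt contains regular expressions rather than literal sequences, drop the \Q...\E.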
Please show us what your inputs look like.
There is a FASTQ-to-FASTA converter here, and BioPerl's Bio::SeqIO::fastq will handle some FASTQ formats.
The binding sites look like this:
and the genes:
The binding sites are in a FASTQ file?!
I recognise that this is a problem, but I haven't found a way to convert it to FASTA yet.
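As an alternative to the converters mentioned in this thread, here is a minimal pure-Perl sketch. It assumes the common four-line FASTQ layout (@header, sequence, + separator, quality string) with no line-wrapped sequences; for anything less regular, BioPerl's Bio::SeqIO is the safer route. The subroutine name fastq_to_fasta and the script name are hypothetical.

```perl
use strict;
use warnings;

# Copy FASTQ records from $in to $out as FASTA, assuming the common
# four-line layout: @header, sequence, '+' separator, quality string.
# Line-wrapped FASTQ (sequence split over several lines) is NOT handled.
sub fastq_to_fasta {
    my ($in, $out) = @_;    # readable and writable filehandles
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;    # skip blank lines between records
        die "Not a FASTQ header: $header" unless $header =~ /^@/;
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated FASTQ record: $header" unless defined $qual;
        die "Missing '+' line in record: $header" unless $plus =~ /^\+/;
        $header =~ s/^@/>/;            # '@id desc' becomes '>id desc'
        print {$out} $header, $seq;    # the quality line is dropped
    }
    return;
}

# Hypothetical usage: perl fastq2fasta.pl motif.fastq > motif.txt
if (@ARGV == 1) {
    open(my $in, '<', $ARGV[0]) or die "Couldn't read file $ARGV[0]: $!\n";
    fastq_to_fasta($in, \*STDOUT);
    close $in;
}
```

Reading the record as a fixed group of four lines, rather than looking for lines that start with @, avoids mistaking a quality string that happens to begin with @ for a new header.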
There is further discussion of FASTQ-to-FASTA conversion on StackOverflow.