I am using p3_in.pl, obtained from "https://webblast.ipk-gatersleben.de/misa/index.php?action=3&help=3", to generate the input file for Primer3. My genome is a chromosome-level assembly, and the script writes the whole chromosome sequence as the template for every SSR predicted on that chromosome. For example, if chromosome 1 has 800 SSRs, the script writes the chromosome 1 sequence 800 times to the output file, which makes the file too large for Primer3 to process.
The script is given below:
```perl
#!/usr/bin/perl -w
# Author: Thomas Thiel, Sebastian Beier
# Program name: primer3_in.pl
# Description: creates a PRIMER3 input file based on SSR search results

open (IN,"<$ARGV[0]") || die ("\nError: Couldn't open misa.pl results file (*.misa) !\n\n");
my $filename = $ARGV[0];
$filename =~ s/\.misa//;
open (SRC,"<$filename") || die ("\nError: Couldn't open source file containing original FASTA sequences !\n\n");
open (OUT,">$filename.p3in");

undef $/;
$in = <IN>;
study $in;
close(IN);

$/= ">";
my $count=0;
while (<SRC>) {
  next unless (my ($id,$seq) = /(.*?)\n(.*)/s);
  $seq =~ s/[\d\s>]//g;#remove digits, spaces, line breaks,...
  # reopen the results file for line-by-line reading; IN was closed after the slurp above
  open (IN,"<$ARGV[0]") || die ("\nError: Couldn't open misa.pl results file (*.misa) !\n\n");
  $/="\n";
  while (my $line = <IN>) {
    $line =~ s/\R//g;
    $id =~ s/\R//g;
    next unless $line =~ /$id\t(\d+)\t\S+\t\S+\t(\d+)\t(\d+)\t\d+/g;
    my ($ssr_nr,$size,$start) = ($1,$2,$3);
    $count++;
    print OUT "SEQUENCE_ID=$id"."_$ssr_nr\nSEQUENCE_TEMPLATE=$seq\n";
    print OUT "PRIMER_PRODUCT_SIZE_RANGE=100-280\n";
    print OUT "SEQUENCE_TARGET=",$start-3,",",$size+6,"\n";
    print OUT "PRIMER_MAX_END_STABILITY=250\n=\n"
  };
  $/= ">";
};
print "\n$count records created.\n";
close(IN);
close(SRC);
close(OUT);
```
The script works well with small sequences, but with large sequences such as whole chromosomes it produces a very large input file.
If anyone could help me modify the script so that, instead of taking the whole chromosome, it extracts a flanking sequence of a specified length around each SSR, that would be very helpful.
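To make the request concrete, here is a rough sketch of the windowing I have in mind. It is not tested against real MISA output. The helper `flank_window` and the `$flank` parameter are hypothetical names of my own, not part of p3_in.pl, and I am assuming the SSR start position in the .misa file is 1-based.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of p3_in.pl): cut a window of up to $flank bp
# on each side of an SSR and return it together with the SSR start
# re-expressed relative to that window.
# Assumption: $start is the 1-based SSR start from the .misa file,
# $size is the SSR length in bp.
sub flank_window {
    my ($seq, $start, $size, $flank) = @_;
    my $len       = length $seq;
    my $win_start = $start - $flank;
    $win_start    = 1 if $win_start < 1;         # clamp at the sequence start
    my $win_end   = $start + $size - 1 + $flank;
    $win_end      = $len if $win_end > $len;     # clamp at the sequence end
    my $template  = substr($seq, $win_start - 1, $win_end - $win_start + 1);
    my $new_start = $start - $win_start + 1;     # SSR start inside the window
    return ($template, $new_start);
}

# Toy example: a 10 bp (AG)5 repeat starting at position 21 of a 50 bp sequence,
# with 8 bp of flank kept on each side.
my $seq  = ('T' x 20) . ('AG' x 5) . ('C' x 20);
my $size = 10;
my ($template, $new_start) = flank_window($seq, 21, $size, 8);

# Same offsets as the original script, applied to the window coordinates.
print "SEQUENCE_TEMPLATE=$template\n";
print "SEQUENCE_TARGET=", $new_start - 3, ",", $size + 6, "\n";
```

Inside the inner loop of p3_in.pl, the idea would be to call `flank_window($seq, $start, $size, $flank)` and print the returned template and window-relative start instead of `$seq` and `$start`. The flank length would need to be large enough on each side for the 100-280 bp product size range; the exact value is something to tune.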
Thank you.