I took the sequences from a FASTA file and concatenated them into one big sequence, which was the basis of my research. I now have a series of coordinates (inside this concatenated sequence) that I am interested in.
I want to find the original IDs of the sequences that correspond to those coordinates inside the concatenated sequence. I am currently writing a Perl script. Does anyone have any suggestions?
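#!/usr/bin/perl
use strict;
use warnings;
my $input = $ARGV[0] or die "usage: $0 input.fasta\n";
# Three-argument open with a lexical filehandle
open(my $INPUT, '<', $input) or die "unable to open $input: $!";
while (my $line = <$INPUT>) {
    # Sequence lines (as opposed to ">" header lines) start with a nucleotide
    if ($line =~ /^[AGCT]/) {
        # not finished yet: sequence lengths would be collected here
    }
}
close $INPUT;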
Obviously my program isn't finished, but I think I will try Perl's length function and store the results in an array.
Showing some example input and output would be helpful.
Input would be a standard FASTA file and a file with coordinates in two columns (start-stop). Output would be a list of the IDs that match a set of coordinates I input.
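One possible way to do this is to record, for each FASTA record, the start and end position it occupies in the concatenated sequence, then report every ID whose span overlaps a given start-stop pair. The sketch below is only an illustration under several assumptions that the post does not state: coordinates are 1-based and inclusive, the sequences were concatenated in file order with no separator, and the ID is the first word after ">". The names build_index and ids_for_range are made up for this example, and the FASTA data and coordinate pairs are toy values held in memory rather than read from your files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return a list of
# [id, start, end] entries, where start/end are the 1-based inclusive
# positions the record occupies in the concatenated sequence (assumption).
sub build_index {
    my ($fh) = @_;
    my (@index, $id);
    my ($offset, $len) = (0, 0);   # bases before this record, bases in it
    my $flush = sub {
        return unless defined $id && $len > 0;
        push @index, [$id, $offset + 1, $offset + $len];
        $offset += $len;
    };
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;        # strip newline and trailing whitespace
        if ($line =~ /^>(\S*)/) {  # header line: close the previous record
            $flush->();
            ($id, $len) = ($1, 0);
        } elsif (defined $id) {    # sequence line: add its length
            $len += length $line;
        }
    }
    $flush->();                    # close the last record
    return \@index;
}

# Return the IDs of all records that overlap the range start..stop.
# A range that crosses a record boundary yields more than one ID.
sub ids_for_range {
    my ($index, $start, $stop) = @_;
    ($start, $stop) = ($stop, $start) if $start > $stop;
    return map { $_->[0] }
           grep { $_->[1] <= $stop && $_->[2] >= $start } @$index;
}

# Toy data: seq1 covers 1-6, seq2 covers 7-9, seq3 covers 10-14.
my $fasta = ">seq1\nACGT\nAC\n>seq2\nGGG\n>seq3\nTTTTT\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
my $index = build_index($fh);
close $fh;

for my $pair ([1, 4], [6, 7], [8, 12]) {
    my ($start, $stop) = @$pair;
    print "$start-$stop\t", join(',', ids_for_range($index, $start, $stop)), "\n";
}
# prints:
# 1-4     seq1
# 6-7     seq1,seq2
# 8-12    seq2,seq3
```

To use it on real files, open the FASTA file instead of the in-memory string and read the start-stop pairs from the coordinate file line by line. The lookup here is a simple linear scan over the index; with many coordinates and many sequences, a binary search over the sorted start positions would likely be faster.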