Hello friends,
I am new to Perl programming and still need to practice regular expressions and NCBI file handling. I have a task that I have half finished. Could anybody help me with the rest?
File 1:
Candida glabrata CBS 138 chromosome D, complete genome - 1..651701
283 proteins
Location Strand Length PID Gene Synonym Code COG Product
17042..17914 - 290 50285983 - CAGL0D00154g - -
23693..25075 + 460 50285985 - CAGL0D00176g - -
27559..28710 + 383 50285987 - CAGL0D00198g - -
29345..29914 + 189 50285989 - CAGL0D00220g - -
...and so on, for 40 lines.
File 2 contains:
>ref|NC_006027.1|:c17914-17042 hypothetical protein [Candida glabrata CBS 138]
ATGGAAACAGAACATCAGGCAGACAAAAATGCGGAATTGGGTTATGACAGTGGATCAACCGTTGCTCCCC
CCAATAAATATAGTACATTACGCTCTAGGTTCAATTTAGGACCTGACACTATGAGAAATCATGTTATTGC
CTTTTTTGGGGAGTTGGTTGGCACATTCATGTTTTTATGGTGTGCCTATGTTATTGCAAATATTGCAAAT
>ref|NC_006027.1|:23693-25075 hypothetical protein [Candida glabrata CBS 138]
ATGTCTTCTCAAGTTAACGAACCAGAATTTCAACAAGCTTACCACGAAGTTGTTTCCTCTTTGAAGGACT
CTTCTTTGTTCGAAAAGCACCCAAAATATGCTAAGGTTCTTCCAGTTGTCTCTGTCCCAGAGAGAATCAT
...and so on. The number of locations in File 1 is equal to the number of sequences in File 2.
Here is what I have to do: if a location in File 1, e.g. "17042..17914", matches the coordinates in a header of File 2, e.g. "c17914-17042" (a match on either the upper or the lower limit),
then the FASTA header in File 2 should be removed and replaced with ">CAGL0D00154g", i.e. the value in the Synonym column of File 1 for that location.
My output file should then look like this:
File3:
>CAGL0D00154g
ATGGAAACAGAACATCAGGCAGACAAAAATGCGGAATTGGGTTATGACAGTGGATCAACCGTTGCTCCCC
CCAATAAATATAGTACATTACGCTCTAGGTTCAATTTAGGACCTGACACTATGAGAAATCATGTTATTGC
CTTTTTTGGGGAGTTGGTTGGCACATTCATGTTTTTATGGTGTGCCTATGTTATTGCAAATATTGCAAAT
>CAGL0D00176g
ATGTCTTCTCAAGTTAACGAACCAGAATTTCAACAAGCTTACCACGAAGTTGTTTCCTCTTTGAAGGACT
CTTCTTTGTTCGAAAAGCACCCAAAATATGCTAAGGTTCTTCCAGTTGTCTCTGTCCCAGAGAGAATCAT
Here is what I have done so far:
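foreach $line (@File1) {
    chomp($line);
    # split each row of File 1 into its tab-separated columns
    ($f1, $f2, $f3, $f4, $f5, $f6) = split(/\t+/, $line);
    push(@F1, $f1);
    push(@F2, $f2);
    # ...and so on for the remaining columns
}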
@F1 contains the Location column (17042..17914, ...) and @F6 contains the Synonym column (e.g. CAGL0D00176g).
In the same way, I collected all the upper limits of the locations in File 2 (i.e. 17914, 25075, ...) into @B using:
foreach $line (@File2) {
    chomp $line;
    # capture the number that follows the hyphen in a header line
    if ($line =~ /-(\d+)/) {
        push(@B, $1);
    }
}
So could anybody help me write code to get the output I specified above?
Looking forward to your code.
Thank you.
Please can you reformat this question to make it readable? As it stands, no-one is likely to answer because it is almost unintelligible.
...and improve the spelling, grammar, etc...
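For what it is worth, here is a minimal sketch of one way the task could be approached, as I understand it from the samples. It rests on several assumptions of mine. Data rows in File 1 are taken to start with a "start..end" location and to have the synonym in the sixth whitespace-separated column. Headers in File 2 are taken to carry the coordinates as ":START-END" or ":cEND-START". The sub names build_lookup and relabel_fasta are made up for illustration. Instead of comparing only one limit, the sketch sorts both coordinates and matches on the pair, so headers on the complement strand (the "c" prefix, where the order is reversed) are handled the same way as the others. Headers with no match are left unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map "low..high" coordinates from File 1 to the Synonym column.
# Assumption: data rows start with "start..end" and the synonym is field 6.
sub build_lookup {
    my @table_lines = @_;
    my %synonym_for;
    for my $line (@table_lines) {
        chomp $line;
        my @fields = split ' ', $line;    # split on any run of whitespace
        # Title and column-header lines do not start with "start..end".
        next unless @fields >= 6 && $fields[0] =~ /^(\d+)\.\.(\d+)$/;
        my ($low, $high) = sort { $a <=> $b } ($1, $2);
        $synonym_for{"$low..$high"} = $fields[5];
    }
    return %synonym_for;
}

# Return the FASTA lines with each matching header replaced by ">synonym".
sub relabel_fasta {
    my ($table_lines, $fasta_lines) = @_;
    my %synonym_for = build_lookup(@$table_lines);
    my @out;
    for my $line (@$fasta_lines) {
        (my $text = $line) =~ s/\r?\n\z//;    # strip the line ending
        # Header coordinates look like ":23693-25075" or ":c17914-17042".
        if ($text =~ /^>/ && $text =~ /:c?(\d+)-(\d+)/) {
            my ($low, $high) = sort { $a <=> $b } ($1, $2);
            my $name = $synonym_for{"$low..$high"};
            $text = ">$name" if defined $name;    # otherwise keep the header
        }
        push @out, $text;
    }
    return @out;
}

# Run only when two file names are given on the command line.
if (@ARGV == 2) {
    my ($table_file, $fasta_file) = @ARGV;
    open my $table_fh, '<', $table_file or die "Cannot open $table_file: $!";
    my @table = <$table_fh>;
    close $table_fh;
    open my $fasta_fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my @fasta = <$fasta_fh>;
    close $fasta_fh;
    print "$_\n" for relabel_fasta(\@table, \@fasta);
}
```

If this were saved under a name such as relabel.pl (hypothetical), it could be run as "perl relabel.pl file1.txt file2.fasta > file3.fasta". Should the real files differ from the samples in the question, for example in column order or header layout, the two regular expressions would need adjusting.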