7.5 years ago
Promi
Hello,
I have a file containing GI numbers, one per line:
103485576
**298489615**
103485741
10348579
And there is a directory of .ptt files, with one subdirectory per genome:
home/documents/all.ptt/_Nostoc_azollae__0708_uid**49725**/**NC_014249**.ptt
home/documents/all.ptt/_Ruminococcus__obeum_uid197165/NC_021022.ptt
The contents of a .ptt file, for example _Nostoc_azollae__0708_uid49725/NC_014249.ptt, look like this:
'Nostoc azollae' 0708 chromosome, complete genome - 1..5354700
3589 proteins
Location Strand Length PID Gene Synonym Code COG Product
2007..2231 + 74 **298489615** - Aazo_0002 - - hypothetical protein
2814..2942 + 42 298489616 - Aazo_0003 - - hypothetical protein
Desired output is:
gi accession taxid (where gi is the key and accession and taxid are the values or features)
298489615 NC_014249 49725
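Can anyone tell me how to read the file of GIs and look for matching values in the PID column of each .ptt file, looping over all the subdirectories of the directory above? The accession should come from the .ptt file name and the taxid from the number after "uid" in the subdirectory name.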
Thanks.
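
For reference, the lookup described above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: the .ptt files are tab-separated (the usual NCBI layout), the column header row contains the literal field `PID`, the accession is the file name without `.ptt`, and the "taxid" is whatever number follows `uid` at the end of the subdirectory name. The function names `load_gis`, `scan_ptt` and `map_gis` are made up for illustration.

```python
import os
import re
import sys


def load_gis(path):
    """Read one GI per line into a set, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def scan_ptt(ptt_path, wanted):
    """Yield every wanted GI found in the PID column of one .ptt file.

    Assumes tab-separated fields and a header row that contains 'PID'.
    """
    pid_col = None
    with open(ptt_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if pid_col is None:
                # Skip the title and protein-count lines until the header row.
                if "PID" in fields:
                    pid_col = fields.index("PID")
                continue
            if len(fields) > pid_col and fields[pid_col] in wanted:
                yield fields[pid_col]


def map_gis(gi_file, root):
    """Return (gi, accession, uid) tuples for every GI found under root."""
    wanted = load_gis(gi_file)
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        # Assumption: the number after 'uid' in the directory name is the id wanted.
        match = re.search(r"uid(\d+)$", os.path.basename(dirpath))
        uid = match.group(1) if match else "NA"
        for name in sorted(files):
            if not name.endswith(".ptt"):
                continue
            accession = name[: -len(".ptt")]
            for gi in scan_ptt(os.path.join(dirpath, name), wanted):
                rows.append((gi, accession, uid))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python map_gis.py gis.txt home/documents/all.ptt
    if len(sys.argv) == 3:
        for row in map_gis(sys.argv[1], sys.argv[2]):
            print("\t".join(row))
```

Keeping the GIs in a set means each .ptt file is read only once no matter how many GIs are in the list. GIs that never match any file are simply not reported.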