Dear all,
I have a couple of metagenomes that I have mapped against the IMG/VR viral database. Most of my reads hit Uncultivated Viral Genome (UViG) IDs (list), so my results file contains a long list of IDs that look like this:
IMGVR_UViG_3300008486_000101
IMGVR_UViG_2519103086_000001
IMGVR_UViG_2519103159_000003
IMGVR_UViG_2526164598_000001
IMGVR_UViG_2534681965_000001
IMGVR_UViG_3300018494_000062
IMGVR_UViG_3300018878_000007
IMGVR_UViG_3300018878_000008
IMGVR_UViG_3300018878_000079
IMGVR_UViG_3300019376_000005
IMGVR_UViG_3300019378_000008
IMGVR_UViG_3300021255_000014
IMGVR_UViG_3300021255_000002
IMGVR_UViG_3300021255_000040
where I believe that the first part is the organism (e.g. "IMGVR_UViG_3300008486") and the second part is the gene (e.g. "_000101"). Using such an ID I can retrieve all the relevant information from the IMG website: https://img.jgi.doe.gov/cgi-bin/vr/main.cgi?section=ViralSearch&option=uvig
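To make the ID structure concrete, this is roughly how I would split one ID into its two parts in Python. It is only a minimal sketch: it assumes my reading of the two parts is right, and `split_uvig_id` is a hypothetical helper name, not something from JGI.

```python
def split_uvig_id(uvig_id):
    """Split an ID at its last underscore into (first part, second part).

    Assumption: every ID looks like 'IMGVR_UViG_<number>_<number>'.
    """
    prefix, sep, suffix = uvig_id.strip().rpartition("_")
    if not sep or not prefix.startswith("IMGVR_UViG_") or not suffix.isdigit():
        raise ValueError(f"unexpected ID format: {uvig_id!r}")
    return prefix, suffix


first, second = split_uvig_id("IMGVR_UViG_3300008486_000101")
print(first, second)
```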
But in my case I have a hundred files with tens of thousands of IDs (and I will have even more in the future), so looking them up one by one on the website is not an option.
I know that people have used Python scripts to look up IDs automatically at NCBI, but I have not seen such scripts for JGI. On top of that, I do not have the skills to write one on my own, so any help would be very much appreciated, because otherwise these results are completely unusable to me.
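What I imagine is something like the rough Python sketch below, which gathers the unique IDs from all result files and looks them up in a local table instead of the website. Everything specific in it is an assumption on my side: that each result file has one ID per line, that a tab-separated metadata table with one row per UViG can be obtained and saved locally, and that its ID column is called `UVIG`. The function names and file names are hypothetical.

```python
import csv
from pathlib import Path


def collect_ids(results_dir, pattern="*.txt"):
    """Return the set of unique UViG IDs found in all matching result files.

    Assumption: one ID per line; lines that do not start with the
    'IMGVR_UViG_' prefix are ignored.
    """
    ids = set()
    for path in Path(results_dir).glob(pattern):
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if line.startswith("IMGVR_UViG_"):
                    ids.add(line)
    return ids


def lookup_metadata(ids, metadata_tsv, id_column="UVIG"):
    """Return {ID: row dict} for every ID present in the metadata table.

    Assumption: metadata_tsv is a tab-separated file with a header row,
    and id_column names the column that holds the UViG ID.
    """
    found = {}
    with open(metadata_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row[id_column] in ids:
                found[row[id_column]] = row
    return found
```

With this, something like `lookup_metadata(collect_ids("results"), "metadata.tsv")` would read the table once, however many result files there are, which seems more realistic at my scale than querying a website per ID.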
Thanks in advance P I