I have a BAC that I am trying to annotate. I have the full sequence and a list of genes that exist on both +ve and -ve strands, but I am trying to figure out a way to annotate the entire sequence (or perhaps retrieve a fully annotated version of it). I know this could potentially be done manually, but it is extremely laborious. The ID for this BAC sequence is: RP11-71E16
I have never done this before, so I just googled my BAC sequence and opened it in the NCBI clone database:
This is just a screenshot of what I am seeing. I know there are about 25 genes at several thousand bp (including introns), but I haven't figured out a way to download a fully annotated GenBank file. Any ideas?
If you know where the BAC aligns to the reference, can't you just download the GTF file, extract the appropriate section with awk, and then modify the coordinates to be relative to the start of the BAC? That'd be simpler than dealing with a GenBank file (though I presume you could perform a similar procedure on the chr14 GenBank file with BioPerl or Biopython).
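For what it's worth, the extract-and-shift step could look roughly like the Python sketch below (the same logic would fit in a short awk script). It is only an illustration: the function name `shift_gtf_to_bac` and its parameters are made up here, and the BAC start/end on the reference have to come from your own alignment. It also assumes the BAC lies on the forward strand of the reference with no indels relative to it; if the BAC is reverse-complemented, the coordinates and strands would need flipping as well.

```python
def shift_gtf_to_bac(gtf_lines, chrom, bac_start, bac_end, new_seqname="BAC"):
    """Yield GTF lines for features lying entirely inside the BAC,
    with coordinates made relative to the BAC start.

    bac_start and bac_end are 1-based, inclusive reference positions
    (hypothetical inputs; take them from your own alignment).
    """
    offset = bac_start - 1  # reference position bac_start becomes position 1
    for line in gtf_lines:
        # Skip comments and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns
        if len(fields) < 9 or fields[0] != chrom:
            continue
        start, end = int(fields[3]), int(fields[4])
        # Keep only features fully contained in the BAC region
        if start < bac_start or end > bac_end:
            continue
        fields[0] = new_seqname
        fields[3] = str(start - offset)
        fields[4] = str(end - offset)
        yield "\t".join(fields)
```

You would call it with an open GTF file handle, e.g. `shift_gtf_to_bac(open("annotation.gtf"), "chr14", start, end, "RP11-71E16")`, where the file name and the `start`/`end` values are placeholders. Features that only partially overlap the BAC are dropped here; clipping them instead is a design choice you may prefer.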