You may skip to the question part if it seems clear enough without the context.
I'm trying to automate the process of extracting specific info from BLAST searches.
When doing it manually, I would blastn some sequences against the nt database, and record the following info of the top hit: the query coverage, the percent identity, start of the aligned range (hit_start), and the gene product of the first CDS.
The first two would be on the result page, and the last two would require clicking on the hit to see the range first, and then open the GenBank info to find the CDS info.
Since I need to go through a large amount of sequences, I'd like to automate this process. I managed to get everything with Biopython except for the product name, I guess it's because the other attributes could be extracted from the blast results, but the genbank info is really independent from the blast results.
So my question now is: if I obtained the accession id or gi number from the blast results, as well as the range of alignment, i.e. the location within the gene, is there a way to somehow connect to genbank or other databases to extract the first gene product name within or overlapping this range?
I hope I explained myself clearly... I've been trying to rearrange keywords to look for a solution or part of a solution to this problem but haven't been successful.
Thank you in advance.
Edit: More specifically, my immediate question would be: is there a way to extract GenBank info on a specific region instead of pulling the entire file? For example, on the BLAST result page, if I click on the GenBank info of a hit, it would specify that this is a partial Genbank file for: e.g. ACCESSION CP009256 REGION: 3404381..3405501. When I do it in python, is there an option for retrieving a partial file like this, and then look for the product name in CDS?
If you have the genbank format file then use code provided by @cpad0112 here : C: How to extract certain CDS from a GenBank file in linux terminal
Otherwise provide an example accession number and CDS you want to see.
Thank you. I guess I'll first try to figure out how to get the genbank files, and then try to filter through the CDS's based on their position to get the one I want.
Also, please see my edited note: If I would like to retrieve a partial GenBank file for a specific region, e.g. ACCESSION CP009256 REGION: 3404381..3405501, is this accomplishable? I also have the gi number if that's more feasible.