How To Get Genome Name And Genome Type With Gi Number And Accession Number In Python?
10.8 years ago
shl198 ▴ 440

Hi all, I have a SAM file in which the references are named in the format gi|359802265|ref|NC_016434.1|. I want to get the name of each reference and also its type (DNA or RNA). I looked at Biopython, and it seems I have to search with either the GI number or the accession number, which means I have to split the identifier first. Does anyone know of a command that accepts the identifier in the SAM format directly? And how do I extract the type information? Basically, what I want is something like this:

result = get_info("gi|359802265|ref|NC_016434.1|")
print(result)
[["Spodoptera litura granulovirus, complete genome"], ["DNA"]]

The short answer is to use the NCBI E-utilities: first ELink, using the nucleotide ID, to get the matching taxonomy ID from the Entrez taxonomy database, then EFetch to get the species information for that taxonomy ID. Biopython has modules for this.

I'll add a longer answer when I have time, unless someone else gets there first.
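
In the meantime, the ELink-then-EFetch route might look roughly like the sketch below. The function name `taxonomy_for_nucleotide` and its `email` parameter are made up for illustration; `Entrez.elink`, `Entrez.efetch`, and `Entrez.read` are Biopython's wrappers around the E-utilities. It assumes Biopython is installed, that there is network access to NCBI, and that the record has at least one taxonomy link (there is no error handling for the case where it does not).

```python
def taxonomy_for_nucleotide(nucleotide_id, email):
    """Return (taxonomy_id, scientific_name) for a nucleotide GI or accession.

    Hypothetical helper, not part of Biopython. Needs network access to NCBI.
    """
    # Imported here so the function can be defined even where Biopython
    # is not installed; the import only runs when the function is called.
    from Bio import Entrez

    # NCBI asks every E-utilities client to identify itself by email.
    Entrez.email = email

    # Step 1: ELink from the nucleotide database to the taxonomy database.
    handle = Entrez.elink(dbfrom="nucleotide", db="taxonomy", id=nucleotide_id)
    links = Entrez.read(handle)
    handle.close()
    # Assumes the first link set contains at least one taxonomy link.
    tax_id = links[0]["LinkSetDb"][0]["Link"][0]["Id"]

    # Step 2: EFetch the taxonomy record and read the species name from it.
    handle = Entrez.efetch(db="taxonomy", id=tax_id, retmode="xml")
    taxon = Entrez.read(handle)[0]
    handle.close()
    return tax_id, taxon["ScientificName"]
```

A call would look like `taxonomy_for_nucleotide("359802265", "you@example.org")`, with your own address in place of the placeholder. Note that this returns the organism, not the record title or the molecule type that the question asks for.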


You could just use EFetch to get the RefSeq entry and extract the sequence type and description from it using SeqIO. Alternatively, you could derive the sequence type from the RefSeq accession prefix (see "RefSeq accession numbers and molecule types").
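
As a rough sketch of the second option, the snippet below splits a SAM reference name into its GI number and accession, then looks up the accession prefix in a small table. The names `parse_reference_name`, `refseq_category`, and `REFSEQ_PREFIX_TYPES` are invented for this example, and the table covers only the common RefSeq prefixes. One caveat: the prefix gives the RefSeq category (genomic, mRNA, RNA, protein), not the physical nucleic acid, so an NC_ record for an RNA virus would still come back as "genomic". For a strict DNA/RNA answer you would still need to fetch the record itself.

```python
# Common RefSeq accession prefixes and their molecule categories
# (a partial table, written out here for illustration).
REFSEQ_PREFIX_TYPES = {
    "AC_": "genomic", "NC_": "genomic", "NG_": "genomic",
    "NT_": "genomic", "NW_": "genomic", "NZ_": "genomic",
    "NM_": "mRNA", "XM_": "mRNA",
    "NR_": "RNA", "XR_": "RNA",
    "AP_": "protein", "NP_": "protein", "XP_": "protein",
    "YP_": "protein", "WP_": "protein",
}


def parse_reference_name(name):
    """Split 'gi|<number>|ref|<accession>|' into (gi, accession)."""
    # Drop the leading/trailing pipes, then pair each tag with its value.
    fields = name.strip("|").split("|")
    pairs = dict(zip(fields[0::2], fields[1::2]))
    # Either value is None if the corresponding tag is absent.
    return pairs.get("gi"), pairs.get("ref")


def refseq_category(accession):
    """Return the RefSeq category for an accession, or 'unknown'."""
    return REFSEQ_PREFIX_TYPES.get(accession[:3], "unknown")


gi, accession = parse_reference_name("gi|359802265|ref|NC_016434.1|")
print(gi, accession, refseq_category(accession))
```

Accessions that are not RefSeq IDs (for example plain GenBank accessions) fall through to "unknown", so this shortcut only works when every reference is a RefSeq ID.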


True, assuming all the references are RefSeq IDs.


What has the SAM/BAM format got to do with this?
