GI chromosome list NCBI
Hi,
I am using Entrez.efetch in Biopython to fetch sequences from genomic coordinates (chr, start, stop, strand):
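from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.org"  # NCBI asks E-utilities users for a contact e-mail
# chrom_gi, start, stop and strand are set elsewhere (strand: 1 = plus, 2 = minus)
handle = Entrez.efetch(db="nucleotide",
                       id=chrom_gi,  # GI number of the chromosome (placeholder)
                       rettype="fasta",
                       retmode="text",
                       strand=strand,
                       seq_start=start,
                       seq_stop=stop)
record = SeqIO.read(handle, "fasta")
handle.close()
print(record.seq)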
The package requires the GI of each chromosome, but I don't know how to find these. I tried searching the NCBI nucleotide database manually, but that returns many different GIs for each chromosome.
I am looking for the most recent Mus musculus genomic sequences.
Any suggestions on how to get these GIs, or another way around this problem?
Best
Per
Tags: genome, sequence
Asked 9.3 years ago by TEman; updated 2.1 years ago by Ram
One way to do this is to grab the RefSeq accessions for the chromosomes from the genome page: http://www.ncbi.nlm.nih.gov/genome/52
NC_000067.6
NC_000068.7
NC_000069.6
NC_000070.6
NC_000071.6
NC_000072.6
NC_000073.6
NC_000074.6
NC_000075.6
NC_000076.6
NC_000077.6
NC_000078.6
NC_000079.6
NC_000080.6
NC_000081.6
NC_000082.6
NC_000083.6
NC_000084.6
NC_000085.6
NC_000086.7
NC_000087.7
NC_005089.1
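E-utilities also accept an accession.version as the efetch id for db="nucleotide", so the accessions above can be passed to Entrez.efetch directly, without first looking up GIs. A minimal sketch, assuming the list above is GRCm38 chromosomes 1 to 19, X, Y and MT in that order; the helper efetch_kwargs and the constant names are hypothetical. It only builds the arguments, so it runs without network access:

```python
# Hypothetical helper: map a mouse chromosome name to its RefSeq accession.version
# (the accessions listed above, assumed to be chr1..chr19, X, Y, MT in order)
# and build keyword arguments for Bio.Entrez.efetch.
ACCESSIONS = [
    "NC_000067.6", "NC_000068.7", "NC_000069.6", "NC_000070.6", "NC_000071.6",
    "NC_000072.6", "NC_000073.6", "NC_000074.6", "NC_000075.6", "NC_000076.6",
    "NC_000077.6", "NC_000078.6", "NC_000079.6", "NC_000080.6", "NC_000081.6",
    "NC_000082.6", "NC_000083.6", "NC_000084.6", "NC_000085.6",
    "NC_000086.7", "NC_000087.7", "NC_005089.1",
]
CHROM_NAMES = [str(n) for n in range(1, 20)] + ["X", "Y", "MT"]
MOUSE_CHROM_TO_ACC = dict(zip(CHROM_NAMES, ACCESSIONS))

# efetch encodes strand as 1 (plus) or 2 (minus)
STRAND_CODE = {"+": 1, "-": 2}


def efetch_kwargs(chrom, start, stop, strand="+"):
    """Return efetch arguments for the 1-based, inclusive region chrom:start-stop."""
    name = chrom[3:] if chrom.startswith("chr") else chrom  # accept "chr1" or "1"
    if name == "M":
        name = "MT"  # UCSC-style "chrM" -> "MT"
    if start < 1 or stop < start:
        raise ValueError(f"invalid coordinates: {start}-{stop}")
    return {
        "db": "nucleotide",
        "id": MOUSE_CHROM_TO_ACC[name],
        "rettype": "fasta",
        "retmode": "text",
        "strand": STRAND_CODE[strand],
        "seq_start": start,
        "seq_stop": stop,
    }
```

With Biopython the result can be unpacked into the call from the question, e.g. handle = Entrez.efetch(**efetch_kwargs("chr2", 100000, 100500, "-")), followed by SeqIO.read(handle, "fasta") as before.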
Then use the blastdbcmd utility with the refseq_genomic BLAST database to get the GI numbers (file_with_refseq_ID contains the accessions above, one per line):
$ blastdbcmd -entry_batch file_with_refseq_ID -db /path_to/refseq_genomic -outfmt "%g"
Output:
372099109
372099108
372099107
372099106
372099105
372099104
372099103
372099102
372099101
372099100
372099099
372099098
372099097
372099096
372099095
372099094
372099093
372099092
372099091
372099090
372099089
34538597
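The GI list above is printed without its accessions, so a variant that prints both columns may be easier to use downstream. A minimal sketch, assuming blastdbcmd is on PATH and that the format string "%a %g" prints the accession and the GI separated by a space on each line; the function names are hypothetical:

```python
import subprocess


def parse_acc_gi(text):
    """Parse 'accession gi' lines into a dict of accession -> GI (int or None)."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"unexpected line: {line!r}")
        accession, gi = fields
        # Non-numeric GI fields are kept as None rather than failing
        mapping[accession] = int(gi) if gi.isdigit() else None
    return mapping


def run_blastdbcmd(batch_file, db):
    """Run blastdbcmd on a file of accessions and return accession -> GI."""
    result = subprocess.run(
        ["blastdbcmd", "-entry_batch", batch_file, "-db", db, "-outfmt", "%a %g"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if blastdbcmd fails
    )
    return parse_acc_gi(result.stdout)
```

Keying the result by accession avoids relying on the output order matching the order of the input file.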
Answered 9.3 years ago by GenoMax; updated 2.1 years ago by Ram