Question

Download targeted sequences with certain GI number, start position and end position

7.2 years ago

horsedog ▴ 60

Hi all, I need a lot of bacterial sequences from NCBI, and I have the GI number, start position and end position of each sequence I want. Is it possible to download only the targeted regions instead of the whole genome? I used Batch Entrez before, but it gives me the whole genome, which I don't need. Thank you.

sequence • 1.6k views

score 0 · Answer 1 · 2017-10-18

7.2 years ago

GenoMax 148k

NCBI eUtils would be the way to go. Can you post an example GI and the region you need? BTW: NCBI stopped using GIs externally a while ago.

I'm sorry, could you be a bit more specific? For example, how do I specify the start and end positions?

Reply • 7.2 years ago by horsedog ▴ 60

For example:

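$ efetch -db nuccore -format fasta -id CP005986 -chr_start 1600000 -chr_stop 1600020

brings back a 21 bp chunk from this genome (-chr_start and -chr_stop are 0-based, so the FASTA header reports the 1-based range 1600001-1600021):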

>CP005986.1:1600001-1600021 Acidithiobacillus caldus ATCC 51756, complete genome
ACGAGCGGCGCATTACTCCGA

BTW: CP005986 can be replaced by the gi number 640840007 to get the same result.

Reply • 7.2 years ago by GenoMax 148k

Oh! Thank you very much, that's really amazing. But what if I have a batch of sequences to extract? I saved all the CP accession numbers, start positions and end positions in three separate txt files and ran: efetch -db nuccore -format fasta -id name.txt -chr_start start.txt -chr_stop end.txt, but it doesn't work.

Reply • 7.2 years ago by horsedog ▴ 60
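
For the batch case in the last reply, one possible approach is to call efetch once per region instead of passing file names to its flags. The sketch below is not from the thread: `region_cmd` and `regions_to_cmds` are hypothetical helper names, and it assumes the three files have been merged into one whitespace-separated file with ID, start and stop on each line. It only prints the commands, using the same flags as the single-region example, so they can be inspected before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print the efetch command for a single region.
# Arguments: accession (or GI), start, stop. The flags are the ones used
# in the single-region example in this thread.
region_cmd() {
    printf 'efetch -db nuccore -format fasta -id %s -chr_start %s -chr_stop %s\n' "$1" "$2" "$3"
}

# Hypothetical helper: read a whitespace-separated file (ID, start, stop
# per line) and print one efetch command per region.
regions_to_cmds() {
    while read -r id start stop || [ -n "$id" ]; do
        # Skip blank lines
        [ -z "$id" ] && continue
        region_cmd "$id" "$start" "$stop"
    done < "$1"
}
```

Assuming the three files are line-aligned, `paste name.txt start.txt end.txt > regions.tsv` would build the input, and `regions_to_cmds regions.tsv | sh > regions.fasta` would then run the calls one after another. That last step needs EDirect installed and network access.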