Question

How to extract variable nucleotide regions from a list of contigs

7.1 years ago

kayrouz.1 • 0

I have a list of about 6000 NCBI contig accession numbers, and I'd like to extract a specific 30 kb region from each contig. In a separate file I have a list of "begin" and "end" indices that give the region of interest for each contig. Is there a way to retrieve a FASTA file of these trimmed contigs using an Entrez query? I'm not a very skilled programmer, so I would have just put the sequences in an Excel spreadsheet and trimmed them there, but the sequence strings are too long to fit in an Excel cell. Is there a simple way to do this via E-Utilities?

blast contig extract refseq • 1.8k views


Do you have a reference genome? If you do, you could use bedtools getfasta.
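For anyone following the bedtools route: getfasta takes its regions as a BED file, which uses 0-based, half-open coordinates. Here is a minimal Python sketch that converts a begin/end list into BED lines. It assumes the indices are 1-based and inclusive and that the list is tab-separated as accession, begin, end; the function names and the example accession are made up for illustration.

```python
import csv
import io


def read_regions(handle):
    """Read tab-separated rows of (accession, begin, end), skipping blanks and '#' comments."""
    reader = csv.reader(handle, delimiter="\t")
    return [tuple(row[:3]) for row in reader if row and not row[0].startswith("#")]


def regions_to_bed(rows):
    """Convert 1-based inclusive (begin, end) pairs to 0-based half-open BED lines.

    ASSUMPTION: the input indices are 1-based and inclusive. If they are
    already 0-based, drop the '- 1' below.
    """
    bed_lines = []
    for accession, begin, end in rows:
        begin, end = int(begin), int(end)
        if begin < 1 or end < begin:
            raise ValueError(f"bad coordinates for {accession}: {begin}-{end}")
        # BED start is 0-based, BED end is exclusive, so only the start shifts.
        bed_lines.append(f"{accession}\t{begin - 1}\t{end}")
    return bed_lines


if __name__ == "__main__":
    # Hypothetical accession and coordinates, for illustration only.
    example = io.StringIO("EXAMPLE_CONTIG_1.1\t1001\t31000\n")
    for line in regions_to_bed(read_regions(example)):
        print(line)
```

With the BED lines saved to a file, the extraction would be along the lines of `bedtools getfasta -fi contigs.fa -bed regions.bed -fo trimmed.fa`, where the file names are placeholders. The names in the first BED column have to match the sequence names in the FASTA file.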

7.1 years ago by Sinji ★ 3.2k

score 0 · Answer 1 · 2017-12-16

7.1 years ago

lakhujanivijay 5.9k

A to-the-point explanation of bedtools getfasta is here
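On the E-Utilities part of the question: efetch accepts `seq_start` and `seq_stop` parameters (1-based coordinates) for nucleotide records, so each region can also be requested directly without downloading whole contigs. The following is a rough sketch only; the function names, output file name, and delay value are my own choices, and the example accession is made up.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, begin, end):
    """Build an efetch URL for one FASTA sub-sequence (1-based begin/end)."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": begin,
        "seq_stop": end,
    }
    return EFETCH + "?" + urllib.parse.urlencode(params)


def fetch_regions(regions, out_path, delay=0.4):
    """Download each (accession, begin, end) region and append it to one FASTA file.

    The delay keeps the request rate under NCBI's limit of 3 requests per
    second for clients without an API key.
    """
    with open(out_path, "w") as out:
        for accession, begin, end in regions:
            with urllib.request.urlopen(efetch_url(accession, begin, end)) as resp:
                out.write(resp.read().decode())
            time.sleep(delay)


if __name__ == "__main__":
    # Hypothetical accession; prints the URL only, no request is sent.
    print(efetch_url("EXAMPLE_CONTIG_1.1", 1001, 31000))
    # To actually download: fetch_regions(list_of_regions, "trimmed_contigs.fasta")
```

At roughly 6000 contigs and one request each, this would take on the order of 40 minutes with the delay above, so fetching the full contigs once and trimming locally with bedtools may be the more practical option.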
