Question

Getting sequence from any fasta based on coordinates

3.9 years ago

agata ▴ 10

I have received the coordinates of several genes (not annotated) and was told that they come from TAIR10. I wanted to extract these sequences based only on this information, but ran into several doubts. I know it seems trivial, but I am curious whether there are any newer approaches to this.

(1) I wanted to use blastdbcmd, but I lack the identifier for "-entry sequence_identifier"...

(2) Then I looked for help here and saw the post about getSeq in R (link), but it seems that TAIR10 is not available in the package.

(3) The third idea is to go to NCBI, display the TAIR10 genome, paste the coordinates into the Genome Data Viewer, adjust the range once more, and finally download the FASTA...

(4) I also saw the solution using samtools (faidx) (link). But I wonder: if I download the TAIR10 genome and index it with samtools, would the indexed coordinates match the coordinates in the annotations?

Is there a web tool that could do this? Or any other tool?

I wanted to use blastdbcmd, but I lack the identifier for "-entry sequence_identifier"...

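You will need to create the database with makeblastdb using the -parse_seqids option. The FASTA headers need to be in a format that makeblastdb can parse into sequence identifiers; those identifiers are what you then pass to -entry.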
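As a rough sketch of that route for several genes at once: the snippet below turns a coordinate table into the one-entry-per-line layout that blastdbcmd reads through -entry_batch (sequence ID followed by a start-end range). The file names genes.tsv, batch.txt, TAIR10_genome.fa and tair10_db, as well as the sequence names Chr1 and Chr3, are assumptions for illustration; the IDs must match the headers of your own FASTA. The BLAST+ calls only run if the tools and the genome file are actually present.

```shell
#!/bin/sh
# Hypothetical coordinate table: sequence name, start, end (1-based, inclusive).
printf 'Chr1\t1000\t2000\nChr3\t500\t900\n' > genes.tsv

# Convert to the -entry_batch layout: one "seqid start-end" entry per line.
awk -F '\t' '{ print $1, $2 "-" $3 }' genes.tsv > batch.txt
cat batch.txt

# Run BLAST+ only if it is installed and the (assumed) genome file exists.
if command -v makeblastdb >/dev/null 2>&1 && [ -f TAIR10_genome.fa ]; then
    # -parse_seqids makes the FASTA identifiers usable with -entry / -entry_batch.
    makeblastdb -in TAIR10_genome.fa -dbtype nucl -parse_seqids -out tair10_db
    blastdbcmd -db tair10_db -entry_batch batch.txt -out genes.fa
fi
```

For a single gene, the same lookup would be blastdbcmd -db tair10_db -entry Chr1 -range 1000-2000.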

NCBI, display the TAIR10 genome, paste the coordinates into the Genome Data Viewer, adjust the range once more, and finally download the FASTA...

You may be able to use Entrez Direct (EDirect) on the command line. Could you provide some example intervals?

But I wonder: if I download the TAIR10 genome and index it with samtools, would the indexed coordinates match the coordinates in the annotations?

As long as your sequence and annotations came from the same source (the same assembly, with the same sequence names), they would. And samtools faidx would indeed be a good solution for this.
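To make the coordinate convention concrete, here is a minimal sketch on a toy FASTA. The file name toy_genome.fa and the sequence name Chr1 are made up for the example, and the awk helper extract_region is a hypothetical, dependency-free stand-in, not part of samtools. With samtools installed, the two commented commands are the usual way to do the same lookup; regions are 1-based and inclusive.

```shell
#!/bin/sh
# Toy FASTA standing in for the real genome (names are assumptions).
cat > toy_genome.fa <<'EOF'
>Chr1 toy chromosome
ACGTACGTAC
GGGGCCCCAA
EOF

# With samtools available, the usual route would be:
#   samtools faidx toy_genome.fa             # writes toy_genome.fa.fai
#   samtools faidx toy_genome.fa Chr1:8-13   # 1-based, inclusive region

# Dependency-free sketch of the same lookup (loads the whole sequence in memory).
extract_region() {  # usage: extract_region FASTA NAME START END
    awk -v name="$2" -v s="$3" -v e="$4" '
        /^>/ { split(substr($0, 2), h, " "); keep = (h[1] == name); next }
        keep { seq = seq $0 }
        END  { print ">" name ":" s "-" e; print substr(seq, s, e - s + 1) }
    ' "$1"
}

extract_region toy_genome.fa Chr1 8 13
# prints:
# >Chr1:8-13
# TACGGG
```

The sequence name in the region must match the first word of the FASTA header exactly, which is why the genome and the coordinates need to come from the same release.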

Comment written 3.9 years ago by GenoMax 154k

score 0 · Answer 1 · 2021-12-09

3.9 years ago

Juke34 9.3k

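Install GAAS with conda (the package is on the bioconda channel; -n gaas targets an existing environment named gaas): conda install -c bioconda -n gaas gaas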

gaas_fasta_domain_extractor.pl -i <input fasta file> -n <sequence name> -s <start_coordinate> -e <end_coordinate> [-o <output file>]
