I have received coordinates for several genes (not annotated) and was told that they come from TAIR10. I want to extract the sequences based only on this information, but I have run into several doubts. I know it seems trivial, but I am curious whether there are any new approaches to it.
(1) I wanted to use blastdbcmd, but I am missing the "-entry sequence_identifier"...
(2) Then I looked for help here and saw the post about getSeq from R (link), but it seems that TAIR10 is not available in the package.
(3) The third idea is to go to NCBI, display the TAIR10 genome, copy-paste the coordinates into the Genome Data Viewer, adjust the range one more time and finally download the FASTA...
(4) I also saw the solution using samtools (faidx) (link). But I wonder: if I download the TAIR10 genome and index it with samtools, would the indexed coordinates match the coordinates in the annotations?
Is there a web tool that could solve this? Or any other tool?
You will need to create the database with the -parse_seqids option of makeblastdb. The headers will need to be in a certain format. You may be able to use Entrez Direct on the command line. It would help if you could provide some example intervals.
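To make the blastdbcmd route concrete, here is a minimal sketch that only assembles the two command lines (it does not run them). The file name TAIR10_genome.fa, the database name tair10, the identifier Chr1 and the interval 1001-2000 are placeholders, not values from the question; the identifier has to match whatever appears in your FASTA headers.

```python
import shlex


def makeblastdb_cmd(fasta, db_name):
    """Build a makeblastdb call that keeps sequence IDs searchable (-parse_seqids)."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def blastdbcmd_cmd(db_name, seq_id, start, end, strand="plus"):
    """Build a blastdbcmd call that extracts start-end (1-based) from one entry."""
    return ["blastdbcmd", "-db", db_name, "-entry", seq_id,
            "-range", f"{start}-{end}", "-strand", strand]


if __name__ == "__main__":
    # Placeholder names and coordinates, for illustration only.
    print(shlex.join(makeblastdb_cmd("TAIR10_genome.fa", "tair10")))
    print(shlex.join(blastdbcmd_cmd("tair10", "Chr1", 1001, 2000)))
```

The printed lines can be pasted into a shell, or the lists passed to subprocess.run if BLAST+ is installed. For a gene on the reverse strand, strand="minus" should return the reverse complement.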
As long as your sequence and annotations came from the same source, the coordinates would match. And samtools faidx would indeed be a good solution for this.
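For anyone who wants to check what such an extraction returns, here is a small pure-Python sketch of the same idea: it is not samtools, just an illustration of pulling a 1-based, inclusive interval (the convention used by samtools region strings and by GFF) out of a FASTA file. The function names are made up for this example, and it assumes the chromosome IDs in your coordinates are spelled exactly as in the FASTA headers (for example Chr1 versus 1). If your intervals are in BED format, which is 0-based and half-open, add 1 to the start first.

```python
def read_fasta(path):
    """Return {sequence_id: sequence}; the ID is the first word of each header line."""
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract(seqs, seq_id, start, end):
    """Return the 1-based, inclusive interval start..end of the given sequence."""
    seq = seqs[seq_id]
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"{seq_id}:{start}-{end} is outside 1..{len(seq)}")
    return seq[start - 1:end]  # shift to Python's 0-based, half-open slicing
```

Usage would look like extract(read_fasta("TAIR10_genome.fa"), "Chr1", 1001, 2000), where the file name and interval are again placeholders. This loads the whole genome into memory, which is fine for Arabidopsis; for larger genomes the indexed access that samtools faidx provides is the better choice.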