I'm looking for databases that have intron sequences. I searched the NCBI Nucleotide database but had little luck.
Years ago I made "GENERECORDS", a FileMaker tool that semi-automatically parses GenBank records and extracts CDSs, introns and exons by reading the CDS features in each record. The tool also extracts every other feature present in the record into separate databases. Please find here the paper and the software and tutorials for more details. I hope it will be useful, at least as a model for your algorithm.
If you parse the CDS positions that are readily available in the GenBank-format record of a sequence (in the NCBI Nucleotide database), you can easily calculate the positions of the introns as well as the exons.
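As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the CDS location has already been copied out of the record as a string such as join(100..200,300..400), that every segment is a start..end range on the same record (no single-base segments or remote accessions), and that partial markers (< and >) can be ignored. The function names are made up for this example. Note that CDS segments are only the coding parts of exons, so introns lying entirely within UTRs will not show up this way.

```python
import re

def parse_cds_location(location):
    """Return (strand, segments) from a GenBank CDS location string.

    segments is a sorted list of 1-based inclusive (start, end) tuples.
    Assumption: every segment is written as start..end on the same record.
    """
    strand = "-" if location.strip().startswith("complement(") else "+"
    # The optional < and > partial-boundary markers are skipped.
    spans = [(int(a), int(b))
             for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    return strand, sorted(spans)

def introns_between(segments):
    """Return the gaps between consecutive segments as 1-based inclusive ranges."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
            if next_start - prev_end > 1]

strand, coding_segments = parse_cds_location("join(100..200,300..400,500..650)")
print(strand, introns_between(coding_segments))
```

The coordinates are in genomic order on the forward strand; for a complement(...) location the intron sequences themselves would still need to be reverse-complemented.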
Well, you didn't specify which species you need introns for. Anyway, there is a simple trick in Galaxy: you can use "Extract Features" to convert genes to exon/intron/codon regions. This works only if you have a known gene list in BED format. Alternatively, you can look into the Kent tools.
Good luck!
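If you would rather script the BED route yourself, the idea can be sketched in a few lines of Python. This is not what Galaxy or the Kent tools run internally, just a minimal illustration that assumes a standard 12-column BED line (with blockCount, blockSizes and blockStarts); the function name bed12_introns is made up for this example.

```python
def bed12_introns(line):
    """Return introns of one BED12 record as (chrom, start, end, name, strand).

    Coordinates stay in BED convention: 0-based start, exclusive end.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, name, strand = fields[0], int(fields[1]), fields[3], fields[5]
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    # blockStarts are relative to chromStart.
    blocks = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return [(chrom, prev_end, next_start, name, strand)
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
            if next_start > prev_end]

record = "chr1\t1000\t2000\tgeneA\t0\t+\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
for intron in bed12_introns(record):
    print(*intron, sep="\t")
```

The resulting intervals can then be handed to whatever tool you use to pull sequence out of the genome FASTA.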
"t=CDS" in the query field.
PS: you can filter your list by any criterion: size, species, keyword.
Example: t=CDS AND sp=homo sapiens
For the Drosophila introns, you can go to FlyBase, click on the species of interest, and select the "all-introns" file under the "Fasta" section.
I don't think there is such a direct way to get intron sequences for worms. A pretty simple method would be to download the annotation file in GFF format, and use the Perl example on the Data Mining page to get the intron sequences (you may have to play around with that code, but it is only a few lines of Perl).
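For anyone who prefers Python to Perl, the same idea can be sketched as follows. This is not the code from the Data Mining page, only a minimal illustration that assumes a GFF3-style file with "exon" features carrying a Parent attribute, and a genome already loaded into a dict mapping sequence names to strings. The function names and the intron naming scheme are made up for this example.

```python
from collections import defaultdict

def exons_by_transcript(gff_lines):
    """Group exon coordinates by (seqid, strand, parent transcript)."""
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.strip().split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[(cols[0], cols[6], parent)].append((int(cols[3]), int(cols[4])))
    return exons

def intron_sequences(exons, genome):
    """Return {name: sequence} for the gaps between consecutive exons.

    Introns are numbered in genomic order, not transcript order.
    """
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    result = {}
    for (seqid, strand, parent), spans in exons.items():
        ordered = sorted(spans)
        pairs = zip(ordered, ordered[1:])
        for i, ((_, prev_end), (next_start, _)) in enumerate(pairs, 1):
            if next_start - prev_end <= 1:
                continue
            # GFF is 1-based inclusive; this slice covers prev_end+1 .. next_start-1.
            seq = genome[seqid][prev_end:next_start - 1]
            if strand == "-":
                seq = seq.translate(complement)[::-1]
            result[f"{parent}_intron{i}"] = seq
    return result
```

In practice you would read the GFF with open(...) and load the genome FASTA with the parser of your choice; both are left out here to keep the sketch short.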
For Arabidopsis, you can go to the Bulk Data Download page, click on "ftp server", click "TAIR10_blastsets" and then just download the file of introns.
http://flybase.org/static_pages/downloads/bulkdata7.html
Thanks, this is it!
Typically, introns are calculated on the fly from the annotation rather than precomputed and stored. Multiple related questions:
Which organism?
Any model organism would be fine. I'd prefer planarians or Drosophila, but when I go to the RefSeq format on NCBI to see where the introns are, I can't extract them.