How To Find Intron Sequences From NCBI Records?
12.2 years ago

How can I find intron sequences in the databases available through NCBI?

At one point I found a database (FlyBase) for Drosophila intron sequences, but it doesn't seem to be available anymore.

I want them for human and mouse, if that is possible.

intron sequence ncbi • 20k views

What kind of record? A GenBank record? A gene record? Or do you just want a place to download the sequences of those introns (and for which organism)?

12.2 years ago

Well, you posted a very similar question in the past, "Database to find intron sequences?", and you got a good number of answers to it. I would recommend reading through those and following the links given there.

The short answer to your question is that you cannot directly download intronic sequences from NCBI. What you can do is post-process the files that you get from NCBI and extract that information. If you want to know how to do that, you should open a new, specific question along the lines of: "How to extract introns from a GenBank file?"
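To illustrate the post-processing idea, here is a minimal Python sketch, not a complete GenBank parser. It assumes you have already pulled the location string of an mRNA or CDS feature (for example `join(100..200,300..400)`) out of the record, and it treats the gaps between the listed exon spans as introns. The function name `introns_from_join` and the example coordinates are made up for illustration; single-base exons and locations that refer to other accessions are not handled.

```python
import re

def introns_from_join(location):
    """Return intron (start, end) pairs, 1-based inclusive, from a
    GenBank-style feature location such as 'join(100..200,300..400)'."""
    # Collect every "start..end" exon span; '<' and '>' mark partial ends.
    exons = [(int(s), int(e))
             for s, e in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    exons.sort()
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        # An intron is the gap between two consecutive exons, if any.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns

# Hypothetical location string, for illustration only.
print(introns_from_join("join(100..200,300..400,500..600)"))
```

The coordinates are relative to the record's sequence, so the intron bases can then be sliced from that sequence (reverse-complementing them if the feature is on the complement strand).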


Yes, the other question helped, but many of the software links were not working, or the software was not compatible with newer Macs, which are not PowerPC-based.

12.2 years ago
Rm 8.3k
  1. Generate an intron BED file from UCSC, as described here (you can use RefSeq introns if required).
  2. Download the reference genome.
  3. Use nucBed from bedtools to get the FASTA sequences:

    nucBed -s -fi Homo_sapiens.GRCh37.62.fa -bed hg19.introns.bed -seq | awk '(NR>1){print ">"$4 "| "$1":"$2"-"$3"\n" $16}' > hg19.introns.fasta

Output:

>uc001aaa.3_intron_0_0_1_12228_f| 1:12227-12612
GTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATCGGAGCCCAAAGCCGGGCTGTGACTGCTCAGACCAGCCGGCTGGAGGGAGGGGCTCAGCAGGTCTGGCTTTGGCCCTGGGAGAGCAGGTGGAAGATCAGGCAGGCCATCGCTGCCA
CAGAACCCAGTGGATTGGCCTAGGTGGGATCTCTGAGCTCAACAAGCCCTCTCTGGGTGGTAGGTGCAGAGACGGGAGGGGCAGAGCCGCAGGCACAGCCAAGAGGGCTGAAGAAATGGTAGAACGGAGCAGCTGGTGATGTGTGGGCCCACCGGCCCCAGGCTCCTGTCTCCCCCCAG
>uc001aaa.3_intron_1_0_1_12722_f| 1:12721-13220
GTGAGAGGAGAGTAGACAGTGAGTGGGAGTGGCGTCGCCCCTAGGGCTCTACGGGGCCGGCGTCTCCTGTCTCCTGGAGAGGCTTCGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGCCTGGCAGAGTCTTTCCCAGGGAAA
GCTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGAGGAAAGTGAGGTTGCCTGCCCTGT
CTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTGGGGAGGCAGCTGTAACTCAAAGCCTTAGCCTCTGTTCCCACGAAG
>uc010nxr.1_intron_0_0_1_12228_f| 1:12227-12645
GTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATCGGAGCCCAAAGCCGGGCTGTGACTGCTCAGACCAGCCGGCTGGAGGGAGGGGCTCAGCAGGTCTGGCTTTGGCCCTGGGAGAGCAGGTGGAAGATCAGGCAGGCCATCGCTGCCA
CAGAACCCAGTGGATTGGCCTAGGTGGGATCTCTGAGCTCAACAAGCCCTCTCTGGGTGGTAGGTGCAGAGACGGGAGGGGCAGAGCCGCAGGCACAGCCAAGAGGGCTGAAGAAATGGTAGAACGGAGCAGCTGGTGATGTGTGGGCCCACCGGCCCCAGGCTCCTGTCTCCCCCCAGGTGTGTGGTGATGCCAGGCATGCCCTT
CCCCAG
>uc010nxr.1_intron_1_0_1_12698_f| 1:12697-13220
GTGAGTGTCCCCAGTGTTGCAGAGGTGAGAGGAGAGTAGACAGTGAGTGGGAGTGGCGTCGCCCCTAGGGCTCTACGGGGCCGGCGTCTCCTGTCTCCTGGAGAGGCTTCGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGC
CTGGCAGAGTCTTTCCCAGGGAAAGCTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGA
GGAAAGTGAGGTTGCCTGCCCTGTCTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTGGGGAGGCAGCTGTAACTCAAAGCCTTAGCCTCTGTTCCCACGAAG
(...)
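If you only have a transcript-level BED12 file rather than a ready-made intron BED, step 1 can be approximated locally. The sketch below assumes the standard 12-column BED layout (blockSizes in column 11, blockStarts in column 12, both relative to chromStart) and emits one BED6 record per gap between consecutive blocks. The function name `bed12_to_introns`, the `_intron_N` naming scheme and the example transcript are hypothetical and do not reproduce the names UCSC generates.

```python
def bed12_to_introns(line):
    """Yield BED6 intron records (0-based, half-open) for one BED12 line."""
    f = line.rstrip("\n").split("\t")
    chrom, tx_start, name, score, strand = f[0], int(f[1]), f[3], f[4], f[5]
    # Block lists are comma-separated and may end with a trailing comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    # blockStarts are offsets from chromStart, so convert to absolute exons.
    exons = [(tx_start + s, tx_start + s + n) for s, n in zip(starts, sizes)]
    for i, ((_, prev_end), (next_start, _)) in enumerate(zip(exons, exons[1:])):
        if next_start > prev_end:
            # The intron spans from the end of one exon to the start of the next.
            yield (chrom, prev_end, next_start, f"{name}_intron_{i}", score, strand)

# Hypothetical three-exon transcript, for illustration only.
example = "chr1\t1000\t2000\ttx1\t0\t+\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
for record in bed12_to_introns(example):
    print("\t".join(str(x) for x in record))
```

The resulting six-column records have the same shape as the intron BED assumed by the nucBed command above, where the sequence ends up in column 16.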
12.2 years ago
deanna.church ★ 1.1k

GFF3 files are now available for the latest NCBI annotation. Here is the link to the human file: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/

and the mouse file: ftp://ftp.ncbi.nlm.nih.gov/genomes/M_musculus/GFF/
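GFF3 files list exons rather than introns, so the intron coordinates have to be derived. A rough Python sketch of one way to do that follows, assuming each `exon` feature carries a `Parent` attribute naming its transcript, as the GFF3 format describes; I have not checked this against the specific NCBI files linked above. The function name `gff3_introns` and the sample lines are invented for illustration.

```python
from collections import defaultdict

def gff3_introns(lines):
    """Return {parent_id: [(seqid, start, end, strand), ...]} with
    1-based inclusive intron coordinates derived from exon features."""
    exons = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    introns = {}
    for parent, spans in exons.items():
        spans.sort(key=lambda s: s[1])
        gaps = []
        for (seqid, _, prev_end, strand), (_, next_start, _, _) in zip(spans, spans[1:]):
            # The bases strictly between two consecutive exons form an intron.
            if next_start - prev_end > 1:
                gaps.append((seqid, prev_end + 1, next_start - 1, strand))
        introns[parent] = gaps
    return introns

# Hypothetical GFF3 lines, for illustration only.
sample = [
    "##gff-version 3",
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=rna0",
    "chr1\tsrc\texon\t300\t400\t.\t+\t.\tID=e2;Parent=rna0",
]
print(gff3_introns(sample))
```

Note that GFF3 coordinates are 1-based and inclusive, so subtract 1 from each start before writing BED intervals for bedtools.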

12.2 years ago

If you know the gene name or accession, the UCSC Table Browser can help. Next to the "identifiers" label, click "paste list", enter your gene names or accessions, and click the "get output" button. Then select the relevant options to get a satisfactory result.
