Bed File With Introns Only
13.2 years ago
Pfs ▴ 580

How can I make a BED (or other format) file with introns only, starting with the GTF (or similar) file?

Thanks in advance.

bed intron ucsc browser • 24k views

also see responses to this question


Thank you for the answer

13.1 years ago

The following steps produce a BED file of all introns from the UCSC Table Browser. Most of the options below are set by default, so the number of steps required is not as bad as it seems.

  1. Go to the UCSC table browser.
  2. Select desired species and assembly
  3. Select group: Genes and Gene Prediction Tracks
  4. Select track: UCSC Genes (or Refseq, Ensembl, etc.)
  5. Select table: knownGene
  6. Select region: genome (or you can test on a single chromosome or smaller region)
  7. Select output format: BED - browser extensible data
  8. Enter output file: UCSC_Introns.tsv
  9. Select file type returned: gzip compressed
  10. Hit the 'get output' button
  11. A second page of options relating to the BED file will appear.
  12. Under 'create one BED record per:', select 'Introns plus'
  13. Add the desired flank for the introns being returned, or leave it as 0 to get just the introns
  14. Hit the 'get BED' button

You will get output that looks like this for every UCSC gene:


chr3    124449474    124453939    uc003ehl.3_intron_0_0_chr3_124449475_f    0    +
chr3    124454093    124456414    uc003ehl.3_intron_1_0_chr3_124454094_f    0    +
chr3    124457086    124458870    uc003ehl.3_intron_2_0_chr3_124457087_f    0    +
chr3    124459046    124460998    uc003ehl.3_intron_3_0_chr3_124459047_f    0    +
chr3    124461113    124462761    uc003ehl.3_intron_4_0_chr3_124461114_f    0    +

As a sanity check, you can go back to the UCSC genome browser, select 'add custom tracks', paste in some of your BED data (such as the block above, which corresponds to the human gene UMPS on hg19), hit 'submit', and then go to the genome browser. The result should look something like this:

[Image: genome browser view of the custom intron track]


This doesn't answer how to convert a given GTF file.


This is very useful. For some reason, this worked for UCSC and Refseq genes but not for Ensembl. Any suggestions? Thanks!


Hi, I followed your instructions because I need the intervals of the exons. The only step I changed was under 'create one BED record per:', where I selected 'Exons plus' instead of 'Introns plus', but my file also contains intron intervals. Do you know why this happens?

12.2 years ago
biorepine ★ 1.5k
  1. Convert GTF to BED using this script: https://gist.github.com/1155568

  2. Convert BED to either exons or introns using this script: https://gireeshkumarbogu.wordpress.com/data-scripts/
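
For a GTF input, the two steps above can also be done in a single pass. The following is a minimal Python sketch, not a reproduction of the linked scripts. The function name `gtf_to_intron_bed` and the `<transcript>_intron_<n>` naming are made up for illustration. It assumes the GTF has `exon` features carrying a `transcript_id` attribute, and it numbers introns in genomic order rather than transcript order.

```python
import re
from collections import defaultdict


def gtf_to_intron_bed(gtf_lines):
    """Yield BED6 intron tuples from GTF exon records, one transcript at a time."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start0, end)]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m is None:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open.
        exons[(f[0], f[6], m.group(1))].append((int(f[3]) - 1, int(f[4])))

    for (chrom, strand, tid), ex in exons.items():
        ex.sort()
        # An intron is the gap between the end of one exon and the start of the next.
        for i, ((_, prev_end), (next_start, _)) in enumerate(zip(ex, ex[1:])):
            if next_start > prev_end:
                yield (chrom, prev_end, next_start, f"{tid}_intron_{i}", 0, strand)


# Toy input (hypothetical records); the CDS line is ignored.
gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\tCDS\t120\t200\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t501\t600\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
introns = list(gtf_to_intron_bed(gtf))
for row in introns:
    print("\t".join(map(str, row)))
```

To use it on a real file, pass an open file handle instead of the list, e.g. `gtf_to_intron_bed(open("genes.gtf"))`, where `genes.gtf` is a placeholder path.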

9.7 years ago
Xianjun ▴ 310

Here is some simple example code to convert BED12 to introns, 5' UTRs, 3' UTRs, CDS, etc.

http://onetipperday.blogspot.com/2012/11/get-intron-utr-cds-from-bed12-format.html
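
As a rough illustration of the BED12-to-intron idea (a sketch of the general technique, not the code from the linked post): in BED12, column 11 holds the block (exon) sizes and column 12 the block starts relative to `chromStart`, so each intron is the gap between consecutive blocks. The function name `bed12_introns` and the output naming scheme are hypothetical.

```python
def bed12_introns(line):
    """Return BED6 intron tuples for one BED12 line."""
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start, name, strand = f[0], int(f[1]), f[3], f[5]
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]

    introns = []
    for i in range(len(starts) - 1):
        intron_start = chrom_start + starts[i] + sizes[i]  # end of block i
        intron_end = chrom_start + starts[i + 1]           # start of block i+1
        if intron_end > intron_start:
            introns.append(
                (chrom, intron_start, intron_end, f"{name}_intron_{i}", 0, strand)
            )
    return introns


# Toy record (hypothetical): three exons at 1000-1100, 1400-1600 and 1700-2000.
example = "chr1\t1000\t2000\ttx1\t0\t-\t1000\t2000\t0\t3\t100,200,300,\t0,400,700,"
result = bed12_introns(example)
for row in result:
    print("\t".join(map(str, row)))
```

Introns are numbered in genomic order here; for minus-strand transcripts the index would need to be reversed if transcript order is wanted.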

If you want meta-introns (i.e. overlapping introns from one gene merged into one intron), you can use the code snippet below:

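sort -k4,4 -k2,2n exons.meta.bed | \
  awk 'BEGIN { OFS="\t" }
  {
    if ($4 != id)
    {
      # First exon of a new gene: remember its end as the next intron start.
      if (e != "") print chr, s, e, id, 1, str;
      chr = $1; s = $3; id = $4; str = $6; e = "";
    }
    else
    {
      # Later exon of the same gene: the intron runs from the previous
      # exon end (s) to this exon start ($2).
      e = $2;
      print chr, s, e, id, 1, str; s = $3; e = "";
    }
  }
  END {
    if (e != "") print chr, s, e, id, 1, str;
  }' > introns.meta.bed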

where exons.meta.bed is in BED6 format with the gene ID (e.g. ENSGxxxx) as the name.

13.2 years ago
Chuangye ▴ 80

If you know the organism, use the Table Browser utility of the UCSC Genome Browser.


I looked at it, but what I can download is a BED file with the exon information. Are you suggesting that I perform some kind of set-complement operation, where I remove the exon segments from the gene segment? I assume it would work, but I was hoping for a ready-made solution. Thanks!
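
For reference, the set-complement idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is BED6 exon rows with the gene ID in the name column, the gene span is taken to be first exon start to last exon end, and the function name `gene_minus_exons` is hypothetical. Overlapping exons from different transcripts are merged on the fly, so only positions covered by no exon of the gene are reported.

```python
from collections import defaultdict


def gene_minus_exons(exon_rows):
    """Return BED6 rows for the parts of each gene span not covered by any exon."""
    by_gene = defaultdict(list)  # (chrom, strand, gene_id) -> [(start, end)]
    for chrom, start, end, gene, _score, strand in exon_rows:
        by_gene[(chrom, strand, gene)].append((start, end))

    out = []
    for (chrom, strand, gene), intervals in by_gene.items():
        intervals.sort()
        covered_to = intervals[0][1]  # right edge of the exons seen so far
        for start, end in intervals[1:]:
            if start > covered_to:
                # Gap between merged exons: this is intronic sequence.
                out.append((chrom, covered_to, start, gene, 0, strand))
            covered_to = max(covered_to, end)
    return out


# Toy exons (hypothetical) with overlaps, as from several transcripts of one gene.
exons = [
    ("chr1", 100, 200, "g1", 0, "+"),
    ("chr1", 400, 500, "g1", 0, "+"),
    ("chr1", 150, 250, "g1", 0, "+"),
    ("chr1", 450, 480, "g1", 0, "+"),
    ("chr1", 600, 700, "g1", 0, "+"),
]
gaps = gene_minus_exons(exons)
for row in gaps:
    print("\t".join(map(str, row)))
```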


In the UCSC Genome browser's table browser, if you select any gene type track, you should use the "Introns plus X bases" option on the form which follows clicking "Get output".
