How can I make a BED (or other format) file with introns only, starting with the GTF (or similar) file?
Thanks in advance.
The following is a set of detailed instructions for getting a BED file of all introns from the UCSC Table Browser. Note that most of the following options are set by default, so the number of steps required is not as bad as it seems.
You will get output that looks like this for every UCSC gene:
chr3 124449474 124453939 uc003ehl.3_intron_0_0_chr3_124449475_f 0 +
chr3 124454093 124456414 uc003ehl.3_intron_1_0_chr3_124454094_f 0 +
chr3 124457086 124458870 uc003ehl.3_intron_2_0_chr3_124457087_f 0 +
chr3 124459046 124460998 uc003ehl.3_intron_3_0_chr3_124459047_f 0 +
chr3 124461113 124462761 uc003ehl.3_intron_4_0_chr3_124461114_f 0 +
As a sanity check you can go back to the UCSC genome browser, select add custom tracks, paste in some of your BED data (such as the block above corresponding to the human gene UMPS on hg19), hit 'submit', and then go to genome browser. The result should look something like this:
Hi, I followed your instructions because I need the intervals of the exons. The only step I changed was under 'create one BED record per:', where I selected 'Exons plus' instead of 'Introns plus'. However, my file also contains intron intervals. Do you know how this could happen?
Convert the GTF to BED using this script: https://gist.github.com/1155568
Then convert the BED to either exons or introns using this script: https://gireeshkumarbogu.wordpress.com/data-scripts/
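If you would rather not depend on external scripts, the same two steps can be sketched directly in Python. This is only a minimal sketch under some assumptions: the GTF has `exon` features carrying a `transcript_id "..."` attribute, and the function name `introns_from_gtf` and the `<transcript>_intron_<n>` naming are made up for illustration (introns are numbered in genomic order, not transcript order).

```python
import re
from collections import defaultdict

def introns_from_gtf(lines):
    """Return BED6 intron tuples computed from the exon records of a GTF.

    Hypothetical helper: assumes 'exon' features with a transcript_id attribute.
    """
    # (chrom, strand, transcript_id) -> list of (start, end) in BED coordinates
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m is None:
            continue
        # GTF is 1-based inclusive; BED is 0-based half-open
        exons[(f[0], f[6], m.group(1))].append((int(f[3]) - 1, int(f[4])))

    introns = []
    for (chrom, strand, tx), ivs in exons.items():
        ivs.sort()
        # an intron is the gap between two consecutive exons of one transcript
        for i, ((_, prev_end), (next_start, _)) in enumerate(zip(ivs, ivs[1:])):
            if next_start > prev_end:
                introns.append((chrom, prev_end, next_start,
                                f"{tx}_intron_{i}", 0, strand))
    return introns
```

To write a BED file, pass an open GTF file handle to the function and print each tuple joined by tabs.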
Here is an easy example code to convert bed12 --> intron, 5' UTR, 3' UTR, CDS etc.
http://onetipperday.blogspot.com/2012/11/get-intron-utr-cds-from-bed12-format.html
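For reference, the intron part of that BED12 conversion boils down to walking the `blockSizes` and `blockStarts` columns. The following is a rough Python sketch of the idea, not the code from the linked post; the function name `introns_from_bed12` and the output naming are assumptions for illustration.

```python
def introns_from_bed12(line):
    """Return BED6 intron tuples for one BED12 line (hypothetical helper)."""
    f = line.split()
    chrom, tx_start, name, score, strand = f[0], int(f[1]), f[3], f[4], f[5]
    # columns 11 and 12 are comma-separated lists, often with a trailing comma
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]

    introns = []
    for i in range(len(starts) - 1):
        # block starts are relative to chromStart
        intron_start = tx_start + starts[i] + sizes[i]  # end of exon i
        intron_end = tx_start + starts[i + 1]           # start of exon i+1
        if intron_end > intron_start:
            introns.append((chrom, intron_start, intron_end,
                            f"{name}_intron_{i}", score, strand))
    return introns
```

A single-exon transcript has one block and therefore yields an empty list.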
If you want to get meta-introns (i.e. merge the overlapping introns from one gene into a single intron), you can use the code snippet below:
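cat exons.meta.bed | \
sort -k4,4 -k2,2n | \
awk '{
    OFS="\t";
    if($4!=id)
    {
        # first exon of a new gene: remember its end as the next intron start
        if(e!="") print chr,s,e,id,1,str;
        chr=$1;s=$3;id=$4;str=$6;e="";
    }
    else
    {
        # next exon of the same gene: the gap since the previous exon is an intron
        e=$2;
        print chr,s,e,id,1,str;s=$3;e="";
    }
}
END {
    if(e!="") print chr,s,e,id,1,str;
}' > introns.meta.bed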
where exons.meta.bed is in BED6 format with the gene_ID (e.g. ENSGxxxx) as the name.
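The awk code assumes the exons of each gene have already been merged into non-overlapping meta-exons. If your exons still overlap, a minimal Python sketch that does the merging and the gap-finding in one pass could look like this. The function name `meta_introns` is hypothetical, and unlike the awk version it groups by chromosome, gene ID and strand rather than by gene ID alone.

```python
from collections import defaultdict

def meta_introns(exon_rows):
    """exon_rows: iterable of (chrom, start, end, gene_id, score, strand).

    Returns BED6 tuples for the gaps between merged exons of each gene.
    """
    by_gene = defaultdict(list)
    for chrom, start, end, gene, _score, strand in exon_rows:
        by_gene[(chrom, gene, strand)].append((start, end))

    out = []
    for (chrom, gene, strand), ivs in by_gene.items():
        ivs.sort()
        # merge overlapping or touching exons into meta-exons
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        # each gap between consecutive meta-exons is a meta-intron
        for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
            out.append((chrom, prev_end, next_start, gene, 1, strand))
    return out
```

The score column is set to 1 only to mirror the awk output above.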
If you know the organism, you can use the "Tables" utility (the Table Browser) of the UCSC Genome Browser.
also see responses to this question
Thank you for the answer