make a BED file with exons only
2.7 years ago

How can I make a BED file with exons only?

I followed these instructions, which I found on the forum:

1-Go to the UCSC Table Browser.
2-Select desired species and assembly
3-Select group: Genes and Gene Prediction Tracks
4-Select track: UCSC Genes (or RefSeq, Ensembl, etc.)
5-Select table: knownGene
6-Select region: genome (or you can test on a single chromosome or smaller region)
7-Select output format: BED - browser extensible data
8-Enter output file: UCSC_Introns.tsv
9-Select file type returned: gzip compressed
10-Hit the 'get output' button
11-A second page of options relating to the BED file will appear.
12-Under 'Create one BED record per:', select 'Exons plus'
13-Add desired flank for exons being returned, or leave as 0 to get just the exons.
14-Hit the 'get BED' button

But my BED file also contains intron intervals. Does anyone know why this happens?


"I found also introns intervals."

Could these be introns of some transcripts that are exons in alternative transcripts?


I don't think so; they don't seem to be related to any exon.

2.6 years ago
maddy.yellow ▴ 10

I followed the same steps as you and had the same problem (except I am using hg38 with GENCODE V39, and I was downloading 3' UTR exons). I found that there were supposedly 3' UTR exons all over the FUS gene (chr16:31,180,139-31,191,605), which didn't make sense looking at these transcripts that I was expecting: FUS_BasicGeneAnnotations

After changing the GENCODE V39 track settings to "All" instead of "BASIC only", I saw all of these transcripts: FUS_AllGeneAnnotations

The 3' UTR exons in my .bed file that I didn't want were from all of these alternative transcripts (as Pierre Lindenbaum suggested). The Table Browser defaults to "All" GENCODE V39 transcripts. I was able to add a filter in the Table Browser (as Matthias Zepper suggested) to get output for only the transcripts I wanted. (I also found the table schemas for knownGene and knownAttrs to be useful in creating my filter.)
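The same selection can also be sketched after download rather than in the Table Browser: keep only the BED records that belong to a chosen set of transcripts. This is a minimal sketch, not the Table Browser filter itself. It assumes that the name column (column 4) starts with the transcript ID followed by an underscore, so check that against your own file first. The function name, transcript IDs, and coordinates below are hypothetical placeholders.

```python
def filter_bed_by_transcripts(lines, wanted_ids):
    """Keep BED lines whose name field (column 4) belongs to a wanted transcript.

    Assumption: the name field starts with the transcript ID followed by "_",
    e.g. "TX_A.1_utr3_0_0_chr16_31180139_f". Verify this against your own file.
    """
    prefixes = tuple(tid + "_" for tid in wanted_ids)
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and UCSC header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) >= 4 and fields[3].startswith(prefixes):
            kept.append(line)
    return kept


# Hypothetical records; TX_A.1, TX_A.12 and TX_B.1 are made-up transcript IDs.
example_lines = [
    'track name="example"',
    "chr16\t31180138\t31180300\tTX_A.1_utr3_0_0_chr16_31180139_f\t0\t+",
    "chr16\t31185000\t31185200\tTX_B.1_utr3_0_0_chr16_31185001_f\t0\t+",
    "chr16\t31190000\t31190100\tTX_A.12_utr3_0_0_chr16_31190001_f\t0\t+",
]

kept = filter_bed_by_transcripts(example_lines, {"TX_A.1"})
for line in kept:
    print(line)
```

Matching on the ID plus the trailing underscore keeps `TX_A.1` from also matching `TX_A.12`.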

2.7 years ago

If I may ask: how did you determine that introns are also present in the output file? Do you already have a file with known intron coordinates that you intersected with it? In that case, you might need to merge the two BED files before doing so.

My first steps to troubleshoot would be:

  • Double-check that my positions refer to the same reference genome build and that I am using a consistent annotation source (RefSeq, Ensembl, UCSC, etc.).
  • Investigate selected regions visually in the Genome Browser.

I have a VCF and wanted to select only the variants in the exons, so I used bedtools to intersect my VCF with the BED file. The resulting list of variants wasn't very long, so I checked the variants one by one in the NCBI Variation Viewer and the gnomAD database, and I found intronic SNPs.

In my BED file I have, for example: chr1 146310551-146334210 exon

And in my VCF: chr1 146325842. In gnomAD this variant is considered intronic.
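A quick way to confirm that this variant really falls inside the listed interval is to compare the coordinates directly. The sketch below assumes the interval is given in BED coordinates (0-based start, end-exclusive) and the variant as a VCF POS (1-based). The function name is illustrative.

```python
def bed_contains_vcf_pos(bed_start, bed_end, vcf_pos):
    """Return True if a 1-based VCF position lies inside a BED interval.

    BED intervals are 0-based and half-open: [bed_start, bed_end).
    In 1-based terms that covers positions bed_start + 1 .. bed_end.
    """
    return bed_start < vcf_pos <= bed_end


# Coordinates quoted above (chr1).
inside = bed_contains_vcf_pos(146310551, 146334210, 146325842)
print(inside)
```

If this prints `True`, the intersection itself worked as designed, and the open question is which annotation record defines that interval.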


Assuming that your coordinates refer to hg38 and you used the GENCODE V39 annotation, the exons of two transcripts (ENST00000618406.1, ENST00000643391.1) from the pseudogene SEC22B4P are annotated in this region. So everything works as expected. If you are only interested in coding genes, have a look at the FAQ or use the filter functionality of the Table Browser.


I use hg19 (GRCh37.p13).

I can see no exons in this position



Why didn't you use the UCSC Genome Browser directly, if you are relying on the corresponding Table Browser for the download?

Evidently, the UCSC Genes track annotates an exon there: a HYDIN2 transcript with the ID uc031poh.1. Of course, I can't vouch for the correctness of this annotation (the exon does seem to be gone in more recent tracks like GENCODE V39), but an exon is indeed contained in the track that you downloaded as a .bed file.

You should also be able to find the IDs of the overlapping exons if you intersect the two files in the opposite order. So bioinformatically everything was done correctly, and now it is just a matter of selecting the track that best corresponds to your biological data.

UCSC Genome Browser View
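To see which annotation record is responsible for each hit, the lookup can also be sketched in plain Python: for every variant, list the names of the BED records that contain it. This is an illustrative stand-in for an interval-intersection tool, not a replacement for one. The record names below are hypothetical placeholders, and only the coordinates of the first record and the variant are taken from this thread.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines and anything without numeric start/end columns.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        name = fields[3] if len(fields) > 3 else "."
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def names_overlapping(records, chrom, vcf_pos):
    """Return the names of BED records containing a 1-based VCF position."""
    return [name for c, start, end, name in records
            if c == chrom and start < vcf_pos <= end]


# Record names are made up for illustration.
bed_lines = [
    'track name="example"',
    "chr1\t146310551\t146334210\thypothetical_tx_1_exon_0\t0\t-",
    "chr1\t146400000\t146400100\thypothetical_tx_2_exon_0\t0\t+",
]

records = parse_bed(bed_lines)
hits = names_overlapping(records, "chr1", 146325842)
print(hits)
```

The linear scan is fine for a short variant list like the one described here. For whole-genome inputs a dedicated interval tool would be the better choice.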
