make a BED file with exons only
2.6 years ago

I made a BED file with only exons (hg19) using UCSC table browser (https://genome.ucsc.edu/cgi-bin/hgTables).

But in my file I found these regions, for example:

chr 1 146310551-146334210

chr 7 9229189-9229487

chr 14 76666574-76669131

chr 17 41341930-41342793

chr 21 19935632-19935690

Do you recognize them as exons? They look like intronic sequences to me. What could the problem be?

vcf exons exon BED • 3.7k views

What makes you think they are intronic regions? Did you look them up in a genome browser?


Yes, I used NCBI Variation Viewer.

I asked because I may not have interpreted the data in the genome browser correctly.

For example this is what I get for the first interval:

[Image: NCBI Variation Viewer screenshot of the first interval]


The first interval spans about 23.7 kb, which is rather large for a single exon. If you want a BED file with the exon coordinates of the human genome, I would recommend downloading the GTF file, extracting the exon lines, and converting them to BED format.
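As a quick sanity check, the lengths of the intervals from the question can be computed directly. This is a minimal sketch that assumes the coordinates are 1-based and inclusive (with 0-based BED starts each length would be one less). The file names regions.txt and lengths.txt are made up for illustration.

```shell
# Hypothetical input file holding the five intervals from the question
# as chrom, start, end (whitespace-separated).
cat > regions.txt <<'EOF'
chr1 146310551 146334210
chr7 9229189 9229487
chr14 76666574 76669131
chr17 41341930 41342793
chr21 19935632 19935690
EOF

# Length under a 1-based, inclusive reading: end - start + 1.
awk '{ print $1, $3 - $2 + 1 }' regions.txt > lengths.txt
cat lengths.txt
```

Under that assumption the first interval comes out at 23660 bp, while the other four range from 59 to 2558 bp, so the first one is the entry that stands out.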


Does that file refer to hg19? I'm working with that reference

2.6 years ago
Rafael Soler ★ 1.3k

You can download the hg19 GTF, for example here: https://www.gencodegenes.org/human/release_19.html

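After that, keep only the lines whose feature type (third column) is "exon" and extract the coordinates. A plain grep for "exon" would also match CDS and UTR lines, because their attribute column contains exon_number and exon_id:

awk -F'\t' '$3 == "exon"' gencode.v19.chr_patch_hapl_scaff.annotation.gtf | cut -f1,4,5 > Human_exons.gtf

You can also add "sort -u" to remove duplicate exons:

awk -F'\t' '$3 == "exon"' gencode.v19.chr_patch_hapl_scaff.annotation.gtf | cut -f1,4,5 | sort -u > Human_unique_exons.gtf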

Also, if you are not working on Linux or with a command line, you can download them from BioMart: https://www.ensembl.org/biomart/martview/0f2873b6b31d9a0aaf1e8f3122bdae0f

I hope it helps! :)


Remember that BED uses 0-based, half-open coordinates, while GTF uses 1-based, inclusive coordinates.
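In practice the conversion amounts to subtracting 1 from the GTF start and leaving the end unchanged. Here is a minimal sketch on a made-up two-line GTF. The file names and records are illustrative and not taken from GENCODE.

```shell
# Tiny made-up GTF: one gene line and one exon line (tab-separated).
printf 'chr1\tTEST\tgene\t100\t500\t.\t+\t.\tgene_id "g1";\n' > toy.gtf
printf 'chr1\tTEST\texon\t100\t200\t.\t+\t.\tgene_id "g1"; exon_number 1;\n' >> toy.gtf

# Keep rows whose feature type (column 3) is exactly "exon".
# BED start = GTF start - 1 (0-based); BED end = GTF end (half-open).
awk -F'\t' -v OFS='\t' '$3 == "exon" { print $1, $4 - 1, $5 }' toy.gtf \
    | sort -u > toy_exons.bed
cat toy_exons.bed
```

The exon at 100-200 in GTF terms becomes 99-200 in BED terms, which still covers the same 101 bases.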


True, but note that the question is tagged "vcf", so I assume OP wants to filter a VCF file using the BED. In that case OP can use bcftools, which accepts a tab-delimited file with 1-based coordinates, such as the one produced by this answer ;) Just make sure it has a file suffix other than .bed (I usually use .tsv or .tab).

As a side note: with many small regions (such as exons), bcftools view -T filter.bed is much faster than bcftools view -R filter.bed, which is better suited to filtering a few larger regions. The -T flag also lets you invert the filter with ^.
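A sketch of that workflow, assuming bcftools is installed and that input.vcf.gz is a bgzip-compressed VCF. All file names and the interval are placeholders. The regions file keeps 1-based coordinates, so it deliberately avoids the .bed suffix.

```shell
# 1-based, inclusive, tab-delimited regions (CHROM, BEG, END); made-up interval.
printf 'chr1\t100\t200\n' > exons.tsv

# Run only when bcftools and the placeholder input are actually present.
if command -v bcftools >/dev/null 2>&1 && [ -f input.vcf.gz ]; then
    # Keep variants that fall inside the listed regions.
    bcftools view -T exons.tsv -Oz -o in_exons.vcf.gz input.vcf.gz
    # Prefixing the file name with ^ inverts the selection.
    bcftools view -T ^exons.tsv -Oz -o outside_exons.vcf.gz input.vcf.gz
fi
```

The same exons.tsv could be passed to -R instead, but that option relies on an index for the input VCF.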


Thank you very much!

2.6 years ago
elisheva ▴ 120

You can upload a list of gene IDs to the Table Browser and select BED as the output format; you will then see an option to download only the exons or introns of those genes.
