Question

How to create specific .bed files for a panel of genes

3.5 years ago

K.patel5 ▴ 150

Hello Biostars,

I am very new to genomics and have been given scripts from other bioinformaticians to learn from. Within these scripts they used specific .bed files to analyse a panel of genes and perform annotation.

I understand there are methods of turning a .fasta file or a .bam file into a .bed file. However, I do not understand how to extract information from specific genes e.g. how to create a .bed file which maps all the collagen genes or how to create a .bed file with only exons.

Does anyone know the process of performing such analysis, or know of any databases where .bed files may be stored?

Many thanks, Krutik

panel genomics BED • 5.5k views

3.5 years ago

Florian ▴ 20

Dear K.patel5,

Your question is a bit vague because it is not clear what your BED file contains. Typically, a BAM file contains mapped reads, which you can convert directly into a BED file. However, a BAM file can also feed region counting or differential expression analysis, which produce BED or BED-like output. So make sure you really understand the content of your BED file.

If you are given (or have produced) a BED file and now want to filter it for specific regions, you could, for example, use bedtools intersect: intersect your BED file with a reference that contains your regions of interest (e.g., exons or annotated genes). You can get the reference file from standard databases such as UCSC or GENCODE.
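As a rough illustration of what that intersection does, here is a minimal sketch on made-up toy intervals. With bedtools installed, the usual command would be something like `bedtools intersect -u -a my_regions.bed -b exons.bed`; the awk version below only mimics that "keep a region if it overlaps anything" logic for small files. All file names and coordinates are hypothetical.

```shell
# Toy input: three regions of interest (hypothetical; BED is 0-based, half-open)
printf 'chr1\t100\t200\tregionA\nchr1\t500\t600\tregionB\nchr2\t100\t200\tregionC\n' > my_regions.bed
# Toy reference: two "exons" (hypothetical)
printf 'chr1\t150\t250\texon1\nchr2\t300\t400\texon2\n' > exons.bed

# Keep each region that overlaps at least one exon on the same chromosome.
# Two half-open intervals overlap when start_a < end_b and start_b < end_a.
awk 'BEGIN { FS = "\t" }
     NR == FNR {                       # first file: remember the reference intervals
         ref_chrom[NR] = $1; ref_start[NR] = $2 + 0; ref_end[NR] = $3 + 0
         n = NR; next
     }
     {                                 # second file: test each region against all exons
         for (i = 1; i <= n; i++) {
             if ($1 == ref_chrom[i] && ($2 + 0) < ref_end[i] && ref_start[i] < ($3 + 0)) {
                 print; break
             }
         }
     }' exons.bed my_regions.bed > regions_in_exons.bed

cat regions_in_exons.bed
```

This nested loop is fine for a small panel but slow on genome-scale files, which is where bedtools or BEDOPS are the better choice.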

I hope that helps.

Cheers, Florian

Thank you for your answer. You are right, I am quite naïve about bed files. Thank you for the information though, and the resources you highlighted seem very helpful.

Reply 3.5 years ago by K.patel5 ▴ 150

score 3 · Accepted Answer · 2022-01-07

After installing BEDOPS, here's a way to get a BED file of genes from a central reference like GENCODE:

wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "gene"' - \
    | convert2bed -i gff --attribute-key="gene_name" - \
    > genes.bed

Likewise, for exons:

wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3 == "exon"' - \
    | convert2bed -i gff --attribute-key="gene_name" - \
    > exons.bed

Via: https://bioinformatics.stackexchange.com/questions/895/how-to-obtain-bed-file-with-coordinates-of-all-genes

(Using the --attribute-key="gene_name" option with convert2bed will bring in HGNC symbol names, which in some contexts can be more commonly used (and useful) for gene names than Ensembl IDs.)
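For readers without BEDOPS, the conversion can be roughly approximated with awk alone. This is a minimal sketch, not a replacement for convert2bed: it assumes GENCODE-style GFF3 with a `gene_name=` attribute in column 9, writes only six BED columns, and uses two made-up records (hypothetical coordinates and IDs) so that it runs without a download. The main thing it shows is the coordinate shift: GFF3 starts are 1-based and inclusive, BED starts are 0-based, so the start is decremented by one while the end is unchanged.

```shell
# Two toy GFF3 records in GENCODE style (coordinates and IDs are made up for illustration)
printf 'chr17\tHAVANA\tgene\t1001\t2000\t.\t-\t.\tID=ENSG_TOY1;gene_type=protein_coding;gene_name=COL1A1\n' > toy.gff3
printf 'chr17\tHAVANA\texon\t1001\t1200\t.\t-\t.\tID=exon:TOY1:1;gene_name=COL1A1\n' >> toy.gff3

# Keep "gene" features; emit chrom, start-1, end, gene_name, score, strand
awk 'BEGIN { FS = OFS = "\t" }
     $3 == "gene" {
         name = "."                                  # fallback if gene_name is absent
         n = split($9, attrs, ";")
         for (i = 1; i <= n; i++) {
             # "gene_name=" is 10 characters, so the value starts at position 11
             if (attrs[i] ~ /^gene_name=/) { name = substr(attrs[i], 11) }
         }
         print $1, $4 - 1, $5, name, $6, $7
     }' toy.gff3 | sort -k1,1 -k2,2n > toy_genes.bed

cat toy_genes.bed
```

Changing `$3 == "gene"` to `$3 == "exon"` gives the exon version, mirroring the two pipelines above.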

If you want specific genes (say you have a list of gene names or symbols, one per line, in a file called genes_of_interest.txt), you can use grep:

grep -wFf genes_of_interest.txt genes.bed > genes_of_interest.bed
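One caveat with the grep approach: it matches a symbol anywhere on the line, and `-w` treats characters such as `-` as word boundaries, so a listed symbol can also pull in a differently named gene that merely contains it. A hedged alternative is to compare the name column exactly with awk. The sketch below assumes the gene symbol sits in column 4 of the BED file (check your own file) and uses made-up toy records; `GENEA`, `GENEA-AS1`, `GENEB` and the `toy_*` file names are hypothetical.

```shell
# Toy data (hypothetical gene names and coordinates)
printf 'chr1\t100\t200\tGENEA\nchr1\t150\t300\tGENEA-AS1\nchr2\t50\t80\tGENEB\n' > toy_panel.bed
printf 'GENEA\n' > toy_genes_of_interest.txt

# grep -w also keeps GENEA-AS1, because "-" ends the word GENEA
grep -wFf toy_genes_of_interest.txt toy_panel.bed > by_grep.bed

# awk: load the wanted symbols from the first file,
# then keep only rows whose column 4 is exactly one of them
awk 'BEGIN { FS = "\t" }
     NR == FNR { want[$1]; next }
     $4 in want' toy_genes_of_interest.txt toy_panel.bed > by_awk.bed

wc -l by_grep.bed by_awk.bed
```

For a gene family such as the collagens from the original question, the same idea works with a pattern on column 4 (for example `awk -F'\t' '$4 ~ /^COL[0-9]+A[0-9]+$/'`), though that pattern is an assumption about the naming scheme and is worth checking against a curated gene list.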

To map genes onto a set of reads of interest, you can use BEDOPS bam2bed to convert the BAM file to BED, and then bedmap to do the mapping:

bam2bed < reads.bam > reads.bed
bedmap --echo --echo-map-id-uniq reads.bed genes_of_interest.bed > answer.bed
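If a per-gene summary is wanted afterwards, the result can be tallied with awk. This is only a sketch under assumptions: it expects bedmap's default output layout, where the echoed read is followed by `|` and then the overlapping gene names separated by `;`, with nothing after the `|` when a read overlaps no gene. The toy file below is made up to stand in for answer.bed, and it takes the last `|`-separated field in case a read field itself contains a `|`.

```shell
# Toy file in the layout bedmap is assumed to produce (made-up reads and gene names):
# <read fields>|<semicolon-separated gene names>
printf 'chr1\t100\t150\tread1|GENEA\nchr1\t160\t210\tread2|GENEA;GENEB\nchr3\t10\t60\tread3|\n' > toy_answer.bed

# Split on "|", skip reads with no overlapping gene, count one hit per listed gene
awk -F'|' '$NF != "" {
         n = split($NF, genes, ";")
         for (i = 1; i <= n; i++) count[genes[i]]++
     }
     END { for (g in count) print g "\t" count[g] }' toy_answer.bed | sort > reads_per_gene.tsv

cat reads_per_gene.tsv
```

A read that overlaps two genes is counted once for each of them here, which may or may not be what a given analysis needs.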