Question

How to add transcript and CpG site information to a BED file

2.6 years ago

K.patel5 ▴ 150

Dear Biostars,

I have a BED file created with ngs-bits that I hope to use with the ClinCNV tool. Unfortunately, I cannot use their in-house annotation method because my HPC will not give me permission to use MySQL as root. It is a long story, but I just need to brainstorm other ways of annotating my BED file.

The BED file looks like this (columns: chromosome, start, end, GC content). Sequences have been binned into windows of roughly 2000 nucleotides, and the start and end coordinates restart for each chromosome.

chr1    0   2001    n/a
chr1    2001    4002    n/a
chr1    4002    6003    n/a
chr1    6003    8004    n/a
chr1    8004    10005   0.4000
chr1    10005   12006   0.5947
chr1    12006   14007   0.5882
chr2    0   1999    n/a
chr2    1999    3998    n/a
chr2    3998    5997    n/a
chr2    5997    7996    n/a
chr2    7996    9995    n/a
chr2    9995    11994   0.6720
chr2    11994   13993   0.3722
chr2    13993   15992   0.4132

Does anyone have any ideas on how to add the transcript names that correspond to these genomic ranges? Further, is there a way to flag regions that fall into a CpG site? I'd be interested in removing these.

Any advice would be appreciated.

genomics annotations cpg BED • 1.5k views

2.1 years ago by K.patel5 ▴ 150

Hi @K.patel5, you don't need MySQL to run ngs-bits =) MySQL is optional. You can even use ngs-bits in a container via bioconda.

CpG sites can be as short as 2 bp (a single CG is already a CpG site) and your regions are about 2 kb, so it is not clear what you want to remove from where.

2.2 years ago by German.M.Demidov ★ 2.9k

Thanks, I did end up using bioconda for this. I had intended to remove any areas with a GC ratio above 0.8. I have read in a few CNV diagnostic publications that removing such regions can reduce false reporting of CNVs.
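For what it's worth, a filter on the fourth column can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function name `filter_high_gc` is made up for illustration, the 0.8 cutoff is the one mentioned above, and bins whose value is `n/a` are kept rather than dropped (that choice is an assumption, adjust as needed).

```python
def filter_high_gc(lines, max_gc=0.8):
    """Yield BED lines whose 4th column (GC fraction) is at most max_gc.

    Assumption: bins with 'n/a' in the GC column are kept unchanged.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed lines.
            continue
        gc = fields[3]
        if gc != "n/a" and float(gc) > max_gc:
            # Drop bins above the GC cutoff.
            continue
        yield line.rstrip("\n")
```

It can be fed an open file handle, e.g. `for row in filter_high_gc(open("bins.bed")): print(row)`, where `bins.bed` is a placeholder file name.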

2.1 years ago by K.patel5 ▴ 150

score 2 · Accepted Answer · 2022-04-05

2.6 years ago

Matthias Zepper 5.0k

I think bedtools is your friend here. Obtain BED files with transcript and CpG coordinates, e.g. from the UCSC Genome Browser, and then have a look at bedtools intersect or bedtools closest to annotate and filter.
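To illustrate what the overlap step does conceptually, here is a rough Python sketch. It is not how bedtools is implemented, just the same half-open interval test on a small scale; the function name `annotate_bins` and the transcript names in the usage example are hypothetical. For whole-genome files bedtools will be far faster than this linear scan.

```python
from collections import defaultdict


def annotate_bins(bins, transcripts):
    """Attach the names of overlapping transcripts to each bin.

    bins: iterable of (chrom, start, end) tuples
    transcripts: iterable of (chrom, start, end, name) tuples
    Coordinates are treated as 0-based, half-open, as in BED.
    """
    # Group transcripts by chromosome so each bin only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in transcripts:
        by_chrom[chrom].append((start, end, name))

    annotated = []
    for chrom, start, end in bins:
        # Two half-open intervals overlap when each starts before the other ends.
        names = [n for s, e, n in by_chrom.get(chrom, []) if s < end and start < e]
        # Use "." as a placeholder when no transcript overlaps the bin.
        annotated.append((chrom, start, end, ",".join(names) or "."))
    return annotated
```

For example, with the made-up transcript `("chr1", 9000, 11000, "txA")`, the bins `chr1:8004-10005` and `chr1:10005-12006` would both be labelled `txA`, while bins without any overlap get `.`.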

PS: Have you considered running your desired tool inside a container? Sometimes this is possible on HPC systems...