How to convert my files (CNV seg, RefSeq) to .bed format?
7.8 years ago
taegyunlee • 0

Hi

I downloaded TCGA CNV level 3 data (nocnv, hg19) and would like to map the CNV segments to genes. I searched for information about this and found some.

The recommendation I got was to use bedtools.

I downloaded a RefSeq file from the UCSC Table Browser. Its content is as follows.

bin  name       chrom  strand  txStart    txEnd      cdsStart   cdsEnd     exonCount  exonStarts
3    NR_130130  chr1   +       150980866  151008189  151008189  151008189  4          150980866,150997990,150999708,151006281,
3    NR_130132  chr1   +       150980866  151008189  151008189  151008189  4          150980866,150990287,150999708,151006281,

The CNV seg file's content is as follows.

Sample                                                 Chromosome  Start      End        Num_Probes  Segment_Mean
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  1           3218610    247813706  128998      0.0014
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  2           484222     207696262  110158      0.0067
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  2           207696273  207701151  2           -1.5215

As far as I know, I have to convert both files (CNV seg, RefSeq) to .bed format, but I don't know how to do that. What should I do?

Can you give me a hand?

seg cnv bed • 3.7k views

I am trying the same thing, but I am having difficulty converting the seg.txt file to a proper .bed format, so the files cannot be read in the subsequent steps. Can you help me out on how to proceed? How did you manage to get the conversion done?

7.8 years ago
Eric T. ★ 2.8k

Are you downloading gene annotations from UCSC? In the Table Browser, look for the output format option, which lets you export a table directly as BED. You don't necessarily need to use the files as they are on the FTP site.

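The general answer here is that these are all tabular formats, so you can extract the columns you need using standard Unix tools or a short script in R or Python. The first three columns of BED are chromosome, start, and end; the UCSC RefSeq table and the SEG format both have these columns along with others. So you select the chromosome, start, and end columns from the input format using cut or awk (for RefSeq: chrom, txStart, txEnd; for SEG: Chromosome, Start, End). Subtract 1 from the 'start' position for SEG, because SEG uses 1-based indexing while BED and the UCSC RefSeq table use 0-based start positions; the 'end' position stays unchanged. Note also that in the files shown above the SEG chromosome names are bare numbers (1, 2) while the RefSeq table uses a "chr" prefix (chr1), so the names need to match before the two files can be compared.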
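
If you'd rather do it in Python than with cut/awk, the column selection described above could look roughly like this. This is only a sketch under some assumptions: both files are tab-delimited, the header names are the ones shown in the question, and the function names (seg_to_bed, refseq_to_bed) are made up for illustration. Putting the sample name and Segment_Mean in the BED name/score columns is just one possible choice, not a requirement of the format.

```python
import csv
import sys


def seg_to_bed(seg_lines):
    """Yield BED-like tuples from SEG lines (assumed tab-delimited, with header)."""
    for row in csv.DictReader(seg_lines, delimiter="\t"):
        chrom = row["Chromosome"]
        # Assumption: add a "chr" prefix so names match the UCSC table (1 -> chr1).
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        start = int(row["Start"]) - 1  # SEG start is 1-based, BED start is 0-based
        end = int(row["End"])          # end coordinate is left unchanged
        # Illustrative choice: sample as the name column, Segment_Mean as the score.
        yield (chrom, start, end, row["Sample"], row["Segment_Mean"])


def refseq_to_bed(refseq_lines):
    """Yield BED-like tuples from a UCSC RefSeq table (assumed tab-delimited)."""
    # Only named columns are read, so it does not matter whether the
    # first header field is written as "bin" or "#bin".
    for row in csv.DictReader(refseq_lines, delimiter="\t"):
        # txStart is already 0-based in the UCSC table, so no shift is needed.
        yield (row["chrom"], int(row["txStart"]), int(row["txEnd"]),
               row["name"], "0", row["strand"])


def write_bed(records, out=sys.stdout):
    """Write tuples as tab-separated BED lines."""
    for rec in records:
        out.write("\t".join(str(field) for field in rec) + "\n")


if __name__ == "__main__":
    # Small demo using one row from each file in the question.
    seg_demo = [
        "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n",
        "BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030\t2\t207696273\t207701151\t2\t-1.5215\n",
    ]
    refseq_demo = [
        "bin\tname\tchrom\tstrand\ttxStart\ttxEnd\n",
        "3\tNR_130130\tchr1\t+\t150980866\t151008189\n",
    ]
    write_bed(seg_to_bed(seg_demo))
    write_bed(refseq_to_bed(refseq_demo))
```

To run it on real files, pass open file handles instead of the demo lists, e.g. write_bed(seg_to_bed(open("my.seg.txt"))), where the file name is a placeholder. The resulting BED files may still need sorting before some bedtools operations.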
