Expanding bed format into single nucleotide resolution
6.2 years ago
Kasthuri ▴ 300

I have TCGA CNV data:

chrom   loc.start   loc.end   num.mark   seg.mean
1       3218610     3586250   166        0.1112

which I would like to expand as:

chrom   loc.start   seg.mean
1       3218610     0.1112
1       3218611     0.1112
1       3218612     0.1112
...
...
1       3586250     0.1112

I could easily write code to do this, but I am afraid I would get the contig coordinates wrong. Are there any tools that can do this while taking the genomic coordinate information into account? Thanks!

6.2 years ago

You can do this with the BEDOPS tools bedops --chop and bedmap --echo-map-score, assuming that the intervals in your cnv.txt file use 1-based coordinates and do not overlap:

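# Drop the header, shift 1-based starts to 0-based BED starts, write tab-delimited output, then sort
$ tail -n+2 cnv.txt | awk -v OFS='\t' '{ $2 -= 1; print $0; }' | sort-bed - > cnv.bed
# Chop into 1-base intervals, map each one back to its seg.mean, and keep chrom, 1-based position, and score
$ bedops --chop 1 cnv.bed | bedmap --echo --echo-map-score --delim '\t' - cnv.bed | cut -f1,3,4 | cat <(echo -e 'chrom\tloc.start\tseg.mean') - > answer.txt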

The file answer.txt will be formatted in a manner similar to the sample output in your question.
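
If you want something to cross-check the coordinates against, the same expansion can be sketched in plain awk. This is only a sketch under the same assumptions (a header line, 1-based inclusive coordinates, seg.mean in the fifth column). The name expand_segments is a hypothetical helper, not a BEDOPS tool, and it does not sort its output.

```shell
# expand_segments: hypothetical helper, not part of BEDOPS.
# Reads a segment table (header line, then chrom, loc.start, loc.end,
# num.mark, seg.mean) and prints one tab-delimited line per position,
# treating loc.start and loc.end as 1-based and inclusive.
expand_segments() {
  awk 'BEGIN { OFS = "\t"; print "chrom", "loc.start", "seg.mean" }
       NR > 1 && NF >= 5 {
         # Walk every position in the closed interval [loc.start, loc.end]
         for (pos = $2; pos <= $3; pos++) print $1, pos, $5
       }' "$1"
}
```

Running expand_segments cnv.txt on the example segment should give 3586250 - 3218610 + 1 = 367641 data lines plus the header, which is a quick way to confirm that neither end of the interval was dropped.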

If your input intervals overlap or use a different index scheme, please follow up in a comment and I'll suggest a way to work with that case.
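
To find out whether the non-overlap assumption holds for your file, a check along these lines might help. It is a rough sketch that assumes a header line and 1-based inclusive coordinates in columns 2 and 3. The name has_overlaps is hypothetical, and the check uses only awk and sort rather than any BEDOPS option.

```shell
# has_overlaps: hypothetical helper, not part of BEDOPS.
# Exit status 0 if any two segments on the same chromosome share at least
# one position (1-based, inclusive coordinates); exit status 1 otherwise.
has_overlaps() {
  # Keep chrom, start, end; drop the header and normalize to tabs
  awk 'NR > 1 && NF >= 3 { print $1 "\t" $2 "\t" $3 }' "$1" |
    sort -k1,1 -k2,2n |
    awk '
      # Same chromosome and start at or before the furthest end seen so far
      $1 == chrom && $2 + 0 <= maxend { found = 1; exit }
      # New chromosome: reset the running maximum end
      $1 != chrom { chrom = $1; maxend = $3 + 0; next }
      # Same chromosome, no overlap: extend the running maximum end
      $3 + 0 > maxend { maxend = $3 + 0 }
      END { exit(found ? 0 : 1) }'
}
```

For example, `if has_overlaps cnv.txt; then echo "overlapping segments found"; fi` would flag a file that needs different handling before it is chopped.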


Awesome! Thank you very much.
