How to annotate chromosome positions with a gene list file
6.9 years ago
mittu1602 ▴ 200

Hello All,

I have a file of chromosome positions produced by a CNV tool, which looks like this (file1):

chr1    11167376        11167465        amp
chr1    11167466        11167565        amp
chr1    11167566        11167598        neutral
chr1    11167599        11167644        del
chr1    11168123        11168180        amp
chr1    11168181        11168209        neutral
chr1    11168210        11168309        amp
chr1    11168310        11168409        amp
chr1    11168417        11168433        amp
chr1    11169248        11169336        amp
chr1    11169337        11169436        amp
chr1    11169437        11169487        amp
chr1    11169489        11169560        amp
chr1    11169562        11169649        amp
chr1    11169670        11169769        amp
chr1    11169770        11169806        amp

I also have a gene list file from UCSC (file2):

chr1    11159886        11159888        EXOSC10
chr1    11166588        11167557        MTOR
chr1    11167542        11167544        MTOR
chr1    11167545        11167557        MTOR
chr1    11168238        11168343        MTOR
chr1    11168238        11168343        MTOR
chr1    11169347        11169427        MTOR
chr1    11169347        11169427        MTOR
chr1    11169706        11169786        MTOR
chr1    11169706        11169786        MTOR
chr1    11172909        11172974        MTOR
chr1    11172909        11172974        MTOR
chr1    11174375        11174510        MTOR
chr1    11174375        11174510        MTOR

I tried bedtools and bedops, but they annotate many intermediate regions. My expected output is:

chr1    11167376    11167465    amp MTOR
chr1    11167466    11167565    amp MTOR
chr1    11167566    11167598    neutral MTOR
chr1    11167599    11167644    del MTOR
chr1    11168123    11168180    amp MTOR
chr1    11168181    11168209    neutral MTOR
chr1    11168210    11168309    amp MTOR
chr1    11168310    11168409    amp MTOR
chr1    11168417    11168433    amp MTOR
chr1    11169248    11169336    amp MTOR
chr1    11169337    11169436    amp MTOR
chr1    11169437    11169487    amp MTOR
chr1    11169489    11169560    amp MTOR
chr1    11169562    11169649    amp MTOR
chr1    11169670    11169769    amp MTOR
chr1    11169770    11169806    amp MTOR

So basically I want to annotate the regions in file1 with the genes from file2. Thank you.

Genomics • 2.1k views

This looks like a bedtools intersect problem. What command did you use?

bedtools intersect -wa -wb -a file1 -b file2 > result
6.8 years ago
Paul ★ 1.5k

Hi, I usually use syntax like this:

bedtools intersect -a file1 -b file2 -wao | awk -v OFS="\t" '{if($5 == ".") print $1,$2,$3,$4"_no_intersect"; else print $1,$2,$3,$4"_"$8}' | sort | uniq

Your files need to be tab-separated. From your files, it looks like some regions do not intersect any interval in file2.

Output is:

chr1    11167376    11167465    amp_MTOR
chr1    11167466    11167565    amp_MTOR
chr1    11167566    11167598    neutral_no_intersect
chr1    11167599    11167644    del_no_intersect
chr1    11168123    11168180    amp_no_intersect
chr1    11168181    11168209    neutral_no_intersect
chr1    11168210    11168309    amp_MTOR
chr1    11168310    11168409    amp_MTOR
chr1    11168417    11168433    amp_no_intersect
chr1    11169248    11169336    amp_no_intersect
chr1    11169337    11169436    amp_MTOR
chr1    11169437    11169487    amp_no_intersect
chr1    11169489    11169560    amp_no_intersect
chr1    11169562    11169649    amp_no_intersect
chr1    11169670    11169769    amp_MTOR
chr1    11169770    11169806    amp_MTOR
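
To see what the awk stage is doing, here is a minimal sketch that runs it on three hand-written lines instead of real bedtools output. It assumes the nine-column layout that -wao gives for two 4-column inputs (columns 1-4 from file1, 5-8 from file2, 9 the overlap in bp) and that unmatched file1 rows carry "." in the file2 columns. The helper name label_regions is made up for illustration.

```shell
#!/bin/sh
# label_regions: hypothetical helper wrapping the awk step from the answer.
# Reads -wao style lines on stdin and writes chrom, start, end, status_gene.
label_regions() {
  awk -v OFS="\t" '{
    if ($5 == ".") {
      # no file2 interval overlapped this file1 region
      print $1, $2, $3, $4 "_no_intersect"
    } else {
      # column 8 holds the gene name taken from file2
      print $1, $2, $3, $4 "_" $8
    }
  }' | sort -u   # same effect as sort | uniq
}

# Hand-written sample in the assumed layout (not real tool output):
# one region hit by two identical MTOR rows, and one region with no overlap.
printf 'chr1\t11168210\t11168309\tamp\tchr1\t11168238\t11168343\tMTOR\t71\n'  > sample_wao.txt
printf 'chr1\t11168210\t11168309\tamp\tchr1\t11168238\t11168343\tMTOR\t71\n' >> sample_wao.txt
printf 'chr1\t11167566\t11167598\tneutral\t.\t-1\t-1\t.\t0\n'                >> sample_wao.txt

label_regions < sample_wao.txt
```

The two identical MTOR rows collapse into a single amp_MTOR line, which is why the duplicated entries in file2 do not show up twice in the final output.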

EDIT: Also, if you need only the intersecting regions in the output, you can modify the AWK script or pipe the result to grep -v "no_intersect".
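
If the aim is the gene-level labelling shown in the question's expected output, where every region inside the gene gets MTOR even when it misses the listed intervals, one option is to collapse file2 to a single span per gene before intersecting. A rough sketch follows. The names gene_spans and genes.bed are invented here, and it assumes all rows for a gene on a chromosome belong to one locus, so regions lying between the listed intervals get labelled as well.

```shell
#!/bin/sh
# gene_spans: hypothetical helper, not part of bedtools.
# Collapses a 4-column BED (chrom, start, end, gene) to one row per
# chrom+gene, running from the smallest start to the largest end.
gene_spans() {
  awk -v OFS="\t" '{
    key = $1 SUBSEP $4
    s = $2 + 0; e = $3 + 0            # force numeric comparison
    if (!(key in lo) || s < lo[key]) lo[key] = s
    if (!(key in hi) || e > hi[key]) hi[key] = e
  }
  END {
    for (key in lo) {
      split(key, part, SUBSEP)        # part[1] = chrom, part[2] = gene
      print part[1], lo[key], hi[key], part[2]
    }
  }' "$@" | sort -k1,1 -k2,2n         # order by chromosome, then start
}

# Usage with the file names from the question (requires bedtools):
#   gene_spans file2 > genes.bed
#   bedtools intersect -a file1 -b genes.bed -wao
```

With the file2 excerpt from the question this gives a single MTOR span, chr1 11166588 to 11174510, which covers every region listed in file1.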
