How can find a list of genes are neighbour?
3
1
Entering edit mode
9.0 years ago
m.koohi.m ▴ 120

Suppose I have Protein1, Protein2, Protein3 that are expressed by Gene1, Gene2, Gene3. I want to know Gene1, Gene2, Gene3 in the genome sequence are neighbor or not?

Genome: ========Gene1=====Gene2=======Gene3======

proteins genes sequence • 4.7k views
ADD COMMENT
1
Entering edit mode

what do you mean with "neighbour"?

ADD REPLY
0
Entering edit mode

Suppose I have Protein1, Protein2, Protein3 that are expressed by Gene1, Gene2, Gene3. I want to know Gene1, Gene2, Gene3 in the genome sequence are neighbor or not?

Genome: ========Gene1=====Gene2=======Gene3======
ADD REPLY
2
Entering edit mode
9.0 years ago
yuhao819 ▴ 20

I suggest you to sort your gff file according to scaffold (chromesome) firstly, and coordinate secondly. they sit at the first column and fourth column, repectively.

REF: http://asia.ensembl.org/info/website/upload/gff.html?redirect=no

ADD COMMENT
4
Entering edit mode
9.0 years ago

You can do it in R:

> biocLite('Homo.sapiens')
> library(Homo.sapiens)
> human.genes = genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
> data.frame(
   gene = human.genes$gene_id, 
   neigh.upstream = human.genes$gene_id[precede(human.genes, human.genes)], 
   neigh.downstream=human.genes$gene_id[follow(human.genes, human.genes)]
   )
       gene neigh.d neigh2.u
1         1  79673 116412
2        10  63898      9
3       100  10955  79015
4      1000  83539   1825
5     10000   9859    159
6 100008586 645037  26749

# Let's verify the first gene:
> subset(human.genes, gene_id %in% c(1, 79673, 162968))
GRanges object with 3 ranges and 1 metadata column:
         seqnames               ranges strand |     gene_id
            <Rle>            <IRanges>  <Rle> | <character>
       1    chr19 [58858172, 58874214]      - |           1
  162968    chr19 [58865723, 58874214]      - |      162968
   79673    chr19 [58637619, 58666477]      - |       79673

This will give you a dataframe with the upstream and downstream neighbor of each human gene. You can simply intersect with your list to identify if two genes are neighbors.

Note that the follow() and precede() functions ignore overlapping genes. If you want to reduce the problem, you can use the resize() function to create two new GRanges objects, one with only the start/end of the gene, and calculate the neighbors from there:

> human.genes.start = resize(human.genes, 1, fix='start')
> human.genes.end = resize(human.genes, 1, fix='end')
> data.frame(
   gene = human.genes$gene_id, 
   neigh1 = human.genes$gene_id[precede(human.genes.start, human.genes)], 
   neigh2=human.genes$gene_id[follow(human.genes.end, human.genes)]
   )

Have a look at the nearest(), distanceToNearest() functions as well.

EDIT: I have been thinking about the problem of overlapping genes during these days. I think the easiest way to handle them is to reduce() the gene coordinates first, merging all overlapping genes in one unit:

> human.genes.reduced = reduce(human.genes, with.revmap=T)
GRanges object with 20884 ranges and 1 metadata column:
                seqnames           ranges strand   |         revmap
                   <Rle>        <IRanges>  <Rle>   |  <IntegerList>
      [1]           chr1 [ 11874,  14409]      +   |            581
      [2]           chr1 [ 69091,  70008]      +   |          19514
      [3]           chr1 [762971, 794826]      +   |          17162
      [4]           chr1 [860530, 879961]      +   |           4936
      [5]           chr1 [895967, 901099]      +   |          10174
      ...            ...              ...    ... ...            ...
  [20880] chrUn_gl000213 [108007, 139339]      -   |            652
  [20881] chrUn_gl000218 [ 38786,  97454]      -   |      546,11344
  [20882] chrUn_gl000219 [ 53809,  99642]      -   |           8941
  [20883] chrUn_gl000220 [ 97129, 126696]      +   |           1400
  [20884] chrUn_gl000228 [ 79448, 124171]      +   | 18730,1868,642
  -------

The with.revmap option allows you to keep the original gene ids. In the example, you can see that the genes 18730,1868,642 are overlapping, and therefore have been merged to a single unit.

ADD COMMENT
0
Entering edit mode

Hi. I am a Ph.D. student studying biomedicine. I do not know how to identify the nearby (neighboring) genes of one gene of interest. I just have a little background of R. With the above code, if I want to know the nearby genes of a specific gene (e.g. BAIAP2-DT, Gene ID: 440465), can you please show me how I should use your R codes? Thanks in advance.

ADD REPLY
1
Entering edit mode
9.0 years ago
Alternative ▴ 290

This depends on you definition of neighbors. From a bed file with gene coordinates, you can use "Bedtools closest" or "Bedtools window" in order to look around each feature and detect neighboring features. Many other tools allow you to do such things like "BEDOPS".

ADD COMMENT

Login before adding your answer.

Traffic: 2020 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6