Extraction of Intergenic Regions and Features from Gene Coordinates and Gene Details
8.0 years ago
SRKR ▴ 180

I am looking for code logic to extract intergenic sequences based on gene coordinates, and also to assign the distal and proximal gene names and strands. However, I am stuck on the complications caused by overlapping genes. Could you please share code logic that handles the case given below?

Gene Coordinates and Gene Details - Name and Strand

  • Start - Stop GeneName Strand
  • 10 - 19 Gene_1 +
  • 27 - 46 Gene_2 +
  • 27 - 89 Gene_3 -
  • 110 - 250 Gene_4 +
  • 120 - 340 Gene_5 +
  • 180 - 350 Gene_6 -
  • 260 - 397 Gene_7 -
  • 425 - 625 Gene_8 +
  • 680 - 2 Gene_9 -

Ideally this is the output I am expecting

  • IGNo Start - End DistalGeneName DistalGeneStrand ProximalGeneName ProximalGeneStrand
  • IG1 3 - 9 Gene_9 - Gene_1 + (compared against the start and stop positions of the last gene to get the actual IG coordinates)
  • IG2 20 - 26 Gene_1 + Gene_3 - (when genes share the same start coordinate, the longer gene is the proximal gene)
  • IG3 90 - 109 Gene_3 - Gene_4 +
  • IG4 398 - 424 Gene_7 - Gene_8 + (this is the difficulty: how to skip the intermediate overlapping genes)
  • IG5 626 - 679 Gene_8 + Gene_9 -

In some cases there can be many overlaps, and I am having difficulty handling that in the logic.

If you can share code that resolves this, or explain logic that I can use, I would be very thankful.

php logic intergenic sequences genes • 2.3k views

Is this a circular genome? This post might help.


Yes, I am treating it as a circular genome. Sorry, I didn't mention that in the question.

8.0 years ago
Jeffin Rockey ★ 1.3k

A better approach would be to use dedicated tools such as bedtools or BEDOPS.

See bedtools complement.

You can represent your gene data in 4-column or 6-column BED format.

Then find the length(s) of the chromosome(s) and put that information in the genome file.
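As a rough sketch of that preparation step, the snippet below turns the gene table from the question into 6-column BED rows plus a one-line genome file. It assumes the coordinates in the question are 1-based and inclusive (BED is 0-based, half-open), and it splits the gene that wraps the origin (680 - 2) into two records. The chromosome name `chr1` and the genome length of 700 are placeholders, since the question gives neither.

```python
# Placeholders: the question gives neither a chromosome name nor a genome length.
CHROM = "chr1"
GENOME_LENGTH = 700

# (start, end, name, strand), assumed 1-based inclusive as listed in the question.
genes = [
    (10, 19, "Gene_1", "+"),
    (27, 46, "Gene_2", "+"),
    (27, 89, "Gene_3", "-"),
    (110, 250, "Gene_4", "+"),
    (120, 340, "Gene_5", "+"),
    (180, 350, "Gene_6", "-"),
    (260, 397, "Gene_7", "-"),
    (425, 625, "Gene_8", "+"),
    (680, 2, "Gene_9", "-"),  # start > end: assumed to wrap the origin
]


def to_bed6(genes, chrom, genome_length):
    """Convert 1-based inclusive records to 0-based half-open BED6 rows."""
    rows = []
    for start, end, name, strand in genes:
        if start <= end:
            pieces = [(start - 1, end)]
        else:
            # Origin-wrapping gene: one piece up to the genome end, one from 0.
            pieces = [(start - 1, genome_length), (0, end)]
        for bed_start, bed_end in pieces:
            rows.append((chrom, bed_start, bed_end, name, 0, strand))
    # Sort by start, then end, so the rows are in coordinate order.
    return sorted(rows, key=lambda row: (row[1], row[2]))


bed_rows = to_bed6(genes, CHROM, GENOME_LENGTH)
bed_text = "\n".join("\t".join(str(field) for field in row) for row in bed_rows)

# Genome file: one tab-separated line per chromosome (name, length).
genome_text = f"{CHROM}\t{GENOME_LENGTH}"

print(bed_text)
print(genome_text)
```

Writing `bed_text` and `genome_text` to files gives the two inputs that the complement step needs.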

The complement of the gene regions should be the intergenic regions.
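If you want to see the logic behind a complement, or reimplement it yourself, here is a minimal sketch in plain Python. It is not how bedtools is implemented, just the same idea: merge overlapping genes first, then the gaps between merged blocks are the intergenic regions. Coordinates are kept 1-based and inclusive as in the question, and the genome length of 700 is a placeholder because the question does not give one.

```python
GENOME_LENGTH = 700  # placeholder; not given in the question

# (start, end) pairs from the question; start > end means the gene wraps the origin.
genes = [(10, 19), (27, 46), (27, 89), (110, 250), (120, 340),
         (180, 350), (260, 397), (425, 625), (680, 2)]


def linearize(intervals, genome_length):
    """Split origin-wrapping intervals into two linear pieces and sort them."""
    pieces = []
    for start, end in intervals:
        if start <= end:
            pieces.append((start, end))
        else:
            pieces.append((start, genome_length))
            pieces.append((1, end))
    return sorted(pieces)


def merge(pieces):
    """Merge overlapping or directly adjacent intervals (input must be sorted)."""
    merged = []
    for start, end in pieces:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the current block: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


def complement(merged, genome_length):
    """Return the gaps between merged blocks on a circular genome."""
    if not merged:
        return [(1, genome_length)]
    gaps = [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(merged, merged[1:])]
    first_start, last_end = merged[0][0], merged[-1][1]
    if first_start > 1 and last_end < genome_length:
        # Uncovered on both sides of the origin: one gap that wraps around.
        gaps.append((last_end + 1, first_start - 1))
    elif first_start > 1:
        gaps.insert(0, (1, first_start - 1))
    elif last_end < genome_length:
        gaps.append((last_end + 1, genome_length))
    return gaps


merged = merge(linearize(genes, GENOME_LENGTH))
intergenic = complement(merged, GENOME_LENGTH)
print(intergenic)
```

Because Gene_4 to Gene_7 collapse into a single merged block (110 to 397), the intermediate overlapping genes are skipped automatically and the next gap starts at 398.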

bedtools closest and similar tools should help you with the proximal features.
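If you prefer to assign the flanking genes in your own code, one possible approach is a single sweep over the genes sorted by start, remembering the gene with the furthest end seen so far. That gene becomes the distal gene of the next gap, and the gene that opens the next block becomes the proximal gene. This is only a sketch: the `Gene` tuple and the `intergenic_regions` function are names made up for illustration, the genome length of 700 is a placeholder, and the tie-break (same start, longer gene wins) follows the rule stated in the question.

```python
from typing import NamedTuple

GENOME_LENGTH = 700  # placeholder; not given in the question


class Gene(NamedTuple):
    start: int
    end: int
    name: str
    strand: str


GENES = [
    Gene(10, 19, "Gene_1", "+"), Gene(27, 46, "Gene_2", "+"),
    Gene(27, 89, "Gene_3", "-"), Gene(110, 250, "Gene_4", "+"),
    Gene(120, 340, "Gene_5", "+"), Gene(180, 350, "Gene_6", "-"),
    Gene(260, 397, "Gene_7", "-"), Gene(425, 625, "Gene_8", "+"),
    Gene(680, 2, "Gene_9", "-"),  # start > end: wraps the origin
]


def intergenic_regions(genes, genome_length):
    """Return (start, end, distal_gene, proximal_gene) for each gap."""
    # Unroll origin-wrapping genes into two linear pieces.
    pieces = []
    for gene in genes:
        if gene.start <= gene.end:
            pieces.append(gene)
        else:
            pieces.append(Gene(1, gene.end, gene.name, gene.strand))
            pieces.append(Gene(gene.start, genome_length, gene.name, gene.strand))
    if not pieces:
        return []
    # Same start: the longer gene sorts first, so it becomes the proximal gene.
    pieces.sort(key=lambda g: (g.start, -g.end))

    regions = []
    distal = pieces[0]  # gene with the furthest end seen so far
    for gene in pieces[1:]:
        if gene.start > distal.end + 1:
            regions.append((distal.end + 1, gene.start - 1, distal, gene))
        if gene.end > distal.end:
            distal = gene

    # Gap across the origin, if the first or last position is uncovered.
    first = pieces[0]
    if first.start > 1 or distal.end < genome_length:
        gap_start = distal.end + 1 if distal.end < genome_length else 1
        gap_end = first.start - 1 if first.start > 1 else genome_length
        regions.append((gap_start, gap_end, distal, first))
    return regions


regions = intergenic_regions(GENES, GENOME_LENGTH)
for number, (start, end, distal, proximal) in enumerate(regions, start=1):
    print(f"IG{number} {start} - {end} "
          f"{distal.name} {distal.strand} {proximal.name} {proximal.strand}")
```

On the example data this gives the five regions listed in the question, including IG4 with Gene_7 as the distal gene, because Gene_7 has the furthest end among the overlapping genes 4 to 7.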

All of these tools have clear illustrations and examples on their documentation pages, which give much more clarity on what each tool and each argument does.
