Dear All,
I am looking for a tool or script that can help me find the intergenic regions between genes.
I have sequences and coordinates for the gene cluster.
Any help or suggestions are welcome.
Thanks!
Use BEDTools subtractBed: http://code.google.com/p/bedtools/wiki/Usage#subtractBed
Yes, my bad. I read the command on its own and thought that, given the coordinates of genes AND exons, it would give me the intron coordinates. I have a GFF annotation where this is the case (no intron coordinates). I guess I overlooked that. Finding the overlapping regions would require a data structure like an interval tree (as in IRanges), I'd suppose?
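For the intron case mentioned above (gene and exon coordinates present, intron coordinates missing), a plain sort-and-sweep may be enough before reaching for an interval tree. Here is a minimal Python sketch of that idea. The helper name `subtract_intervals` and the coordinates are made up for illustration, and it assumes 1-based inclusive coordinates on a single strand.

```python
def subtract_intervals(span, blockers):
    """Return the parts of `span` not covered by any interval in `blockers`.

    Coordinates are assumed to be 1-based and inclusive.
    """
    span_start, span_end = span
    gaps = []
    cursor = span_start  # first position not yet known to be covered
    for start, end in sorted(blockers):
        if end < cursor:
            continue  # interval lies entirely inside what is already covered
        if start > span_end:
            break  # everything from here on is outside the span
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= span_end:
        gaps.append((cursor, span_end))
    return gaps


# Hypothetical gene and exon coordinates, not taken from any real annotation.
gene = (100, 500)
exons = [(100, 180), (250, 320), (300, 360), (450, 500)]
introns = subtract_intervals(gene, exons)
print(introns)  # [(181, 249), (361, 449)]
```

Overlapping exons (here 250-320 and 300-360) are handled by the moving cursor, so no separate merge step is needed for this simple case.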
I assume you are not able to program, because this is a very easy problem. But if you want us to program it, we need to know at least the layout of the coordinates and the format of the file containing the sequence.
Fabian,
I am also looking for code logic to extract intergenic sequences based on the coordinates of the genes, but I am stuck on the complications caused by overlapping genes. Could you please share logic that addresses the case given below?
Gene Coordinates and Gene Details - Name and Strand
Start - Stop   GeneName   Strand
10 - 19        Gene_1     +
27 - 46        Gene_2     +
27 - 89        Gene_3     -
110 - 250      Gene_4     +
120 - 340      Gene_5     +
180 - 350      Gene_6     -
260 - 397      Gene_7     -
425 - 625      Gene_8     +
680 - 2        Gene_9     -
Ideally this is the output I am expecting
IGNo   Start - End   DistalGeneName   DistalGeneStrand   ProximalGeneName   ProximalGeneStrand
IG1    3 - 9         Gene_9           -                  Gene_1             +
IG2    20 - 26       Gene_1           +                  Gene_3             -
IG3    90 - 109      Gene_3           -                  Gene_4             +
IG4    398 - 424     Gene_7           -                  Gene_8             +
IG5    626 - 679     Gene_8           +                  Gene_9             -

Notes:
IG1: Comparison with the last start and stop positions to get the actual IG coordinates.
IG2: In case of genes with the same start coordinates, the longer gene would be the proximal gene.
IG4: Here is the difficulty: how to skip the intermediate overlapping genes.
In some cases there can be many overlaps, and I am having difficulty handling that in the logic.
If you can share code that resolves this, or explain logic that I can use, that would be awesome and I would be very thankful.
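One possible way to skip the intermediate overlapping genes in the example above is to sort the genes by start and remember the furthest stop seen so far, together with the gene that reaches it. A gap exists only when the next gene starts beyond that position. Below is a minimal Python sketch under several assumptions: coordinates are 1-based and inclusive, the sequence is circular, and a gene with start greater than stop (Gene_9) wraps around the origin. The names `Gene` and `intergenic_regions` are hypothetical, and the `genome_length` of 700 is made up because the real length is not given in the post.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int
    stop: int
    name: str
    strand: str


def intergenic_regions(genes, genome_length):
    """Return (start, end, distal_gene, proximal_gene) for each gap between genes."""
    # Split a gene that wraps around the origin (start > stop) into two linear pieces.
    pieces = []
    for g in genes:
        if g.start > g.stop:
            pieces.append((g.start, genome_length, g))
            pieces.append((1, g.stop, g))
        else:
            pieces.append((g.start, g.stop, g))

    # Sort by start; for equal starts the longer piece comes first,
    # so the longer gene becomes the proximal gene of the gap before it.
    pieces.sort(key=lambda p: (p[0], -p[1]))

    regions = []
    covered_end = 0  # furthest position covered by any gene seen so far
    distal = None    # gene that reaches covered_end
    for start, stop, gene in pieces:
        # A gap exists only if this piece starts beyond everything seen so far.
        # Limitation: a gap before the very first gene is skipped when no gene
        # wraps around the origin, and a gap after the last gene is not reported.
        if distal is not None and start > covered_end + 1:
            regions.append((covered_end + 1, start - 1, distal, gene))
        if stop > covered_end:
            covered_end, distal = stop, gene
    return regions


genes = [
    Gene(10, 19, "Gene_1", "+"), Gene(27, 46, "Gene_2", "+"),
    Gene(27, 89, "Gene_3", "-"), Gene(110, 250, "Gene_4", "+"),
    Gene(120, 340, "Gene_5", "+"), Gene(180, 350, "Gene_6", "-"),
    Gene(260, 397, "Gene_7", "-"), Gene(425, 625, "Gene_8", "+"),
    Gene(680, 2, "Gene_9", "-"),
]

# genome_length=700 is an assumed value; it only sets where Gene_9's first piece ends.
regions = intergenic_regions(genes, genome_length=700)
for number, (start, end, distal, proximal) in enumerate(regions, start=1):
    print(f"IG{number}", start, "-", end,
          distal.name, distal.strand, proximal.name, proximal.strand)
```

On the example table this yields the five regions listed above as IG1 to IG5, with Gene_3 chosen over Gene_2 as the proximal gene for IG2 and Gene_4 to Gene_6 skipped for IG4. Extracting the sequences themselves would then be a matter of slicing the sequence with these coordinates, which depends on the file format.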