Hi Guys,
I wanted to know if any utility exists to map GFF file coordinates onto individual genes. That is, if the GFF file has coordinates for the full genome, I want to convert them to coordinates relative to each individual gene.
Thanks,
RT
That would be a very simple tool: feature.coord - (gene.min - 1) gives you the feature coordinates relative to the gene.min position.
Replace the landmark column with the gene id and you get a valid gff entry again.
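The offset formula above can be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not in the thread: the input is GFF3, each gene line carries an `ID=` attribute, the features of a gene follow its gene line in the file, and strand is ignored (coordinates stay relative to the leftmost base of the gene). The function name `gene_relative_gff` is made up for illustration.

```python
def gene_relative_gff(lines):
    """Yield GFF lines with coordinates rewritten relative to the enclosing gene.

    Assumes (hypothetically) GFF3 input in which every feature of a gene
    appears after that gene's own line, and gene lines carry an ID= attribute.
    """
    gene_id = gene_seqid = None
    gene_start = gene_end = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF record
        seqid, ftype = cols[0], cols[2]
        start, end = int(cols[3]), int(cols[4])
        if ftype == "gene":
            # Parse "key=value;key=value" attributes to find the gene ID.
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            gene_id = attrs.get("ID")
            gene_seqid, gene_start, gene_end = seqid, start, end
        # Skip features seen before any gene, or lying outside the current gene.
        if gene_id is None or seqid != gene_seqid:
            continue
        if start < gene_start or end > gene_end:
            continue
        offset = gene_start - 1           # feature.coord - (gene.min - 1)
        cols[0] = gene_id                 # landmark column becomes the gene id
        cols[3] = str(start - offset)
        cols[4] = str(end - offset)
        yield "\t".join(cols)
```

With a gene at 1619..2809, the gene line itself comes out as 1..1191 and an exon at 2000..2809 comes out as 382..1191. Features on the minus strand would need their coordinates flipped if you want them relative to the gene's 5' end; that is not handled here.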
First, create a set of mappable genes (e.g., assuming human):
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | awk '$3=="gene"' - \
    | grep -f genes.txt - \
    | convert2bed -i gff - \
    > genes.bed
Then map your GFF file (my_annotations.gff) to these genes:
$ bedmap --echo --echo-map --delim '\t' <(convert2bed -i gff < my_annotations.gff) genes.bed > answer.bed
The answer will contain your GFF annotation, and all genes (and their coordinates) which overlap your annotation by one or more bases.
Hi Alex, this is helpful but does not completely serve my purpose. For example, suppose the gene is at position 1619..2809 in the genome. In my file I want to convert these coordinates to be relative to the individual gene, which would be 1..1191, and similarly for all the features of each gene. Do you know of any tool/utility for this?
Hi Michael, can you please tell me the name of the tool? I can write my own Python script to do this; I just wanted to know if any tool exists before I write my own. Thanks.