6.2 years ago
1234anjalianjali1234
Hello,
I am doing a gene-wise analysis of intron and exon distribution. For this I have to convert the information in my GFF file (which uses chromosome coordinates) into coordinates relative to each gene.
I have a file with 1-based coordinates, but the Gene Structure Display Server input should be a 0-based BED file.
##gff-version
##sequence-region ch00 1 20852292
##sequence-region ch01 1 98455869
##sequence-region ch02 1 55977580
##sequence-region ch03 1 72290146
##sequence-region ch04 1 66557038
ch01 15504164 15510115 Gene1 gene
ch01 15504164 15510115 Gene1 mRNA
ch01 15504164 15504532 Gene1 three_prime_UTR
ch01 15504164 15505513 Gene1 exon
ch01 15504533 15505513 Gene1 CDS
ch01 15505592 15506740 Gene1 exon
ch01 15505592 15506740 Gene1 CDS
ch01 15507158 15507454 Gene1 exon
ch01 15507158 15507454 Gene1 CDS
ch01 15507599 15508670 Gene1 exon
ch01 15507599 15508670 Gene1 CDS
ch01 15509634 15510115 Gene1 exon
ch01 15509634 15510115 Gene1 CDS
ch01 72699527 72702973 Gene2 gene
ch01 72699527 72702973 Gene2 mRNA
ch01 72699527 72699756 Gene2 exon
ch01 72699527 72699756 Gene2 CDS
ch01 72699765 72699869 Gene2 exon
ch01 72699765 72699869 Gene2 CDS
ch01 72699915 72700248 Gene2 exon
ch01 72699915 72700248 Gene2 CDS
ch01 72700436 72700771 Gene2 exon
ch01 72700436 72700771 Gene2 CDS
ch01 72701150 72702213 Gene2 exon
ch01 72701150 72702213 Gene2 CDS
ch01 72702472 72702973 Gene2 exon
ch01 72702472 72702973 Gene2 CDS
ch04 54476287 54481244 Gene3 gene
ch04 54476287 54481244 Gene3 mRNA
ch04 54476287 54477248 Gene3 three_prime_UTR
ch04 54476287 54477746 Gene3 exon
ch04 54477249 54477746 Gene3 CDS
ch04 54477873 54479243 Gene3 exon
ch04 54477873 54479243 Gene3 CDS
ch04 54479340 54480432 Gene3 exon
ch04 54479340 54480432 Gene3 CDS
ch04 54480535 54481037 Gene3 CDS
ch04 54480535 54481244 Gene3 exon
ch04 54481038 54481244 Gene3 five_prime_UTR
I want to treat each gene individually i.e. for Gene1:
ch01 0 5951 Gene1 gene
and so on for the CDS and exons.
Is there any way to do it?
Thank you
What input file format are you using for the conversion? Is it .gff or .bed (1-based)? [Clarification is needed because you have pasted a BED-like file as an example.] If it is .gff, you can do it using the gff2bed program implemented in BEDOPS.

I have a GFF file, but my problem is not converting .gff into .bed. My problem is to treat each gene individually, independent of its position on the chromosome: I want each gene to start at 0 and end at its corresponding value, using the GFF file. For example, if my gene1 is on chromosome 4, starts at 568912, and ends at 569812, I want it to start at 0 and end at 900, and the same for its exons, CDS, and introns. I do not care about its chromosome position right now. As I mentioned earlier, I want to use the Gene Structure Display Server for my exon/intron visualization. I hope you get my point. Thank you.
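The shift described in this comment amounts to subtracting the gene's start from every feature coordinate. Below is a minimal sketch of that arithmetic in Python; the function name `rebase` and the `half_open` switch are illustrative assumptions, not part of any tool mentioned in the thread. By default both coordinates are shifted by the gene start, which reproduces the 0 to 900 example. Note that a standard BED interval has an exclusive end, which would be one larger, so it may be worth checking which convention the Gene Structure Display Server expects.

```python
def rebase(start, end, gene_start, half_open=False):
    """Shift a 1-based inclusive interval so the gene's first base becomes 0.

    half_open=False: both coordinates are shifted by gene_start
                     (matches the 0..900 example in the comment).
    half_open=True:  the end is exclusive, as in standard BED,
                     so it comes out one larger.
    """
    new_start = start - gene_start
    new_end = end - gene_start + (1 if half_open else 0)
    return new_start, new_end


# The gene from the comment: 568912..569812 on the chromosome.
print(rebase(568912, 569812, 568912))                  # (0, 900)
print(rebase(568912, 569812, 568912, half_open=True))  # (0, 901)
```

The same call works for any exon or CDS of the gene, as long as `gene_start` is the start of the gene the feature belongs to.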
Thank you for the clarification, 1234anjalianjali1234.
Can you please post sample data from your real .gff file? It would make it easier to write and test the code. Or do you want to work on the data you have posted in the question?

Thank you, Nitin. I want to work on this data only, which I extracted from my original GFF file according to my genes of interest.
As per your request, my sample GFF is:
Sorry for the delay. I have written simple Python code that may help you. I have tested the code on the data you posted in the comment above (.gff format). This code takes a .gff file as a command-line argument and converts it into zero-based .bed format (as you were expecting).

Thank you for your reply, Nitin.
I tried this code, but it gives me the output without the gene ID.

Just replace,
this line in the code with this,
I thought gene names were not required for the Gene Structure Display Server, so I did not include them in the output. Or you can just append
+ "\t" + key + "\n"
to the fwrite.write() statement.

Thank you, Nitin, it worked.
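
The code discussed in these comments is not reproduced in the thread, so the following is only a rough sketch of the same idea, not Nitin's script. It assumes the simplified five-column layout shown in the question (chromosome, start, end, gene ID, feature type, separated by whitespace) rather than a full nine-column GFF, and the function names are hypothetical. Each gene's smallest start is subtracted from all of its features, and the gene ID is kept in the output. The end coordinate is shifted by the same amount as the start, which matches the expected `ch01 0 5951 Gene1 gene` line from the question.

```python
import sys
from collections import defaultdict


def parse_features(lines):
    """Group (chrom, start, end, feature) tuples by gene ID.

    Assumes five whitespace-separated columns per record:
    chrom, start, end, gene ID, feature type.
    """
    genes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '##' header lines
        chrom, start, end, gene_id, feature = line.split()[:5]
        genes[gene_id].append((chrom, int(start), int(end), feature))
    return genes


def rebase_genes(genes):
    """Yield tab-separated rows with coordinates relative to each gene's start."""
    for gene_id, features in genes.items():
        gene_start = min(start for _, start, _, _ in features)
        for chrom, start, end, feature in features:
            yield "\t".join(
                [chrom, str(start - gene_start), str(end - gene_start),
                 gene_id, feature]
            )


def main(path):
    """Read the file at `path` and print the gene-relative rows."""
    with open(path) as handle:
        for row in rebase_genes(parse_features(handle)):
            print(row)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
```

Saved under a name of your choice, it would be run with the input file as its only argument. For a real GFF3 file the column positions differ (start and end are columns 4 and 5, and the gene ID sits in the attributes column), so `parse_features` would need to be adapted.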