I have a fasta file of the genome and a gff file (CDS features only) for a non-model organism, Bicyclus anynana. I need to extract the 3'UTR region for each gene listed in the gff file. Is there a program I can use?
To determine the UTR regions of a gene, you need to align transcript data to the genome and link the alignments to the CDS coordinates you have. The part of each gene that is covered by transcript but lies outside the CDS is UTR.
There is no way to locate them without transcript evidence. (There have been some attempts to predict them in silico, but the success rate of those approaches is really low.)
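To make the "covered by transcript but not within CDS" idea concrete, here is a minimal Python sketch. It assumes you already have the exon coordinates of an aligned transcript and the CDS span of the matching gene, both 1-based and inclusive as in gff, and that the CDS coordinates include the stop codon. The function name and arguments are made up for illustration; they do not come from any existing tool.

```python
def three_prime_utr(exons, cds_start, cds_end, strand):
    """Return 3'UTR intervals (1-based, inclusive) for one transcript.

    exons: list of (start, end) genomic intervals of the aligned transcript
    cds_start, cds_end: lowest and highest genomic coordinate of the CDS
        (assumed to include the stop codon)
    strand: '+' or '-'
    """
    utr = []
    for start, end in sorted(exons):
        if strand == "+":
            # On the plus strand the 3' end lies at higher coordinates.
            if end > cds_end:
                utr.append((max(start, cds_end + 1), end))
        else:
            # On the minus strand the 3' end lies at lower coordinates.
            if start < cds_start:
                utr.append((start, min(end, cds_start - 1)))
    return utr


exons = [(100, 200), (300, 400), (500, 650)]
print(three_prime_utr(exons, 150, 550, "+"))  # [(551, 650)]
print(three_prime_utr(exons, 150, 550, "-"))  # [(100, 149)]
```

The resulting intervals can be written out in bed or gff format and passed to a sequence extraction tool. If your CDS features exclude the stop codon, shift the boundary by three bases accordingly.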
It would help if you posted the head of the fasta and gff files and also mentioned the organism name. Anyway, you can use bedtools getfasta to extract regions from a fasta file using coordinates specified in bed/gff format.
Thanks Ashish, my gff file only contains coordinates of CDS regions, so I do not have any coordinates for the 3'UTRs. What I want is to identify and extract the 3'UTR regions from the fasta file based on the coordinates of the genes (CDSs) in the gff file.
Use bedtools complement. It will give you all the regions not represented in your gff file. You can then use those to extract all the "non-CDS" regions from your genome, but identifying UTRs in those regions will take more downstream steps. How did you end up with a gff file containing only CDS coordinates?
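For anyone unsure what the complement step produces, the Python sketch below illustrates the idea for a single sequence. It is not bedtools itself, only an illustration under the assumption of 1-based inclusive coordinates; the function name is hypothetical. Note that bedtools complement also needs the sequence lengths (a genome file), which is what chrom_length stands in for here.

```python
def complement(intervals, chrom_length):
    """Return the regions of one sequence not covered by any interval.

    intervals: list of (start, end), 1-based inclusive; may overlap
    chrom_length: length of the sequence
    """
    gaps = []
    next_free = 1  # first position not yet covered by an interval
    for start, end in sorted(intervals):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= chrom_length:
        gaps.append((next_free, chrom_length))
    return gaps


cds = [(10, 20), (15, 30), (50, 60)]
print(complement(cds, 100))  # [(1, 9), (31, 49), (61, 100)]
```

The gaps contain introns and intergenic sequence as well as UTRs, which is why the further downstream steps are needed.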