I want to use htseq-count (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html) to get gene counts (RPKM) to analyze DE genes in an RNA-seq experiment. I'm NOT interested in alternative splicing, only in getting RPKM values for downstream analyses with DEGseq, an R package. htseq-count requires a GFF file, but I only have my reference.fasta. Is there any way I can use the fasta file or convert it to GFF?
Each file type was invented to represent a certain type of information. A fasta file is meant to store sequences; a GFF file is meant to represent genomic features (intervals). In general there is no way to directly convert between the two.
As daler above points out, if your fasta file happens to store each gene separately and also lists extra information about the coordinates, then we could give you a parser that generates a GFF from it (post the header). Another option: if you know the gene sequences, you could align them to the genome and thus create your own annotations.
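For the special case where the fasta file is itself the alignment reference and each record is one gene or transcript, a parser like the one mentioned above can be sketched as follows. This is only a sketch under assumptions: it treats every sequence as a single feature spanning its full length, uses the first word of the header as the gene ID, and writes the feature type `exon` with a `gene_id` attribute because those are htseq-count's defaults. The function names and the `fasta2gff` source label are made up for illustration.

```python
def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # Assumption: the first whitespace-delimited token is the gene ID.
            tokens = line[1:].split()
            seq_id, length = (tokens[0] if tokens else "unnamed"), 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def fasta_to_gff(lines, source="fasta2gff", feature="exon"):
    """Yield one GFF line per FASTA record, covering the whole sequence."""
    for seq_id, length in fasta_lengths(lines):
        attributes = 'gene_id "%s";' % seq_id
        yield "\t".join(
            [seq_id, source, feature, "1", str(length), ".", "+", ".", attributes]
        )


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass an open file handle instead.
    demo = [">gene1 some description", "ACGTAC", "GT", ">gene2", "GGGCC"]
    for gff_line in fasta_to_gff(demo):
        print(gff_line)
```

For a real file, something like `fasta_to_gff(open("reference.fasta"))` should work the same way, since an open file handle iterates over lines. The strand is set to `+` on the assumption that the assembled sequences are in sense orientation.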
I do know the gene sequences and I've already aligned with novoalign; what I want to do now is get the expression values in RPKM. I can use the uniquely mapped genes as input in DEGseq, but I'd rather use RPKM... Should I use cufflinks instead of htseq-count to get the RPKM values, so there is no need for a GFF file?
1) Is there any gene information in the fasta headers? (Please post an example if so.) 2) What genome are you working with?
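Note that htseq-count reports raw read counts per gene, not RPKM, so the RPKM step has to be done separately. A minimal sketch of that calculation, assuming you already have a count and a length (in bases) for each gene; the `rpkm` function and its dictionary inputs are hypothetical, and the total number of mapped reads is approximated here by the sum of the counts:

```python
def rpkm(counts, lengths):
    """Return RPKM per gene: reads * 1e9 / (gene length in bases * total reads).

    counts  -- dict mapping gene ID to raw read count
    lengths -- dict mapping gene ID to gene length in bases
    """
    # Assumption: total mapped reads is taken as the sum of all gene counts.
    total_reads = sum(counts.values())
    if total_reads == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: count * 1e9 / (lengths[gene] * total_reads)
        for gene, count in counts.items()
    }


if __name__ == "__main__":
    example_counts = {"geneA": 10, "geneB": 30}
    example_lengths = {"geneA": 1000, "geneB": 2000}
    for gene, value in sorted(rpkm(example_counts, example_lengths).items()):
        print(gene, value)
```

If you prefer to normalize by all mapped reads rather than only those assigned to genes, replace `total_reads` with that number from your alignment summary.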
The only gene information is the gene ID; this is all I need. I'm not looking for other features such as exons. I'm working with sugarcane, so there is no reference genome; as a reference I use the SAS (sugarcane assembled sequences) or the sorghum gene models.
See my recipe here: "Deg Analysis On 2 Mirna Library". Since it sounds like you are using a transcript fasta file, the concept is the same.