My R code for calculating RPKM from HTSeq-count output and a GTF file:
First, create a list of genes and their lengths from the GTF file by computing (column 5) - (column 4) + 1, i.e. end - start + 1. The tab-delimited output will look like this:
Gene1 440
Gene2 1200
Gene3 569
The other input is the HTSeq-count output file, which is generated from the SAM/BAM and GTF input files.
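The gene-length table described above could be built in R roughly as follows. This is only a sketch under assumptions that the post does not state: it uses the exon rows of the GTF, takes the gene name from the gene_id attribute, and merges overlapping exons of the same gene so that shared bases are counted once. The toy GTF lines and the helper union_length are made up for illustration.

```r
# Toy GTF lines (hypothetical data). A real file could be read with
# read.delim("annotation.gtf", header = FALSE, sep = "\t", quote = "", comment.char = "#")
gtf_text <- c(
  "chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id \"Gene1\"; transcript_id \"T1\";",
  "chr1\tsrc\texon\t150\t249\t.\t+\t.\tgene_id \"Gene1\"; transcript_id \"T2\";",
  "chr1\tsrc\texon\t1000\t1049\t.\t-\t.\tgene_id \"Gene2\"; transcript_id \"T3\";"
)
gtf <- read.delim(text = gtf_text, header = FALSE, sep = "\t",
                  quote = "", stringsAsFactors = FALSE)

# Keep exon rows and pull the gene name out of the attribute column (column 9)
exons <- gtf[gtf$V3 == "exon", ]
exons$gene <- sub('.*gene_id "([^"]+)".*', "\\1", exons$V9)

# Hypothetical helper: total length covered by a set of intervals,
# using end - start + 1 and counting overlapping bases only once
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or adjacent: extend the current block
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current block and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

gene_len <- sapply(split(exons, exons$gene),
                   function(g) union_length(g$V4, g$V5))
file_len <- data.frame(GeneName = names(gene_len), Len = unname(gene_len))
print(file_len)

# To save it in the two-column format shown above:
# write.table(file_len, "gene_length.txt", sep = "\t", quote = FALSE,
#             col.names = FALSE, row.names = FALSE)
```

If your GTF has one row per gene, the plain (column 5) - (column 4) + 1 on those rows gives the genomic span instead, which includes introns; which length is appropriate depends on how the reads were counted.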
I have used this code:
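# Gene lengths (gene name, length) and HTSeq-count output (gene name, count)
file_len <- read.delim("gene_length.txt", header = FALSE, sep = "\t")
file_count <- read.delim("sample1.counts", header = FALSE, sep = "\t")
colnames(file_len) <- c("GeneName", "Len")
colnames(file_count) <- c("GeneName", "Count")
# Drop the HTSeq-count summary rows (__no_feature, __ambiguous, ...)
file_count <- file_count[!grepl("^__", file_count$GeneName), ]
# Total number of reads assigned to genes
total_count <- sum(file_count$Count)
oneB <- 10^9
finallist <- merge(file_len, file_count, by = "GeneName")
# Convert Len and Count to double to avoid integer overflow in the products
finallist[, 2:3] <- sapply(finallist[, 2:3], as.double)
# RPKM = 10^9 * count / (total count * gene length)
finallist$RPKM <- (oneB * finallist$Count) / (total_count * finallist$Len)
# Optional: keep only genes with RPKM > 1
#finallist <- finallist[finallist$RPKM > 1, ]
write.table(finallist, file = "rpkm.txt", sep = "\t", col.names = TRUE, row.names = FALSE)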
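To sanity-check the formula, here is a small worked example on made-up numbers. The gene names, counts, lengths and the helper function rpkm are hypothetical and only illustrate the same calculation as the code above: 10^9 * count / (total count * gene length), with the total taken after the HTSeq summary rows have been removed.

```r
# Hypothetical helper wrapping the RPKM formula
rpkm <- function(count, len, total) {
  1e9 * count / (total * len)
}

# Toy inputs (made-up values)
len_df <- data.frame(GeneName = c("Gene1", "Gene2", "Gene3"),
                     Len = c(1000, 2000, 500))
count_df <- data.frame(
  GeneName = c("Gene1", "Gene2", "Gene3", "__no_feature", "__ambiguous"),
  Count = c(100, 300, 600, 50, 10)
)

# Remove the summary rows before computing the total
count_df <- count_df[!grepl("^__", count_df$GeneName), ]
total <- sum(count_df$Count)  # 100 + 300 + 600 = 1000

toy <- merge(len_df, count_df, by = "GeneName")
toy$RPKM <- rpkm(toy$Count, toy$Len, total)
print(toy)
# Gene1: 1e9 * 100 / (1000 * 1000) = 1e+05
# Gene2: 1e9 * 300 / (1000 * 2000) = 1.5e+05
# Gene3: 1e9 * 600 / (1000 * 500)  = 1.2e+06
```

Note that the summary rows are excluded from the total here, as in the code above; including them would give a larger denominator and lower RPKM values.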