Hey all,
I have RNA-seq data from an experiment done with no replicates. The lab I'm with would like me to perform differential expression analysis on this data. I realize that this is not ideal, but I'm trying to make the best of a bad situation. I've already told the lab that any output I produce isn't fit for publication and can really only be used within the lab as a basis for future projects. From what I've read online, it seems my best option is to put my read counts through GFOLD and base everything on the GFOLD values. The issue is that I am quantifying my reads with alignment-free quantification using Kallisto, so I don't have a SAM/BED file to use as input for gfold count.
Since I can't use gfold count, I want to build my own input for gfold diff, but I don't know which values are required for GFOLD values to be calculated accurately. gfold diff normally takes a tab-delimited text file with five columns: "GeneSymbol", "GeneName", "Read Count", "Gene exon length", and "RPKM". I have gene-level counts from running Kallisto's read counts through tximport. I can easily create a text file in the correct format that fills in the "GeneSymbol" and "Read Count" columns and leaves the others as NA, and gfold diff does produce GFOLD values from this input. However, I'm unsure whether I can trust these GFOLD values, given that the "Gene exon length" and "RPKM" columns were left empty. Are these other columns required for accurate GFOLD value calculation?
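For reference, here is a minimal sketch of how all five columns could be filled in rather than left as NA. It assumes the gene-level counts and effective lengths from tximport have been exported into two plain dictionaries keyed by gene ID. The function names, the rounding of estimated counts to integers, the "NA" placeholder for "GeneName", and the absence of a header row are all assumptions for illustration, not something taken from the GFOLD documentation. RPKM is computed with the standard definition, count * 1e9 / (length in bp * total counts).

```python
import csv
import io


def build_gfold_rows(counts, lengths):
    """Build one [GeneSymbol, GeneName, ReadCount, GeneExonLength, RPKM] row per gene.

    counts and lengths are hypothetical dicts keyed by gene ID, e.g. exported
    from the tximport counts and length matrices for a single sample.
    """
    # Assumption: estimated counts are rounded to whole reads.
    rounded = {gene: int(round(value)) for gene, value in counts.items()}
    total = sum(rounded.values())

    rows = []
    for gene in sorted(rounded):
        length = lengths[gene]
        if total > 0 and length > 0:
            # Standard RPKM: reads per kilobase per million mapped reads.
            rpkm = rounded[gene] * 1e9 / (length * total)
        else:
            rpkm = 0.0
        # "NA" stands in for the GeneName column, as in the file described above.
        rows.append([gene, "NA", rounded[gene], int(round(length)), rpkm])
    return rows


def format_gfold_table(rows):
    """Render the rows as tab-delimited text (no header row: an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

The text returned by `format_gfold_table(build_gfold_rows(counts, lengths))` can be written to one file per sample. This only shows how the length and RPKM columns could be populated; whether gfold diff actually uses them when computing GFOLD values is still the open question.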