Hello Biostars. I recently ran RSEM on my RNA-seq data and got unusual isoform results. The IsoPct (isoform percentage) column contains only 100 or 0. That would mean that, for every gene, either nothing was detected or the gene has only one isoform. I find this highly unlikely, and I believe it happens because isoforms are not annotated in my reference genome or in my data. How would I go about finding the isoforms of my genes using FASTA files (containing individual gene sequences rather than whole chromosomes) and the BAM files for each of my conditions? I don't have any isoform annotation, so this will have to be done de novo.
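One quick way to confirm the suspicion: if the annotation RSEM was given maps every gene to exactly one transcript, IsoPct can only be 100 (expressed) or 0 (not expressed). The minimal sketch below counts transcripts per gene in an RSEM `*.isoforms.results` table. It assumes the standard tab-separated header with `gene_id` and `IsoPct` columns; the file name in the usage line is hypothetical.

```python
import csv
from collections import Counter, defaultdict


def summarize_isopct(isoforms_results):
    """Summarize an RSEM *.isoforms.results table (tab-separated, with header).

    `isoforms_results` is any iterable of lines, e.g. an open file handle.
    Returns (histogram of transcripts per gene, set of distinct IsoPct values).
    """
    reader = csv.DictReader(isoforms_results, delimiter="\t")
    transcripts_per_gene = defaultdict(int)
    isopct_values = set()
    for row in reader:
        # Each row is one transcript; group rows by their parent gene.
        transcripts_per_gene[row["gene_id"]] += 1
        isopct_values.add(float(row["IsoPct"]))
    # Maps "number of transcripts" -> "number of genes with that many".
    histogram = Counter(transcripts_per_gene.values())
    return histogram, isopct_values


if __name__ == "__main__":
    # Hypothetical file name; point this at your own RSEM output.
    with open("sample.isoforms.results") as fh:
        hist, values = summarize_isopct(fh)
    print("transcripts per gene:", dict(hist))
    print("distinct IsoPct values:", sorted(values))
```

If the histogram has a single key of 1, RSEM never saw alternative isoforms, so the 0/100 pattern comes from the annotation rather than from the data.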
I have tried several programs. FlipFlop requires SAM files, but my BAM files are much too big to convert (>10 GB). I have also tried GESS, which requires a FASTA file for each chromosome in the reference genome (my reference contains only individual gene sequences, not chromosomes). I used HISAT to produce my BAM files and HTSeq to get my gene counts.
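For a tool that expects one FASTA file per chromosome, one possible workaround (untested with GESS, so treat it as an assumption) is to treat each gene sequence as its own "chromosome" by splitting the gene-level FASTA into one file per record. Naming each file after the sequence ID keeps it consistent with the reference names that HISAT wrote into the BAM headers. The sketch below streams the input, so it does not load the whole file into memory; the paths in the usage line are hypothetical, and IDs containing `/` would need sanitizing first.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-record FASTA to its own <id>.fa file.

    The record ID is the first whitespace-delimited token of the '>' header.
    Any lines before the first header are ignored. Returns the paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    # New record: close the previous file and open a new one.
                    if out is not None:
                        out.close()
                    record_id = line[1:].split()[0]
                    path = os.path.join(out_dir, record_id + ".fa")
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written


if __name__ == "__main__":
    # Hypothetical paths; replace with your gene-level reference FASTA.
    for p in split_fasta("genes.fa", "per_gene_fasta"):
        print(p)
```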
Much appreciated.
Was this a de novo assembly?
Not entirely. We used a reference genome (from the Taejoon lab) for assembly and for identifying exons, CDS, etc. However, isoforms weren't annotated, and we would like to find a way to identify them.