We have paired-end Illumina RNASeq reads and we are working with a non-model organism with no reference genome. We have a working composite for a protein sequence that includes every exon we have found via cDNA. We have 6 muscle types with some triplicates and want to see how many times 4 specific exons that look to be alternatively spliced are present in each muscle type.
For example, muscle type a has this exon expressed 46% while muscle type b only expresses this exon 12% of the time.
I'm not looking for differential expression, only a number of how many times this exon is found within the muscle type's transcript file.
I've tried feeding HISAT2 BAM files into StringTie and also taking the GTF files from StringTie and putting them into htseq-count, but neither worked.
I was already able to align the raw reads to the composite and visualize the alignment in IGV. However, there are thousands of raw reads aligning to the 4 exons of interest, so I was hoping that there would be a better way of quantifying the frequency than manually counting.
Do I have to annotate the composite so that it is easier to select what I am looking for, and if so, how do I do that?
I apologize for focusing on something other than the question, but did you post the same question (with slightly different wording) under different accounts?
Get an abundance/frequency of how many times within an RNASeq file a transcript maps to an exon
Both mention "For example, muscle type a has this exon expressed 46% while muscle type b only expresses this exon 12% of the time."
I very much want to encourage use of Biostars, but I think it is kind of important to have a transparent account, ideally linked to other information about yourself (such as your actual name, photo, etc.). Otherwise, it is harder to keep track of the answers in the different posts, and I think seeing the overall learning process for a project is important for the broader community.
There is another person in the lab working with the same samples, and that was her account. She is focusing more on the bioinformatics aspect, so I asked her to post the question originally. When I found out that signing up was free, I posted the question myself. I apologize for any confusion I might have caused.
That's OK - there are frequently similarly worded questions coming from different users. However, they usually aren't this close to being identical, and usually are posted on different days :)
Do you just have the exon sequences or do you have approximate transcript isoform sequences? The latter will be easier to use going forward.
I have exon sequences, yes.
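To make the number asked for in the question concrete, here is a minimal sketch of one common way to express it: a junction-based "percent spliced in" for a single cassette exon. It assumes you have already counted, per sample, the reads spanning the two junctions that include the exon and the reads spanning the junction that skips it (for reads aligned to the composite, skipping reads would show up as split alignments jumping over the exon). The function name and all counts below are made up for illustration and are not from this thread.

```python
def percent_spliced_in(upstream_junction_reads, downstream_junction_reads,
                       skipping_junction_reads):
    """Return the percent inclusion of one exon, or None if there is no evidence.

    Inclusion is supported by two junctions (into and out of the exon), so the
    two counts are averaged to keep them comparable to the single skipping
    junction.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2
    total = inclusion + skipping_junction_reads
    if total == 0:
        return None  # no junction reads at all: inclusion cannot be estimated
    return 100 * inclusion / total


# Hypothetical counts: (upstream, downstream, skipping) per muscle type.
example_counts = {
    "muscle_a": (40, 52, 54),
    "muscle_b": (10, 14, 88),
}

for muscle, (up, down, skip) in example_counts.items():
    psi = percent_spliced_in(up, down, skip)
    print(f"{muscle}: exon included in {psi:.1f}% of junction-spanning reads")
```

With replicates, you could compute the value per replicate and then report the mean and spread for each muscle type, rather than pooling the reads.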