I am trying to test a piece of software for abundance estimation, and I would like to generate my own set of reads for which the expected relative abundances are known in advance, so that I can compare the tool's output against that benchmark. If I have a set of N transcripts and I generate M reads from them, knowing the origin of each read, can I use that information to compute the expected RPKM or TPM, and if so, how? Would the TPM for a specific transcript just be num_reads_from_it / num_reads_overall * 10^6?
https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/
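
To make the question concrete, here is a minimal sketch of how I think the expected values could be computed from the known read origins, following the definitions in the linked post as I understand them. The transcript names, lengths, and counts are made up, and I am assuming the effective length equals the full transcript length (no fragment-length correction), which may not match what a given tool does.

```python
# Sketch only: toy data, and effective length is assumed equal to
# the full transcript length (an assumption, not a tool's behavior).

def naive_fraction(counts):
    """The formula from my question: reads_i / total_reads * 1e6."""
    total = sum(counts.values())
    return {t: c / total * 1e6 for t, c in counts.items()}

def expected_rpkm(counts, lengths):
    """RPKM_i = reads_i * 1e9 / (length_i * total_reads)."""
    total = sum(counts.values())
    return {t: counts[t] * 1e9 / (lengths[t] * total) for t in counts}

def expected_tpm(counts, lengths):
    """TPM_i = (reads_i / length_i) / sum_j(reads_j / length_j) * 1e6."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    total_rate = sum(rates.values())
    return {t: r / total_rate * 1e6 for t, r in rates.items()}

# Hypothetical simulation result: reads per transcript of origin.
counts = {"tA": 100, "tB": 300}
lengths = {"tA": 1000, "tB": 3000}  # transcript lengths in bases

print(naive_fraction(counts))
print(expected_rpkm(counts, lengths))
print(expected_tpm(counts, lengths))
```

In this toy case tB gets three times as many reads only because it is three times as long, so the length-normalized TPM comes out equal for both transcripts (500000 each), while the formula from my question gives 250000 and 750000. If that reading of the definitions is right, the two would agree only when all transcripts have the same length.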