Hi Salmon users and developers! In the Salmon paper, the ground truth for evaluation is generated by running RSEM on a dataset and then running Polyester on RSEM's output.
Why use Polyester on the output of RSEM?
Can somebody point me to the script in the Salmon GitHub repo that runs Polyester, including all the parameters used? Polyester is an R package, but I don't see any R code in the Salmon GitHub repo. I want to see how Salmon uses Polyester.
Thanks Rob! One question: I don't see hexamer bias in the code or in the Polyester documentation, or maybe I missed something? When you were using Polyester, do you recall adding hexamer bias? If not, how important is hexamer bias in RNA-seq data? My understanding is that hexamer bias is related to GC bias, so if GC bias is already set up in Polyester, that may be enough.
In the alpine and Salmon papers we only simulated fragment GC bias. We isolated the fragment GC bias from the hexamer bias using alpine, as mentioned here:
https://www.nature.com/articles/nbt.3682#Sec2
I don't think Polyester has functionality for hexamer / read start bias, but in our analyses in the alpine paper, this was not a contributing factor to across-sample variation in the datasets we looked at, including IVT-seq, GEUVADIS, ABRF, and SEQC. The mis-estimation of isoform abundance (e.g. the cases of mis-identification of the dominant isoform) was driven by fragment-level biases, most likely arising from PCR amplification steps that vary across library preparation batches.
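
To make "fragment-level GC bias" concrete, here is a minimal Python sketch of the general idea: fragments are drawn uniformly from a transcript and then kept with a probability that depends on the GC content of the whole fragment. This is only an illustration, not Polyester's implementation and not the code used for the alpine or Salmon papers. The Gaussian-shaped curve, the `peak` and `width` values, and all function names are assumptions made up for this example.

```python
import math
import random


def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def retention_prob(gc, peak=0.5, width=0.15):
    """Hypothetical bias curve: fragments near `peak` GC are kept most often.

    The Gaussian shape and the default values are illustrative assumptions,
    not estimates from any real library.
    """
    return math.exp(-((gc - peak) ** 2) / (2 * width ** 2))


def sample_fragments(transcript, n, frag_len, rng):
    """Draw n fragments uniformly, keep each with a GC-dependent probability."""
    kept = []
    max_start = len(transcript) - frag_len
    for _ in range(n):
        start = rng.randint(0, max_start)  # inclusive on both ends
        frag = transcript[start:start + frag_len]
        if rng.random() < retention_prob(gc_content(frag)):
            kept.append(frag)
    return kept


rng = random.Random(0)
balanced = sample_fragments("ACGT" * 50, n=1000, frag_len=20, rng=rng)
at_rich = sample_fragments("AT" * 100, n=1000, frag_len=20, rng=rng)
print(len(balanced), len(at_rich))
```

With this toy curve, the AT-only transcript yields far fewer fragments than the GC-balanced one from the same number of draws, which is the kind of coverage distortion that affects isoform abundance estimates when the curve differs between batches.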