Hi guys,
I generated more than 10000 transcript coverage histograms and I want to separate them into 2 different categories.
Group 1: there is a global increase in read number but no local shift in coverage.
[Figure: original plot]
Group 2: there is a shift in read coverage.
The blue and the orange profiles are 2 different samples. Yes, I know this is something very simple for human eyes, but if we want a machine to do the job, I have no idea how to start. Does anyone have an idea about this type of problem?
Thanks a lot
Are you trying to classify each transcript individually, with orange being Treatment A and blue being Treatment B? And how do you plan on handling genes with no change in coverage between your two samples? Are they already filtered out?
For the examples above you could think about a 5':3' ratio of the change in read coverage. In the first graph the ratio would be ~1, since there is a consistent change in coverage, whereas in the second graph it would be >>1 since the difference in blue reads is much higher at the 5' end. Maybe take the reads at the first and last 10% of the gene body, or something like that?
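A minimal Python sketch of the 5':3' idea, assuming each transcript's coverage is available as a per-base NumPy array in 5' to 3' orientation and has already been scaled for library size (for example with DESeq2 size factors; that step is not shown). Here "change" is taken as a fold change (sample B over sample A) rather than a difference, so a uniform global increase gives a ratio of about 1. The function name `five_three_ratio` and the `frac` and `pseudocount` parameters are hypothetical.

```python
import numpy as np


def five_three_ratio(cov_a, cov_b, frac=0.1, pseudocount=1.0):
    """Fold change at the 5' end divided by fold change at the 3' end.

    Assumptions (hypothetical helper): cov_a and cov_b are per-base coverage
    of the same transcript in two samples, same length, oriented 5' -> 3',
    and already normalized for library size.
    """
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    # Number of bases in each end window (at least 1)
    n = max(1, int(round(len(cov_a) * frac)))
    # Fold change (B over A) in the 5' window and in the 3' window;
    # the pseudocount avoids division by zero for uncovered ends
    fc5 = (cov_b[:n].sum() + pseudocount) / (cov_a[:n].sum() + pseudocount)
    fc3 = (cov_b[-n:].sum() + pseudocount) / (cov_a[-n:].sum() + pseudocount)
    return fc5 / fc3
```

A ratio near 1 points to group 1 (global change); a ratio far from 1 points to a shift toward one end. Taking log2 of the ratio makes 5' and 3' shifts symmetric around 0.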
Thanks,
Yes, for every transcript individually, and unchanged transcripts have already been filtered out based on the DESeq2 counts. I am thinking about scanning the whole length of every selected transcript; wouldn't analyzing only the first 10% lose a lot of data?
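One way to use the whole transcript instead of only its ends, sketched under the same assumptions (per-base NumPy coverage, already normalized for library size): split each transcript into a fixed number of bins, compute a log2 fold change per bin, and measure how much that fold change varies along the transcript body. A global increase gives a roughly constant log2 fold change (small spread), while a local shift gives a large spread. The bin count, the pseudocount, and the names `binned` and `shift_score` are illustrative choices, not an established method.

```python
import numpy as np


def binned(cov, n_bins=20):
    """Sum per-base coverage into n_bins roughly equal bins along the transcript."""
    cov = np.asarray(cov, dtype=float)
    # np.array_split tolerates lengths that are not divisible by n_bins
    return np.array([chunk.sum() for chunk in np.array_split(cov, n_bins)])


def shift_score(cov_a, cov_b, n_bins=20, pseudocount=1.0):
    """Spread (standard deviation) of per-bin log2 fold changes, B over A.

    Hypothetical helper: ~0 for a uniform global change, larger for a local shift.
    """
    a = binned(cov_a, n_bins)
    b = binned(cov_b, n_bins)
    # Per-bin log2 fold change along the full transcript body
    lfc = np.log2((b + pseudocount) / (a + pseudocount))
    return float(np.std(lfc))
```

The standard deviation only says that the fold change is not uniform; the sign of a fitted slope of `lfc` against bin position could additionally tell a 5' shift from a 3' shift.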
Are you envisioning doing this as an image or general profile comparison? Those profiles are not on the same scale, so while it may be easy for humans to discern a pattern, it is not so easy for a machine. You also talk about clustering, so do you want to do multiple comparisons as well?
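To get around the scale problem, a sketch that compares only the shape of two profiles: each profile is resampled onto a common relative-position grid (so transcripts of different lengths line up) and divided by its total, which removes the sequencing-depth difference. The L1 distance between the two unit-area shapes is then close to 0 for a pure global increase and grows with a coverage shift. `normalize_profile`, `shape_distance`, and the 100-point grid are hypothetical names and parameters.

```python
import numpy as np


def normalize_profile(cov, n_points=100):
    """Resample coverage to n_points relative positions and scale to unit area."""
    cov = np.asarray(cov, dtype=float)
    # Map positions onto 0..1 so transcripts of any length are comparable
    x_old = np.linspace(0.0, 1.0, len(cov))
    x_new = np.linspace(0.0, 1.0, n_points)
    resampled = np.interp(x_new, x_old, cov)
    total = resampled.sum()
    # Dividing by the total keeps only the shape, not the depth
    return resampled / total if total > 0 else resampled


def shape_distance(cov_a, cov_b, n_points=100):
    """L1 distance between two unit-area shapes (between 0 and 2 for non-empty profiles)."""
    pa = normalize_profile(cov_a, n_points)
    pb = normalize_profile(cov_b, n_points)
    return float(np.abs(pa - pb).sum())
```

Because both profiles are reduced to the same grid, the normalized vectors for all transcripts can also be stacked into one matrix and fed to any standard clustering method.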
Thank you, genomax,
I am open to any algorithm or method. My goal is to separate those transcript coverage profiles, but this is new to me and I need guidance.
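Once each transcript is reduced to one number describing how much its coverage shape changes between the two samples (for example the absolute log2 of a 5':3' fold-change ratio as suggested earlier in the thread, or the spread of per-bin log2 fold changes), splitting the transcripts into two groups becomes a simple two-cluster problem. Below is a minimal 1-D k-means with k = 2 in plain NumPy; `two_means_1d` is a hypothetical helper, and a library implementation such as scikit-learn's `KMeans` or a manually chosen threshold would serve the same purpose.

```python
import numpy as np


def two_means_1d(scores, n_iter=100):
    """Label each score 0 (low, global-change-like) or 1 (high, shift-like).

    Hypothetical helper: minimal k-means with k = 2 on one feature per transcript.
    """
    s = np.asarray(scores, dtype=float)
    # Deterministic start: one centroid at each extreme
    c_low, c_high = s.min(), s.max()
    labels = None
    for _ in range(n_iter):
        # Assign each score to the nearer centroid (ties go to the low group)
        new_labels = (np.abs(s - c_high) < np.abs(s - c_low)).astype(int)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its group (if the group is non-empty)
        if np.any(labels == 1):
            c_high = s[labels == 1].mean()
        if np.any(labels == 0):
            c_low = s[labels == 0].mean()
    return labels
```

With a single feature this amounts to finding one cut-off; plotting a histogram of the scores first is a quick check on whether two groups are actually visible.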