I have results from targeted sequencing: several CSV files (tables) containing the amplicon identification number (both samples have the same number of amplicons, 202) and the amplicon coverage (the number of reads that "cover" the amplicon). The data look like this (see the image linked below). I have to compare the coverage profiles of two samples. There are several methods for that (Pearson correlation, Euclidean distance, chi-square, t-test, PCA, clustering analysis and so on), but I'm not sure which would be statistically correct.
*Important point: within a sample there is a kind of competition for reagents, so if one amplicon is covered by more reads, the others will be covered by fewer.
Link to image: https://drive.google.com/file/d/1o78wnJgVKs3QEUEZbSD4oTpdr7k5_zCw/view?usp=sharing
What is the goal of your experiment? You should have a plan before you start sequencing. Usually targeted sequencing is used to find mutations.
The goal is to validate the result. I have two samples (control and new) and I want to check that the coverage profile of the new sample is the same as that of the control sample.
But why do you use targeted sequencing? For mutation analysis, or just to see how many amplicons you get? If you designed this panel for mutation analysis, your validation should be that you can find the mutation in both samples.
It is for mutation analysis. This gene panel searches for mutations in about 194 genes that can cause Mendelian disorders. But my aim is to develop an analysis tool that compares coverage profiles, since differences between the control and the new sample can point to problems with sample preparation. So if something is wrong, I want to see it.
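One possible starting point for such a comparison, not necessarily the statistically "correct" one: because amplicons compete for reagents within a sample, raw read counts are only meaningful relative to the sample total, so the sketch below converts each sample to fractions before comparing. The function names, the pseudocount of 0.5 and the log2-ratio cutoff of 1.0 are all assumptions made for illustration, and the amplicon IDs and counts are made up.

```python
import math

def to_fractions(coverage):
    """Convert read counts per amplicon into fractions of the sample total."""
    total = sum(coverage.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {amp: n / total for amp, n in coverage.items()}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare_profiles(control, new, pseudocount=0.5, max_abs_log2=1.0):
    """Compare two coverage profiles given as {amplicon_id: read_count}.

    pseudocount and max_abs_log2 are arbitrary illustrative values.
    """
    if control.keys() != new.keys():
        raise ValueError("samples must contain the same amplicons")
    # The pseudocount avoids log(0) for amplicons without any reads.
    c = to_fractions({a: n + pseudocount for a, n in control.items()})
    s = to_fractions({a: n + pseudocount for a, n in new.items()})
    amplicons = sorted(control)
    # Per-amplicon log2 ratio of the new fraction to the control fraction.
    log2_ratio = {a: math.log2(s[a] / c[a]) for a in amplicons}
    # Correlation of the log fractions summarises overall profile similarity.
    r = pearson([math.log2(c[a]) for a in amplicons],
                [math.log2(s[a]) for a in amplicons])
    flagged = [a for a in amplicons if abs(log2_ratio[a]) > max_abs_log2]
    return r, log2_ratio, flagged

# Made-up counts: the second sample has twice the depth but one dropout.
control = {"AMP1": 100, "AMP2": 200, "AMP3": 300, "AMP4": 400}
new = {"AMP1": 200, "AMP2": 400, "AMP3": 600, "AMP4": 50}
r, log2_ratio, flagged = compare_profiles(control, new)
print(f"r = {r:.3f}, flagged = {flagged}")
```

Note that in the toy data the dropout of one amplicon also shifts the fractions of all the others upwards, which is exactly the competition effect described in the question; a single correlation value hides this, so the per-amplicon ratios are probably more useful for spotting preparation problems.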
Decide for yourself on a depth-of-coverage threshold for each amplicon. If a sample contains an amplicon below that depth, I would say "something is wrong". Maybe make a barplot for each new sample, with one bar per amplicon showing its depth of coverage. Just an idea.
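A minimal sketch of that idea, using only the Python standard library. The CSV column names (`amplicon_id`, `coverage`), the threshold of 30 reads and the example values are assumptions, since the real file layout is only shown in the linked image; a text bar chart stands in for a proper plot.

```python
import csv
import io

def load_coverage(handle, id_col="amplicon_id", cov_col="coverage"):
    """Read {amplicon_id: depth} from a CSV file; column names are assumed."""
    reader = csv.DictReader(handle)
    return {row[id_col]: int(row[cov_col]) for row in reader}

def low_coverage(coverage, min_depth):
    """Return the amplicons whose depth is below the chosen threshold."""
    return {amp: depth for amp, depth in coverage.items() if depth < min_depth}

def text_barplot(coverage, min_depth, width=40):
    """Draw one text bar per amplicon and mark those below the threshold."""
    top = max(max(coverage.values()), min_depth, 1)
    lines = []
    for amp, depth in coverage.items():
        bar = "#" * round(width * depth / top)
        mark = "  <-- below threshold" if depth < min_depth else ""
        lines.append(f"{amp:>8} {depth:>6} {bar}{mark}")
    return "\n".join(lines)

# Made-up example data standing in for one sample's CSV file.
demo_csv = io.StringIO("amplicon_id,coverage\nAMP1,350\nAMP2,12\nAMP3,180\n")
sample = load_coverage(demo_csv)
failed = low_coverage(sample, min_depth=30)  # 30 is an arbitrary threshold
print(text_barplot(sample, min_depth=30))
```

For a real file, `load_coverage(open("sample.csv", newline=""))` would replace the in-memory example, with the column names adjusted to match the actual header.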