Hi everyone,
I'm a beginner in bioinformatics, and with so little experience I can't always think things through thoroughly. Here is a question that has been puzzling me:
- When you analyze data labeled as WT/KO, you run it through the standard workflow. But how do you know the data is labeled correctly? (Maybe someone mixed up the samples, or you made a mistake while renaming files.)
- Going further, how do you verify that your data really is ChIP-seq, RNA-seq, etc.?
- Or, within ChIP-seq data, how do you know it's really H3K4me1, H3K27me3, H3K27ac, etc.?
So far I have the following ideas (one per question, in the same order):
- Use your replicates, or download similar public data, and run PCA or clustering to see whether the samples group the way their labels suggest.
- Check the read coverage along the genome (this idea may be a little naive).
- Compare against published profiles of typical marks.
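
To make the first idea concrete, here is a minimal sketch of what I have in mind, not a validated method. It assumes a log-scale expression matrix with genes in rows and samples in columns. The function names (`pca_scores`, `flag_label_outliers`) are hypothetical, the data are simulated, and one sample is deliberately mislabeled to stand in for a sample swap.

```python
import numpy as np

def pca_scores(log_expr, n_components=2):
    """Project samples (columns of a genes x samples matrix) onto the top PCs."""
    # Center each gene across samples, then decompose with an SVD.
    centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rows of vt live in sample space; scaling by s gives the sample scores.
    return (vt[:n_components] * s[:n_components, None]).T

def flag_label_outliers(log_expr, labels):
    """Flag samples that correlate better with the other label group than their own."""
    labels = np.asarray(labels)
    corr = np.corrcoef(log_expr.T)  # sample-by-sample Pearson correlation
    flagged = []
    for i in range(len(labels)):
        not_self = np.arange(len(labels)) != i
        same = not_self & (labels == labels[i])
        diff = labels != labels[i]
        if same.any() and diff.any() and corr[i, same].mean() < corr[i, diff].mean():
            flagged.append(i)
    return flagged

# Simulated data (an assumption for illustration): 200 genes, 4 WT + 4 KO samples.
rng = np.random.default_rng(0)
n_genes = 200
baseline = rng.normal(8.0, 2.0, size=n_genes)
effect = np.zeros(n_genes)
effect[:50] = 2.0  # the KO shifts the first 50 genes
wt = baseline[:, None] + rng.normal(0.0, 0.3, size=(n_genes, 4))
ko = (baseline + effect)[:, None] + rng.normal(0.0, 0.3, size=(n_genes, 4))
log_expr = np.hstack([wt, ko])

# Sample 3 is truly WT but carries a KO label, mimicking a swap.
labels = ["WT", "WT", "WT", "KO", "KO", "KO", "KO", "KO"]

scores = pca_scores(log_expr)
flagged = flag_label_outliers(log_expr, labels)
print("PC1 scores:", np.round(scores[:, 0], 2))
print("Samples disagreeing with their label group:", flagged)
```

A flagged sample only says the label and the data disagree; it cannot tell which one is wrong, and with few replicates or strong batch effects the grouping may not follow the biology at all.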
But I know real cases can be messier than this. Can you think of other ideas, or, if you have done similar checks, could you share your experience?
Many thanks for your attention and suggestions!
Here is another related discussion: Estimating cross contamination in a set of BAMS