Hello Biostars,
I have one control sample with two technical replicates, so I tried to download a biological replicate to make the statistics more robust. Looking at the raw counts, there is a big difference between the downloaded biological replicate and the two technical replicates. Can I keep all three controls for downstream analysis, or do I need to normalize the biological replicate before adding it to the two technical replicates? The PCA below has 3 diseased samples with 2 technical replicates each. The control I added is even closer to the diseased samples than the first control. I appreciate your help! Unfortunately, I don't know anyone nearby who can help; they are busy with their own work.
I think you're going in circles. Given that you seem to have little to no local supervision, and quite varied datasets for an inexperienced bioinformatician, why don't you make a post describing what data you have, what the overall project goal is (you can describe it superficially, without mentioning the exact context, for confidentiality reasons), and where you get stuck? Maybe we can then brainstorm which steps to take and which online resources to read. Right now you are shotgunning questions, but a clear roadmap is lacking. Let's try the brainstorming; maybe someone can help with that.
Hi @atpoint. Thank you so much for the kind suggestion! Is there a better way for me to learn to analyze NGS data in order to screen for target transcription factors in genetic diseases? I am happy to pay you for tuition. I have several projects, and the data for each are different. The current project has only bulk RNA-seq (1 control with a technical replicate vs. 3 diseased samples with 2 technical replicates each) to find targets for a disease with a known mutation. If I find any potential candidates, the next step may be bulk ATAC-seq or single-cell RNA-seq.
There is supposed to be a bigger difference between biological replicates than between technical replicates.
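Comparing raw counts directly also mixes biology with sequencing depth: a sample with twice as many reads will have roughly twice the counts for every gene. As a minimal sketch (Python/NumPy; the function names `log_cpm` and `pca_scores` and the toy counts are hypothetical), scaling to counts per million (CPM) and log-transforming before PCA removes the depth difference. It does not remove a batch effect, and tools such as DESeq2 and edgeR expect raw counts and apply their own normalization.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    """log2 counts-per-million for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)          # total reads per sample (column)
    cpm = counts / lib_sizes * 1e6          # scale each sample to one million reads
    return np.log2(cpm + pseudocount)       # pseudocount avoids log2(0)

def pca_scores(log_expr, n_components=2):
    """PCA sample scores from a genes x samples matrix via SVD."""
    X = log_expr.T - log_expr.T.mean(axis=0)  # samples as rows, each gene centered
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy counts (genes x samples); sample 2 is sample 1 sequenced twice as deeply.
toy = np.array([[10, 20, 5], [90, 180, 95], [0, 0, 50]])
print(log_cpm(toy).round(2))
print(pca_scores(log_cpm(toy)).round(2))
```

In the toy matrix the first two samples have very different raw counts but identical log-CPM values, so they land on the same point in the PCA.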
What is the additional replicate you "downloaded", and why would it make sense to combine it with your data? I think a batch effect would be a concern. From my view, the PCA sort of makes sense since you have separation between your "control" and "disease" samples, but if the downloaded sample was processed differently, then maybe it's best to leave it out?
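One way to see why a lone downloaded sample is hard to use: if it is the only sample from its batch, its batch and its biology are confounded. The sketch below (Python/NumPy; the design and the helper `leverages` are hypothetical, and it assumes technical replicates have already been collapsed into one row per biological sample, e.g. by summing their counts) computes each sample's leverage, the diagonal of the hat matrix of an ordinary least-squares model. This is a simplified linear-model illustration, not what DESeq2 or limma compute internally. With a batch term, both controls get leverage 1: each is fit exactly, so the controls provide no estimate of control-to-control variability. Without the batch term, any batch difference is counted as biological variation among controls.

```python
import numpy as np

def leverages(design):
    """Diagonal of the hat matrix H = X (X^T X)^-1 X^T."""
    X = np.asarray(design, dtype=float)
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)

# Hypothetical design: one row per biological sample (technical replicates collapsed).
samples = ["ctrl_own", "ctrl_downloaded", "dis_1", "dis_2", "dis_3"]
intercept = np.ones(len(samples))
disease = np.array([0, 0, 1, 1, 1])
batch2 = np.array([0, 1, 0, 0, 0])  # only the downloaded control comes from another batch

no_batch = leverages(np.column_stack([intercept, disease]))
with_batch = leverages(np.column_stack([intercept, disease, batch2]))

for name, h0, h1 in zip(samples, no_batch, with_batch):
    # leverage 1.0 means the sample is fit exactly and leaves no residual to learn from
    print(f"{name:16s} without batch: {h0:.2f}  with batch: {h1:.2f}")
```

Here the residual degrees of freedom drop from 3 to 2 when the batch term is added, and all of the remaining ones come from the diseased samples.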
My 2 control technical replicates come from just one sample; the one I downloaded comes from another sample, so I considered combining them so that my control has data from 2 biological samples. The control at the bottom is even closer to the disease samples than the control on the left, which concerns me. Normally, all the RNA-seq data for control and diseased samples I receive come in the same folder, which I think means they were sequenced together, so adding new data could introduce a batch effect. So I should not add another control in this case, is that correct?