PCA for Batch effect
25 days ago
SIMONE • 0

Dear all,

I am working on 500 WES samples coming from two different sources and I would like to check whether there is any batch effect. I am trying to perform a PCA because I think it is the easiest way to check, but I don't know which data I should use. Should I use quality data from FastQC or the BAM files, or should I use data from the VCF file? Is there an R package or something else I can use? I have used VarScan2 to call SNPs.

Thanks in advance.

You should convert the VCF data to genotypes (0/1/2, i.e. the number of alternate alleles) for each SNV. Then remove sites where all samples have the same genotype, and feed the resulting matrix into prcomp(). (princomp() will not work here: it requires more samples than variables, and you will have far more SNVs than samples.) When plotting the PCA, colour the points by batch. Don't just look at the top two PCs. Use a scree plot to see how many PCs you need to capture 90-95% of the variation among the samples; you may need to look at 5-10 PCs. Unfortunately, I am not aware of a package that does all of these steps.
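The steps above can be sketched in Python with NumPy. This is a minimal sketch under a few assumptions, not a tested pipeline: the VCF is read as plain text lines (so it must be decompressed first), `GT` appears in the FORMAT column, any non-reference allele counts as "alt" (a simplification for multi-allelic sites), and sites with any missing call are simply dropped (imputing them is a reasonable alternative). The function names and the `batch` labels are hypothetical. PCA is done as an SVD of the column-centred matrix, which is what prcomp() does by default.

```python
import numpy as np


def gt_to_dosage(gt: str) -> float:
    """Convert a VCF GT value ('0/1', '1|1', './.') to an alt-allele count.

    Any non-reference allele index counts as alt (simplification for
    multi-allelic sites); a missing call becomes NaN.
    """
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return float("nan")
    return float(sum(a != "0" for a in alleles))


def dosage_matrix(vcf_lines):
    """Parse VCF text lines into (sample names, samples x sites dosage matrix)."""
    samples, sites = [], []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        gt_idx = fields[8].split(":").index("GT")  # position of GT in FORMAT
        sites.append([gt_to_dosage(s.split(":")[gt_idx]) for s in fields[9:]])
    return samples, np.array(sites, dtype=float).T


def genotype_pca(X):
    """PCA of a samples x sites dosage matrix via SVD of the centred data."""
    X = X[:, ~np.isnan(X).any(axis=0)]  # assumption: drop sites with any missing call
    X = X[:, X.std(axis=0) > 0]         # drop invariant sites (all samples identical)
    Xc = X - X.mean(axis=0)             # centre each site
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                      # PC coordinates of each sample
    var_ratio = S**2 / np.sum(S**2)     # fraction of variance explained per PC
    return scores, var_ratio


def n_pcs_for(var_ratio, target=0.90):
    """Smallest number of PCs whose cumulative variance reaches `target`."""
    cum = np.cumsum(var_ratio)
    return min(int(np.searchsorted(cum, target)) + 1, len(var_ratio))


# Toy input: four samples, four sites (site 200 is invariant, site 300 has a missing call).
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          "S1", "S2", "S3", "S4"]
records = [
    ["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "0/1", "1/1", "0/1"],
    ["1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "0/0", "0/0", "0/0", "0/0"],
    ["1", "300", ".", "G", "A", ".", "PASS", ".", "GT:DP", "0/1:10", "./.:0", "1/1:12", "0/0:9"],
    ["1", "400", ".", "T", "C", ".", "PASS", ".", "GT", "1/1", "0/1", "0/0", "0/0"],
]
vcf = ["##fileformat=VCFv4.2", "\t".join(header)] + ["\t".join(r) for r in records]

samples, X = dosage_matrix(vcf)
scores, var_ratio = genotype_pca(X)
batch = {"S1": "A", "S2": "A", "S3": "B", "S4": "B"}  # hypothetical batch labels
```

To look for a batch effect, plot pairs of columns of `scores` (PC1 vs PC2, PC2 vs PC3, and so on up to `n_pcs_for(var_ratio, 0.90)`) and colour each sample by `batch[sample]`, e.g. with matplotlib. `np.cumsum(var_ratio)` gives the same information as a scree plot in cumulative form.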

Hi,

Thanks for the comment. I have another question: would it make sense to check some quality metrics obtained through Picard's CollectHsMetrics or CollectAlignmentSummaryMetrics?

Best.

Collecting QC information is always a good idea. FastQC can tell you some things about the sequencing run and library, but other metrics, such as % unique reads and % reads mapped to the genome, can only be obtained from the BAM files.
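One way to use the Picard output for a batch check is to pull a single metric out of each sample's metrics file and compare its distribution between the two sources. The sketch below assumes the usual Picard text layout (a `## METRICS CLASS` line, then a tab-separated header, then one row per line until a blank line); the toy file is made up and keeps only three columns, and the helper names are hypothetical. `PCT_PF_READS_ALIGNED` with `CATEGORY` = `PAIR` is used as the example; for CollectHsMetrics output you would pick a column such as `MEAN_TARGET_COVERAGE` and leave `category` as `None`.

```python
import statistics
from collections import defaultdict


def read_picard_metrics(text):
    """Return the rows of the METRICS section of a Picard metrics file as dicts.

    Assumes: a '## METRICS CLASS' line, then a tab-separated header line,
    then one row per line until a blank or '#' line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip() or row.startswith("#"):
                    break  # end of the metrics table
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")


def median_by_batch(per_sample, batch_of, metric, category=None):
    """Median of one metric per batch.

    per_sample: {sample: rows from read_picard_metrics}
    batch_of:   {sample: batch label}
    category:   optional CATEGORY filter (e.g. 'PAIR' for alignment summary metrics)
    """
    values = defaultdict(list)
    for sample, rows in per_sample.items():
        for row in rows:
            if category is None or row.get("CATEGORY") == category:
                values[batch_of[sample]].append(float(row[metric]))
    return {b: statistics.median(v) for b, v in values.items()}


def toy_metrics(pct_aligned):
    # Made-up, heavily truncated alignment-summary file (real files have many more columns).
    return (
        "## htsjdk.samtools.metrics.StringHeader\n"
        "# CollectAlignmentSummaryMetrics (toy example)\n"
        "\n"
        "## METRICS CLASS\tpicard.analysis.AlignmentSummaryMetrics\n"
        "CATEGORY\tTOTAL_READS\tPCT_PF_READS_ALIGNED\n"
        f"PAIR\t1000\t{pct_aligned}\n"
        "\n"
    )


per_sample = {s: read_picard_metrics(toy_metrics(p))
              for s, p in {"S1": 0.99, "S2": 0.98, "S3": 0.91, "S4": 0.93}.items()}
batch_of = {"S1": "A", "S2": "A", "S3": "B", "S4": "B"}  # hypothetical batch labels
summary = median_by_batch(per_sample, batch_of, "PCT_PF_READS_ALIGNED", category="PAIR")
```

With 500 real samples, a per-batch median is only a first look; plotting the full per-batch distributions (e.g. boxplots) of a few such metrics makes systematic differences between the two sources easier to spot.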
