Hi,
I recently analyzed some targeted exome sequencing samples provided by our collaborators, for which I do not have the target gene list. After the analysis, I was informed that some of the genes in which variants were identified are not in the target gene list. Has anyone faced such an issue, or does anyone have an idea why I might be observing these variants?
If it helps, I found duplicate entries (same name and sequence) in some of the raw fastq files, so I removed them using seqkit rmdup. Since I don't know whether the variants in untargeted genes occur only in the samples from these files, I can't tell whether removing the duplicate entries is causing a problem with the alignment and/or variant calling.
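One thing that could be checked here is whether the R1 and R2 files are still in sync after deduplication, since bwa mem pairs the i-th read of one file with the i-th read of the other. Below is a minimal sketch of such a check, assuming standard 4-line FASTQ records; the function names and the example file names are hypothetical.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield read IDs from a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            header = line.strip()
            if not header:
                continue  # tolerate a trailing blank line
            # First whitespace-separated token, without the leading '@'
            name = (header[1:].split() or [""])[0]
            # Drop an old-style /1 or /2 mate suffix so mates compare equal
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pair_mismatch(r1_path, r2_path):
    """Return (record_index, r1_name, r2_name) for the first record where the
    two files disagree, or None if every record pairs up.
    A name of None means that file ran out of records first."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for idx, (r1_name, r2_name) in enumerate(pairs):
        if r1_name != r2_name:
            return idx, r1_name, r2_name
    return None
```

Calling `first_pair_mismatch("sample_R1.dedup.fastq.gz", "sample_R2.dedup.fastq.gz")` (hypothetical file names) and getting `None` back would suggest the pairing survived the deduplication step; anything else points to the first record where the files diverge.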
The pipeline used was: fastqc --> trim_galore (keeping only the paired reads and discarding singleton reads) --> seqkit rmdup -n (to remove duplicate entries based on name) --> bwa_mem using hg38 as the reference --> picard to sort_sam, mark and remove PCR duplicates --> variant calling with GATK4 --> BQSR with GATK4 --> applying BQSR to the bam file with GATK4 --> variant calling with GATK4 on the recalibrated bam file created in the previous step --> annotation with wANNOVAR using hg38 as the reference.
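If the collaborators can share the target regions as a BED file, a check along these lines could separate on-target calls from off-target ones. This is only a sketch: the function names and the `padding` parameter are hypothetical, and it assumes the BED file and the VCF use the same hg38 chromosome naming.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}.
    BED coordinates are 0-based and half-open."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in regions.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or touching the previous interval: extend it
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def is_on_target(regions, chrom, pos, padding=0):
    """Return True if a 1-based VCF position lies inside a target interval,
    optionally widened by `padding` bases on each side."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1
    # Last interval whose padded start is at or before the position
    i = bisect.bisect_right(intervals, (zero_based + padding, float("inf"))) - 1
    return i >= 0 and zero_based < intervals[i][1] + padding
```

For example, with `regions = load_bed(open("targets.bed"))` (hypothetical file name), `is_on_target(regions, "chr1", 12345, padding=100)` would report whether that position falls within 100 bp of a target interval.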
Thanks in advance.