I have a targeted sequencing dataset. The company provided a BED file based on hg19. I aligned the data to hg38 (I want to be consistent in the manuscript, as all other datasets, from a different targeted panel, were aligned to hg38) and used LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver) to convert the hg19 BED file to hg38. I then used bedtools intersect to filter the VCF file with the BED file output from LiftOver.
bedtools intersect -wa -wb -a bed.file -b vcf.file
The problem is that after the intersect, the output contains some genes that are not in the targeted gene list. Below is an example. Neither of these genes is targeted by the panel.
Chr hg38Start hg38End hg19_pos Chr SNP Gene coverage
chr4 139698109 139698229 chr4:140619264-140619383 chr4 139698118 H3P16 3048
chr4 1806596 1807747 chr4:1808324-1809474 chr4 1806974 LETM1 696
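One way to narrow this down is to confirm that each SNP really falls inside its lifted-over interval, which would point at the gene annotation rather than the intersect step. The following is a minimal Python sketch using the two rows above; the function name `snp_in_bed_interval` is made up for illustration, and it assumes the BED columns are 0-based half-open while the SNP column is a 1-based VCF POS.

```python
# Rows copied from the example table:
# (chrom, hg38 BED start, hg38 BED end, SNP position, annotated gene)
rows = [
    ("chr4", 139698109, 139698229, 139698118, "H3P16"),
    ("chr4", 1806596, 1807747, 1806974, "LETM1"),
]


def snp_in_bed_interval(bed_start, bed_end, vcf_pos):
    """Return True if a 1-based VCF position lies in a 0-based half-open BED interval.

    Hypothetical helper: converts POS to 0-based before comparing.
    """
    return bed_start <= vcf_pos - 1 < bed_end


for chrom, start, end, pos, gene in rows:
    inside = snp_in_bed_interval(start, end, pos)
    print(f"{chrom}:{pos} ({gene}) inside {start}-{end}: {inside}")
```

If both rows report that the SNP is inside its interval, the intersect is behaving as expected positionally, and the unexpected gene names would have to come from whatever tool assigned the Gene column.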
It looks like chr4:1808324-1809474 in hg19 corresponds to FGFR3, but after liftover to hg38 it became LETM1. Could you please tell me what I have done wrong?
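A quick consistency check on the liftover itself is to compare how far each end of an interval moved between builds. The sketch below is only illustrative: `parse_position` and `liftover_shift` are hypothetical helpers, and it assumes the hg19_pos column is a 1-based inclusive position string (the usual UCSC convention) while the hg38 columns are 0-based half-open BED coordinates.

```python
import re

# (chrom, hg38 BED start, hg38 BED end, hg19 position string) from the table
pairs = [
    ("chr4", 139698109, 139698229, "chr4:140619264-140619383"),
    ("chr4", 1806596, 1807747, "chr4:1808324-1809474"),
]


def parse_position(text):
    """Parse 'chr:start-end' (assumed 1-based inclusive) into 0-based half-open."""
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised position string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return chrom, start - 1, end


def liftover_shift(hg38_start, hg38_end, hg19_text):
    """Return (start shift, end shift) in bp going from hg19 to hg38."""
    _, hg19_start, hg19_end = parse_position(hg19_text)
    return hg38_start - hg19_start, hg38_end - hg19_end


for chrom, start, end, hg19_text in pairs:
    start_shift, end_shift = liftover_shift(start, end, hg19_text)
    print(f"{hg19_text} -> {chrom}:{start}-{end} "
          f"shift start={start_shift} end={end_shift}")
```

Equal shifts at both ends mean the interval kept its length and moved as one block. That does not prove the conversion is right, but it is what a clean liftover of a short region would look like.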
Thank you.
I looked at the BED coordinates again, and it seems that the liftover converted them correctly (I compared them in IGV): chr4 1806596 in hg38 is FGFR3, and chr4:1808324-1809474 in hg19 is also FGFR3. Maybe SnpEff is annotating it wrong? I used the hg38 genome provided by GATK, and the command line I used for SnpEff was