2.7 years ago
arya.sagittarius
These are the steps I'm following.
First, I extract the sample using a BED file (here the BED file is my input BED file converted from hg19 to hg38):
tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf
Output ({sample_name}_out.vcf):
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
chr2 113982416 rs56177103 TATAAAATAAAATAAA T . PASS . GT:AAD:DAD:DAF:ADF 0/1:25519,4077:25519,4077:0.13776:0.13776
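The three records above are identical. One plausible cause (an assumption, not something confirmed here) is that the lifted-over BED file contains overlapping intervals: tabix -R may query each region on its own, so a record covered by two regions can be printed twice, and this behaviour may vary between htslib versions. The awk sketch below lists BED intervals that overlap an earlier interval on the same chromosome. The file names demo.bed and overlaps.txt and the demo intervals are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report BED intervals that overlap an earlier interval on the same
# chromosome. Overlapping regions are one possible reason tabix -R returns
# the same VCF record more than once (an assumption, not confirmed).
set -euo pipefail

# Hypothetical demo BED with two overlapping intervals near rs56177103.
# On real data: zcat Hg19_to_Hg38_sorted.bed.gz > demo.bed
printf 'chr2\t113982400\t113982500\nchr2\t113982410\t113982450\nchr2\t120000000\t120000100\n' > demo.bed

# BED is 0-based, half-open: after sorting by start, an interval overlaps the
# running block when its start is below the furthest end seen so far.
sort -k1,1 -k2,2n demo.bed \
  | awk 'BEGIN { OFS = "\t" }
         $1 == chr && $2 + 0 < end { print "overlap", $0 }
         $1 != chr || $3 + 0 > end { chr = $1; end = $3 + 0 }' > overlaps.txt

cat overlaps.txt
```

If overlaps turn up, collapsing them first (for example with bedtools merge -i on the sorted BED) before running tabix would be one way to avoid the repeated records.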
Because my output file has repeated records, I tried to keep only the unique ones by running intersectBed against the same input BED file, but I still get the same repeated records. Why is that? This is the command I used:
bedtools/intersectBed -u -a {sample_name}_out.vcf -b bed_filename > output.vcf
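The repeats survive intersectBed because -u only decides, for each line of the A file, whether it overlaps anything in B, and then writes that A line once. It never compares A lines with each other, so three identical input records come out as three identical records. Removing exact duplicates directly from the VCF is simpler. The sketch below does this with awk, assuming the duplicates are byte-identical lines; the demo file reuses the record from the question with a minimal header, and the sample column name "sample" and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop exact duplicate records from a VCF while keeping the header.
set -euo pipefail

# Demo input: minimal header plus the record from the question, written 3 times.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n'
  for i in 1 2 3; do
    printf 'chr2\t113982416\trs56177103\tTATAAAATAAAATAAA\tT\t.\tPASS\t.\tGT:AAD:DAD:DAF:ADF\t0/1:25519,4077:25519,4077:0.13776:0.13776\n'
  done
} > sample_out.vcf

# Header lines (starting with '#') always pass; a body line passes only the
# first time its full text is seen, so the original order is preserved.
awk '/^#/ || !seen[$0]++' sample_out.vcf > sample_out.dedup.vcf

# Number of records left after deduplication.
grep -vc '^#' sample_out.dedup.vcf
```

On real data the awk line would run on {sample_name}_out.vcf. If repeated records differ in some column, keying on CHROM, POS, REF and ALT instead (!seen[$1 FS $2 FS $4 FS $5]++) is an option. Note also that intersectBed writes only the records unless given -header, so output.vcf from the command above lacks the VCF header.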
I was also wondering: would sort | uniq give the same result?
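On the records alone, sort | uniq removes the same exact-duplicate lines. Applied to the whole file, though, it also sorts the header lines (the VCF specification requires ##fileformat to be the first line) and orders records as text, so chr10 lands before chr2 and position 100 before 99. The sketch below keeps the header in place and sorts only the body; it assumes GNU sort (for -V version ordering), and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the sort | uniq route with the VCF header kept first and unchanged.
set -euo pipefail

# Demo input: minimal header plus the record from the question, written 3 times.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n'
  for i in 1 2 3; do
    printf 'chr2\t113982416\trs56177103\tTATAAAATAAAATAAA\tT\t.\tPASS\t.\tGT:AAD:DAD:DAF:ADF\t0/1:25519,4077:25519,4077:0.13776:0.13776\n'
  done
} > sample_out.vcf

# Header lines first, in their original order.
grep '^#' sample_out.vcf > sample_out.sorted.vcf

# Body sorted by CHROM (version order, so chr2 precedes chr10) then numeric POS.
# Without -s or -u, GNU sort breaks key ties by comparing whole lines, so
# identical records end up adjacent and uniq can drop the repeats.
grep -v '^#' sample_out.vcf | sort -k1,1V -k2,2n | uniq >> sample_out.sorted.vcf

grep -vc '^#' sample_out.sorted.vcf
```

Using sort -u with these -k keys as a shortcut is risky: with keys, -u treats lines whose keys compare equal as duplicates, so distinct records sharing CHROM and POS would be dropped.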