Question

bedtools -u not giving unique files

2.7 years ago

arya.sagittarius ▴ 10

The following are the steps I'm following:

The first step extracts the sample using a BED file (here the BED file is the input BED file converted to Hg38):

tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf

Output ({sample_name}_out.vcf):
chr2    113982416   rs56177103  TATAAAATAAAATAAA    T   .   PASS    .   GT:AAD:DAD:DAF:ADF  0/1:25519,4077:25519,4077:0.13776:0.13776   
chr2    113982416   rs56177103  TATAAAATAAAATAAA    T   .   PASS    .   GT:AAD:DAD:DAF:ADF  0/1:25519,4077:25519,4077:0.13776:0.13776   
chr2    113982416   rs56177103  TATAAAATAAAATAAA    T   .   PASS    .   GT:AAD:DAD:DAF:ADF  0/1:25519,4077:25519,4077:0.13776:0.13776

Because the output file contains repeated records, I tried to extract the unique records by intersecting it with the same input BED file using intersectBed, but the duplicates remain: the result contains the same repeated lines. Why is that? This is the command I used:

bedtools/intersectBed -u -a {sample_name}_out.vcf -b bed_filename > output.vcf
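
For context: -u makes intersectBed report each entry of -a once if it overlaps anything in -b. It does not compare the -a entries with one another, so three identical lines in the VCF are three separate entries and all three are reported. The repeats may already come from the tabix step, since a record can be returned once per matching region when regions in the BED file overlap. A minimal sketch of removing the repeats directly, assuming header lines start with "#" and that duplicates differ at most in trailing whitespace (dedup_vcf is a hypothetical helper name, not part of bedtools or tabix):

```shell
# Hypothetical helper: drop repeated VCF records, keeping the first copy and the input order.
dedup_vcf() {
    awk '
        /^#/ { print; next }        # pass header lines through untouched
        {
            sub(/[ \t]+$/, "")      # strip trailing whitespace so otherwise identical lines compare equal
            if (!seen[$0]++) print  # print a record only the first time it appears
        }
    ' "$1"
}
```

It could be called as dedup_vcf {sample_name}_out.vcf > output.vcf. Records are compared as whole lines, so two lines at the same position with different alleles or genotypes are both kept.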

bedtools intersectbed vcftools vcf tabix • 914 views

I was also wondering: would sort | uniq give the same result?
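
A plain sort | uniq does remove exact duplicate lines, but it also sorts the "#" header lines in among the records, and lines that differ only in trailing whitespace are not collapsed. A minimal sketch that keeps the header on top and sorts only the records, assuming header lines start with "#" (dedup_vcf_sorted is a hypothetical helper name):

```shell
# Hypothetical helper: keep the VCF header as is, then sort the records and drop exact repeats.
dedup_vcf_sorted() {
    # Header lines (starting with "#") are printed first, in their original order.
    grep '^#' "$1"
    # Records are sorted by chromosome, then numerically by position; uniq removes adjacent identical lines.
    grep -v '^#' "$1" | LC_ALL=C sort -k1,1 -k2,2n | uniq
}
```

Usage would be dedup_vcf_sorted {sample_name}_out.vcf > output.vcf. LC_ALL=C is only there to make the sort order reproducible across locales; chromosome names are compared as plain text, so chr10 sorts before chr2.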

2.7 years ago by arya.sagittarius ▴ 10

score 0 · Answer 1 · 2022-03-11

2.7 years ago

Alex Reynolds 36k

Another option is to pipe BED data to sort-bed:

$ ... | sort-bed --unique - > answer.bed

Ref.: https://bedops.readthedocs.io/en/latest/content/reference/file-management/sorting/sort-bed.html