Hard filtering to reduce false positive variants due to poly track sequences
3 months ago
ManuelDB ▴ 110

In an old pipeline in my lab I found this step:

# Annotate with low complexity region length using mdust
/share/apps/bcftools-distros/bcftools-1.3.1/bcftools annotate \
-a /state/partition1/db/human/gatk/2.8/b37/human_g1k_v37.mdust.v34.lpad1.bed.gz \
-c CHROM,FROM,TO,LCRLen \
-h <(echo '##INFO=<ID=LCRLen,Number=1,Type=Integer,Description="Overlapping mdust low complexity region length (mask cutoff: 34)">') \
-o "$seqId"_"$sampleId"_lcr.vcf \
"$seqId"_"$sampleId"_left_aligned_annotated.vcf

The BED file in that command comes from an old resource bundle that GATK used to distribute. The resources now live at https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/ but I cannot find the hg38 equivalent of this file there.

The UCSC RepeatMasker track (https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=rmsk) needs a licence ;(

GATK • 324 views
3 months ago
GenoMax 147k

mdust low complexity region length

mdust no longer appears to be available, but you could use sdust (https://github.com/lh3/sdust) instead, which is Heng Li's implementation. You will need to generate the BED file yourself from your hg38 reference, though.
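
A rough sketch of how that BED file could be built, assuming sdust prints tab-separated name, start and end lines for each low complexity interval, and that bgzip and tabix are on your PATH. The function names and file names here (add_lcr_len, make_lcr_bed, hg38.fa, hg38.sdust.lcr.bed.gz) are made up for illustration and are not part of the original pipeline:

```shell
#!/usr/bin/env bash
# Sketch only: build a 4-column BED (chrom, start, end, LCRLen) from sdust output.
# Function and file names are hypothetical.

# Append the interval length (end - start) as a 4th column to a 3-column BED on stdin.
add_lcr_len() {
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $3 - $2 }'
}

# Run sdust on a reference FASTA, add the length column, then compress and index
# the result so that bcftools annotate can use it with -a.
#   $1: reference FASTA
#   $2: output path ending in .bed.gz
make_lcr_bed() {
    sdust "$1" | add_lcr_len | bgzip > "$2"
    tabix -p bed "$2"
}

# Example call (paths are placeholders):
# make_lcr_bed hg38.fa hg38.sdust.lcr.bed.gz
```

The resulting file should then be usable in place of the mdust BED in the bcftools annotate command from the question, keeping -c CHROM,FROM,TO,LCRLen. Two things to check: the contig names in the FASTA must match those in your VCF, and the sdust default threshold is not necessarily equivalent to the mdust cutoff of 34, so the Description in the INFO header line should be updated to say what you actually ran.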


That is exactly what I was looking for. Thank you so much :)
