How to filter false-positive somatic mutations in short tandem repeats (STRs) called by Mutect2?
5.5 years ago
maoyonghn • 0

Our group usually calls somatic mutations with bwa + GATK Mutect2. If a mutation is located in a short tandem repeat (STR) region, the output VCF file reports this in the INFO fields RPA, RU and STR. Their header definitions are as follows:

##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">

##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">

##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">

When we filter the calls with FilterMutectCalls, some mutations are marked "str_contraction" in the FILTER column, which means Mutect2 rejects them. All of the mutations marked str_contraction that we have seen differ from the reference by exactly one repeat unit. In addition, these mutations are always located in a reference STR region of at least 8 bp.
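The pattern described above can be sketched in Python as a check on the INFO column. This is only an illustration of the observation in this post, not Mutect2's actual filtering rule. The function names and the `min_ref_bp` parameter are hypothetical, and "at least 8 bp" is interpreted here as reference repeat count times repeat unit length, which is an assumption.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag entries (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def matches_observed_pattern(info, min_ref_bp=8):
    """Return True if a record fits the pattern observed in this post.

    Assumed pattern (not confirmed as Mutect2's rule): the record is an STR,
    the first ALT allele has exactly one repeat unit fewer than the reference,
    and the reference repeat tract spans at least min_ref_bp bases.
    """
    fields = parse_info(info)
    if not all(key in fields for key in ("STR", "RPA", "RU")):
        return False
    rpa = [int(n) for n in fields["RPA"].split(",")]
    if len(rpa) < 2:
        return False
    ref_repeats, alt_repeats = rpa[0], rpa[1]
    ref_span_bp = ref_repeats * len(fields["RU"])
    return ref_repeats - alt_repeats == 1 and ref_span_bp >= min_ref_bp


# Example: 9 -> 8 copies of a 2 bp unit (18 bp reference tract).
print(matches_observed_pattern("RPA=9,8;RU=TG;STR;TLOD=6.79"))
```

Comparing the result of such a check with the FILTER column on real calls would show whether the observed pattern actually explains the str_contraction flag.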

I would like to know whether it is reasonable to filter based on "str_contraction" alone, or whether there are other useful methods.

Thanks!

next-gen • 1.8k views
2.8 years ago
Raman2 ▴ 30

Hello,

How does FilterMutectCalls classify some STR mutations as plain deletions and others as 'str_contraction'?

For example, below are three mutations from a TCGA VCF. All three have one repeat unit deleted, but only one of them is flagged as str_contraction.

I am confused about how exactly a 'str_contraction' differs from a deletion. Thanks very much!

chr1 5581147 rs536875083 ATG A . clustered_events;germline_risk;panel_of_normals;str_contraction CSQ=-|intergenic_variant|MODIFIER|||||||||||||||rs150348092|1||||deletion|||||||||||||||||||||||||||||||||||||||||||E_Multiple_observations&E_Freq&E_1000G;DB;ECNT=2;HCNT=3;MAX_ED=24;MIN_ED=24;NLOD=0.598;RPA=9,8;RU=TG;STR;TLOD=6.79 GT:AD:AF:ALT_F1R2:ALT_F2R1:QSS:REF_F1R2:REF_F2R1 0/0:3,0:0.00:0:0:69,0:0:3 0/1:4,4:0.500:2:1:120,121:2:2

chr1 11122204 . GT G . germline_risk;panel_of_normals CSQ=-|intron_variant|MODIFIER|MTOR|ENSG00000198793|Transcript|ENST00000361445|protein_coding||47/57|ENST00000361445.7:c.6663-79delA||-/8677|-/7650|-/2549||||1||-1||deletion|HGNC|HGNC:3942|YES|1||CCDS127.1|ENSP00000354558|P42345||UPI000012ABD3|NM_004958.3|||||||||||||||||||||||||||||||2475|,-|intron_variant|MODIFIER|MTOR|ENSG00000198793|Transcript|ENST00000376838|protein_coding||9/19|ENST00000376838.4:c.1278-79delA||-/4017|-/2265|-/754||||1||-1||deletion|HGNC|HGNC:3942||2|||ENSP00000366034||B1AKP8|UPI000047004A||||||||||||||||||||||||||||||||2475|;ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=2.29;RPA=10,9;RU=T;STR;TLOD=7.57 GT:AD:AF:ALT_F1R2:ALT_F2R1:QSS:REF_F1R2:REF_F2R1 0/0:10,0:0.00:0:0:242,0:4:5 0/1:12,11:0.263:3:7:232,331:1:6

chr1 22596237 . TG T . PASS CSQ=-|intron_variant|MODIFIER|EPHA8|ENSG00000070886|Transcript|ENST00000166244|protein_coding||9/16|ENST00000166244.6:c.1765+69delG||-/4943|-/3018|-/1005||||1||1||deletion|HGNC|HGNC:3391|YES|2||CCDS225.1|ENSP00000166244|P29322||UPI000012A07B|NM_020526.3|||||4||||||||||||||||||||||||||2046|;ECNT=1;HCNT=18;MAX_ED=.;MIN_ED=.;NLOD=9.61;RPA=5,4;RU=G;STR;TLOD=14.86 GT:AD:AF:ALT_F1R2:ALT_F2R1:QSS:REF_F1R2:REF_F2R1 0/0:36,0:0.00:0:0:1023,0:17:19 0/1:20,8:0.269:3:4:611,248:5:12
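To compare the three records side by side, a small Python sketch can pull out the repeat fields. The INFO strings below are shortened to the trailing fields of interest (the CSQ annotations are omitted), and the `summarize` helper is hypothetical. Among these three records, the flagged one is the only one whose repeat unit is longer than one base; this thread does not establish whether that is what the filter keys on.

```python
# (chrom, pos, FILTER, shortened INFO) copied from the three records above.
records = [
    ("chr1", 5581147,
     "clustered_events;germline_risk;panel_of_normals;str_contraction",
     "RPA=9,8;RU=TG;STR;TLOD=6.79"),
    ("chr1", 11122204, "germline_risk;panel_of_normals",
     "RPA=10,9;RU=T;STR;TLOD=7.57"),
    ("chr1", 22596237, "PASS", "RPA=5,4;RU=G;STR;TLOD=14.86"),
]


def summarize(filter_field, info):
    """Collect the STR-related values of one record into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    ref_repeats, alt_repeats = (int(n) for n in fields["RPA"].split(",")[:2])
    unit = fields["RU"]
    return {
        "unit": unit,
        "unit_len": len(unit),
        "ref_repeats": ref_repeats,
        "alt_repeats": alt_repeats,
        "ref_span_bp": ref_repeats * len(unit),  # length of the reference tract
        "flagged": "str_contraction" in filter_field.split(";"),
    }


summaries = [summarize(filt, info) for _, _, filt, info in records]
for (chrom, pos, _, _), row in zip(records, summaries):
    print(chrom, pos, row)
```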
