filtering VCF files
8.6 years ago
Bogdan ★ 1.4k

Dear all,

After reading the submissions for the SMC (Somatic Mutation Challenge), we identified one submission that filters the VCF files in the following way (please see below). Any suggestions regarding a package that implements these filtering strategies? Thanks!

  • Read depth filtering: remove mutations when at least 2/3 of mutant allele bases in the tumor sample are of base quality < 25
  • Mapping quality filter: remove mutations when the median mapping quality of reads supporting mutant allele is < 20
  • Read position filter: remove mutations when the mutant allele is localized only at the extremities of reads (+/- 8 bases)
  • Strand bias filter: remove mutations when Fisher's exact test indicates strand disequilibrium only for the mutant allele (threshold 0.001)
  • Match normal filter: remove mutations when the mutant allele is present in more than 3% of the reads at a quality > 25 in the matching normal sample
  • Simple repeats filter: remove mutations that fall into a repeated region of the genome
  • Centromere filter: remove mutations that fall into centromere or telomere regions of the genome
  • Panel of normal filter: remove mutations that appear to be a SNP (3% of mutant allele) in at least 2 of the other normal genomes, or that are frequent sequencing errors (> 1 read carrying the mutant allele) in at least half of the genomes in the panel
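To make the read-level filters above concrete, here is a minimal Python sketch of five of them. It assumes the per-read values (base qualities, mapping qualities, positions within the read, strand counts) have already been extracted from the BAM files by some other means. All function names and input layouts are hypothetical, the thresholds are the ones from the list, and the strand bias check is only one possible reading of the description: a 2x2 Fisher's exact test on forward/reverse counts of the mutant versus the reference allele.

```python
from math import comb
from statistics import median


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


def low_base_quality(alt_base_quals, min_bq=25):
    """True when at least 2/3 of mutant allele bases have base quality < min_bq."""
    low = sum(q < min_bq for q in alt_base_quals)
    return bool(alt_base_quals) and 3 * low >= 2 * len(alt_base_quals)


def low_mapping_quality(alt_mapqs, min_median=20):
    """True when the median MAPQ of reads supporting the mutant allele is < min_median."""
    return bool(alt_mapqs) and median(alt_mapqs) < min_median


def only_at_read_ends(alt_pos_and_len, margin=8):
    """True when every mutant base lies within `margin` bases of a read end.

    alt_pos_and_len: list of (0-based position in read, read length) pairs.
    """
    return bool(alt_pos_and_len) and all(
        pos < margin or pos >= length - margin for pos, length in alt_pos_and_len
    )


def strand_biased(alt_fwd, alt_rev, ref_fwd, ref_rev, threshold=0.001):
    """True when mutant and reference strand counts differ (assumed interpretation)."""
    return fisher_exact_two_sided(alt_fwd, alt_rev, ref_fwd, ref_rev) < threshold


def present_in_normal(normal_alt_quals, normal_depth, min_bq=25, max_fraction=0.03):
    """True when > 3% of normal reads carry the mutant allele at quality > min_bq."""
    good = sum(q > min_bq for q in normal_alt_quals)
    return normal_depth > 0 and good / normal_depth > max_fraction
```

Each function returns True when the mutation should be removed, so a site would be kept only if none of them fires. The repeat, centromere and panel-of-normals filters are region or cohort lookups and are not covered by this sketch.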
VCF • 4.0k views

Please validate or comment on your previous questions:


Thank you, gentlemen!

8.6 years ago

For almost all of those filters I would use samtools view (at the SNP position) piped into https://github.com/lindenb/jvarkit/wiki/BioAlcidae to parse the reads and their CIGAR strings in order to create a BED file. The VCF would then be filtered with this BED file and bedtools.
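The last step, dropping VCF records that fall inside a BED file, can be sketched in plain Python as follows. This is only an illustration of the idea (similar in spirit to what `bedtools intersect -v` does), not a replacement for bedtools; the function names are hypothetical, and it assumes an uncompressed, tab-separated VCF and a BED file with at least three columns.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into sorted, merged (start, end) intervals per chromosome."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in intervals.items():
        ivs.sort()
        out = [ivs[0]]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_bed(merged, chrom, pos):
    """True when the 1-based VCF position lies in a 0-based, half-open BED interval."""
    ivs = merged.get(chrom, [])
    # Index of the last interval whose start is <= pos - 1.
    i = bisect_right(ivs, (pos - 1, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos - 1 < ivs[i][1]


def filter_vcf(vcf_lines, merged):
    """Yield header lines and the records that do NOT overlap the BED intervals."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if not in_bed(merged, chrom, int(pos)):
            yield line
```

Note the coordinate shift: VCF positions are 1-based while BED intervals are 0-based and half-open, so VCF position 100 corresponds to the BED interval 99-100.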


Thanks, Pierre! Please consider the previous questions validated. BioAlcidae looks a bit too complicated, although a specific example would certainly help. Is there perhaps an alternative to BioAlcidae? Thanks!


By "validate", Pierre means to accept correct answers and useful comments to your original questions using the green arrow or thumbs up buttons. That allows subsequent readers of your posts to assess the validity of the responses.

You may find it useful to review the posts on how to use Biostars (available here and here).


I'm not saying it's easy :-) but as far as I can see, almost all the filters you need require programming a new tool.


Yes, quite complicated, especially when various somatic callers do not provide all the needed fields in the VCF files ;)


If it helps: I quickly wrote a tool to insert a BAM into a sqlite3 database: https://github.com/lindenb/jvarkit/wiki/BamTosql. You'll be able to filter the data using MAPQ, clipping, base quality, etc.


Thanks, Pierre, that was a great effort. Thanks for sharing the results!
