Tool to identify recurrent mutations directly from VCF
7.2 years ago
ATpoint 85k

Is anyone aware of a tool that accepts multiple VCF files and checks for recurrence of mutations, preferably using flexible definitions of recurrence? A simple example might be strict = mutation at the exact same position, relaxed = within a certain window, feature-based = within the same genomic feature (intron, exon, gene, promoter etc.). So far I have been using custom combinations of VCFtools and BEDtools together with annotation tools such as VEP, but maybe there is a comprehensive solution out there?
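To make the three definitions concrete, here is a minimal sketch in plain Python. It is not an existing tool and not the poster's pipeline. It assumes the variants have already been pulled out of each VCF as (chromosome, 1-based position) pairs; the function names, the sample data, and the feature list are all hypothetical. The window mode chains neighbouring positions, so a cluster can end up wider than the window itself.

```python
from collections import defaultdict

def strict_recurrence(samples, min_samples=2):
    """Positions mutated at the exact same site in >= min_samples samples."""
    seen = defaultdict(set)
    for sample, variants in samples.items():
        for chrom, pos in variants:
            seen[(chrom, pos)].add(sample)
    return {site: names for site, names in seen.items() if len(names) >= min_samples}

def window_recurrence(samples, window=100, min_samples=2):
    """Clusters of variants whose neighbouring positions lie within `window` bp."""
    by_chrom = defaultdict(list)
    for sample, variants in samples.items():
        for chrom, pos in variants:
            by_chrom[chrom].append((pos, sample))
    result = []
    for chrom, hits in by_chrom.items():
        hits.sort()
        clusters = [[hits[0]]]
        for pos, sample in hits[1:]:
            if pos - clusters[-1][-1][0] <= window:
                clusters[-1].append((pos, sample))   # extend current cluster
            else:
                clusters.append([(pos, sample)])     # start a new cluster
        for members in clusters:
            names = {sample for _, sample in members}
            if len(names) >= min_samples:
                result.append((chrom, members[0][0], members[-1][0], names))
    return result

def feature_recurrence(samples, features, min_samples=2):
    """Features (name, chrom, start, end; 1-based inclusive) hit in >= min_samples samples."""
    hit = defaultdict(set)
    for sample, variants in samples.items():
        for chrom, pos in variants:
            for name, f_chrom, start, end in features:
                if chrom == f_chrom and start <= pos <= end:
                    hit[name].add(sample)
    return {name: names for name, names in hit.items() if len(names) >= min_samples}

# Hypothetical toy data: three samples, two made-up features.
samples = {
    "S1": [("chr1", 1000), ("chr1", 5000)],
    "S2": [("chr1", 1000), ("chr1", 5050)],
    "S3": [("chr2", 300)],
}
features = [("GENE_A", "chr1", 4900, 5100), ("GENE_B", "chr2", 1, 1000)]

strict = strict_recurrence(samples)
relaxed = window_recurrence(samples, window=100)
by_feature = feature_recurrence(samples, features)
```

The feature lookup here is a simple nested loop, which is fine for a toy example; for real annotation sets an interval tree or BEDtools-style intersection would likely be needed.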

vcf recurrent mutation snv mutation • 2.6k views

A combination of the VariantAnnotation package, to read the VCF files (excluding some info fields to reduce size), and GenomicRanges/GenomicFeatures Bioconductor packages can provide the flexibility, annotation, and performance you want, I suspect.


Hi ATpoint, I'm a newbie trying to do just what you describe. Would you share your code for handling this? It would probably save me many hours. Many thanks.

7.2 years ago

I'm not a huge fan of the tool in general, but FunSeq2 kind of does what you want, though it doesn't have quite the level of precision it seems you're looking for.

RECUR (recurrent genes, regulatory elements and mutations within samples) Example: ‘RECUR=Pseudogene(ENST00000467115.1|chr1:568914-569121):PR1783(chr1:568941,chr1:569004),PR2832(chr1:569004)’ When analyzing multiple genomes, if genes or regulatory elements are shown in >= 2 samples, they are annotated as ‘gene/regulatory element name: recurrent samples (variants in corresponding samples (position is 1-based))’. If it is a same site mutation, ‘*’ is tagged.
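For anyone wanting to post-process this field, the following is a rough Python sketch that splits a RECUR value into its parts. The layout is inferred only from the single example quoted above, so treat the regular expressions as an assumption, not as FunSeq2's specification: it handles one element per value and does not handle the '*' same-site tag. The function names `parse_recur` and `shared_sites` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: [RECUR=]Type(name|coordinates):Sample(site,site),Sample(site)
RECUR_PATTERN = re.compile(r"^(?:RECUR=)?(\w+)\(([^|]+)\|([^)]+)\):(.*)$")
SAMPLE_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")

def parse_recur(value):
    """Split one RECUR value into element type, name, coordinates and per-sample sites."""
    match = RECUR_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognised RECUR value: {value!r}")
    kind, name, coords, rest = match.groups()
    per_sample = {s: sites.split(",") for s, sites in SAMPLE_PATTERN.findall(rest)}
    return {"type": kind, "name": name, "coordinates": coords, "samples": per_sample}

def shared_sites(parsed):
    """Sites reported in two or more samples of a parsed RECUR value."""
    by_site = defaultdict(set)
    for sample, sites in parsed["samples"].items():
        for site in sites:
            by_site[site].add(sample)
    return {site: names for site, names in by_site.items() if len(names) >= 2}

# The example string from the documentation excerpt above.
example = ("RECUR=Pseudogene(ENST00000467115.1|chr1:568914-569121):"
           "PR1783(chr1:568941,chr1:569004),PR2832(chr1:569004)")
parsed = parse_recur(example)
same_site = shared_sites(parsed)
```

On the quoted example this separates the pseudogene from its two samples and flags chr1:569004 as the site shared by both.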

DBRECUR (Recurrence database) Example: ‘DBRECUR=Enhancer(chmm/segway|chr15:22517400-22521103):Lung_Adeno(Altered in 4/24(16.67%) samples.)| Prostate(Altered in 2/64(3.12%) samples.),Enhancer(drm|chr15:22517700-22521100):Lung_Adeno(Altered in 4/24(16.67%) samples.)| Prostate(Altered in 2/64(3.12%) samples.)’ If genes, regulatory elements or mutations are observed in the recurrence database (currently including 570 samples of 10 cancer types and COSMIC), the recurrence information is shown here. ‘recurrent element(name|coordinates):cancer type(recurrence information in this cancer type)’. Recurrence information is separated by ‘,’.

Be warned that its VCF output probably won't stay true to the format and likely won't run through anything else afterwards. I've had to go back and manually fix issues in the header to get it to run through other programs afterwards.


Thanks for the suggestion. Already had a look at it earlier, but the point with FS2 is that it does not provide the required prebuilt genomic context for hg38 (which would take weeks to calculate according to the manual), so it is not an option for me.


Ah, tough luck there. Kinda surprised they haven't done that themselves given what it is.
