Is anyone aware of a tool that accepts multiple VCF files and checks for recurrence of mutations, preferably using flexible definitions of recurrence? A simple example might be strict = mutation at the exact same position, relaxed = within a certain window, feature-based = within the same genomic feature (intron, exon, gene, promoter etc). So far I have been using custom combinations of VCFtools and BEDtools together with annotation tools such as VEP, but maybe there is a comprehensive solution out there?
A combination of the VariantAnnotation package, to read the VCF files (excluding some info fields to reduce size), and GenomicRanges/GenomicFeatures Bioconductor packages can provide the flexibility, annotation, and performance you want, I suspect.
Hi ATpoint, I'm a newbie trying to do just what you describe. Would you share your code for handling this? It would probably save me many hours. Many thanks.
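Not the Bioconductor route suggested above, but here is a minimal sketch of the three recurrence definitions from the question (strict, window, feature-based) in plain Python with no dependencies. Everything in it is an assumption for illustration: the function names, the toy sample names and coordinates, and the hypothetical feature table. Strict recurrence here compares position only and ignores alleles, and the window mode simply chains neighbouring sites that lie within `window` bp of each other.

```python
from collections import defaultdict


def read_vcf_sites(lines):
    """Yield (chrom, pos) for every data line of a VCF given as text lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        chrom, pos = line.split("\t")[:2]
        yield chrom, int(pos)


def strict_recurrence(samples, min_samples=2):
    """Positions hit at the exact same coordinate in >= min_samples samples."""
    seen = defaultdict(set)
    for name, sites in samples.items():
        for site in sites:
            seen[site].add(name)
    return {s: sorted(n) for s, n in seen.items() if len(n) >= min_samples}


def window_recurrence(samples, window=100, min_samples=2):
    """Chain sites whose neighbours are <= window bp apart; keep chains
    that contain sites from >= min_samples samples."""
    by_chrom = defaultdict(list)
    for name, sites in samples.items():
        for chrom, pos in sites:
            by_chrom[chrom].append((pos, name))
    result = []
    for chrom, hits in by_chrom.items():
        hits.sort()
        clusters, current = [], [hits[0]]
        for hit in hits[1:]:
            if hit[0] - current[-1][0] <= window:
                current.append(hit)
            else:
                clusters.append(current)
                current = [hit]
        clusters.append(current)
        for cluster in clusters:
            names = sorted({n for _, n in cluster})
            if len(names) >= min_samples:
                result.append((chrom, cluster[0][0], cluster[-1][0], names))
    return result


def feature_recurrence(samples, features, min_samples=2):
    """features: list of (chrom, start, end, name), 1-based inclusive."""
    seen = defaultdict(set)
    for sample, sites in samples.items():
        for chrom, pos in sites:
            for f_chrom, start, end, f_name in features:
                if chrom == f_chrom and start <= pos <= end:
                    seen[f_name].add(sample)
    return {f: sorted(n) for f, n in seen.items() if len(n) >= min_samples}


# Toy data (hypothetical samples, positions and features).
samples = {
    "S1": [("chr1", 1000), ("chr1", 5000)],
    "S2": [("chr1", 1000), ("chr1", 5050)],
    "S3": [("chr2", 300)],
}
features = [("chr1", 4900, 5100, "geneA_exon1"), ("chr2", 1, 1000, "geneB")]

print(strict_recurrence(samples))
print(window_recurrence(samples, window=100))
print(feature_recurrence(samples, features))
```

For real data you would fill `samples` with something like `{name: list(read_vcf_sites(open(path)))}` per uncompressed VCF, and build `features` from a BED or GTF file. The linear scan over features is fine for a sketch but would need an interval index for genome-scale annotation.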
I'm not a huge fan of the tool in general, but FunSeq2 kind of does what you want, though it doesn't have quite the level of precision it seems you're looking for.
RECUR (recurrent genes, regulatory elements and mutations within samples)
Example: ‘RECUR=Pseudogene(ENST00000467115.1|chr1:568914-569121):PR1783(chr1:568941,chr1:569004),PR2832(chr1:569004)’
When analyzing multiple genomes, if genes or regulatory elements are shown in >= 2 samples, they are annotated as ‘gene/regulatory element name: recurrent samples (variants in corresponding samples (position is 1-based))’. If it is a same site mutation, ‘*’ is tagged.
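If you want to pull the RECUR annotation back out for downstream work, a small parser along these lines may help. It is a sketch written against the single example value shown above, not against a FunSeq2 specification: the regular expressions, the function name `parse_recur`, and the returned dictionary layout are all assumptions. The example does not show where the ‘*’ tag is placed or how several elements are separated within one value, so the sketch handles one element and leaves the site strings unmodified.

```python
import re

# Assumed layout, inferred from the one example value:
#   kind(name|region):sample(site,site),sample(site)
ELEMENT_RE = re.compile(
    r"^(?P<kind>[^(]+)\((?P<name>[^|]+)\|(?P<region>[^)]+)\):(?P<samples>.+)$"
)
SAMPLE_RE = re.compile(r"(?P<sample>[^,(]+)\((?P<sites>[^)]*)\)")


def parse_recur(value):
    """Parse one RECUR value (with or without the 'RECUR=' prefix)."""
    if value.startswith("RECUR="):
        value = value[len("RECUR="):]
    match = ELEMENT_RE.match(value)
    if match is None:
        raise ValueError("unexpected RECUR layout: %r" % value)
    # Map each sample name to the list of its site strings, kept verbatim.
    samples = {
        m.group("sample"): m.group("sites").split(",")
        for m in SAMPLE_RE.finditer(match.group("samples"))
    }
    return {
        "kind": match.group("kind"),
        "name": match.group("name"),
        "region": match.group("region"),
        "samples": samples,
    }


example = (
    "RECUR=Pseudogene(ENST00000467115.1|chr1:568914-569121):"
    "PR1783(chr1:568941,chr1:569004),PR2832(chr1:569004)"
)
parsed = parse_recur(example)
print(parsed["kind"], parsed["region"])
print(parsed["samples"])
```

Raising on an unexpected layout is deliberate: given how little of the format the example pins down, failing loudly seems safer than silently mis-parsing a record.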
DBRECUR (Recurrence database)
Example: ‘DBRECUR=Enhancer(chmm/segway|chr15:22517400-22521103):Lung_Adeno(Altered in 4/24(16.67%) samples.)|Prostate(Altered in 2/64(3.12%) samples.),Enhancer(drm|chr15:22517700-22521100):Lung_Adeno(Altered in 4/24(16.67%) samples.)|Prostate(Altered in 2/64(3.12%) samples.)’
If genes, regulatory elements or mutations are observed in the recurrence database (currently including 570 samples of 10 cancer types and COSMIC), the recurrence information is shown here. ‘recurrent element(name|coordinates):cancer type(recurrence information in this cancer type)’. Recurrence information is separated by ‘,’.
Be warned that its VCF output probably won't stay true to the VCF format and likely won't run through other tools as-is. I've had to go back and manually fix issues in the header before other programs would accept it.
Thanks for the suggestion. Already had a look at it earlier, but the point with FS2 is that it does not provide the required prebuilt genomic context for hg38 (which would take weeks to calculate according to the manual), so it is not an option for me.