Tumor evolution using PhyloSub
9.4 years ago
Ikram ▴ 60

Hi,

I want to reconstruct the phylogenetic history of single and multiple tumor samples using PhyloSub. I have somatic point mutation data generated with MuTect for all my samples. According to PhyloSub's README file, the input file expects the following columns (copied from there):

  1. gene: identifier for each somatic variant (each row should have a unique identifier)
  2. a: number of reference allele read counts on the variant locus (if the read counts are from multiple tumour samples, the numbers from each sample should be separated by a comma ',', e.g. for three samples: 500, 342, 423)
  3. d: total number of reads at the locus (if the read counts are from multiple tumor samples, they should be separated by a comma ',')
  4. mu_r: expected fraction of reference-allele reads sampled from the reference population (e.g. for an A->T somatic mutation at the locus, the genotype of the reference population should be AA, so mu_r should be 1 - sequencing error rate)
  5. mu_v: expected fraction of reference-allele reads sampled from the variant population (e.g. for an A->T somatic mutation at the locus, if the copy number is 2 and the expected genotype of the variant population is AT, then the expected reference fraction should be 0.5)
  6. delta_r and delta_v: pseudo-count for the Dirichlet prior on genotype probabilities (recommended value is 1, if no other prior information is available)
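Read literally, the README definitions of mu_r and mu_v are not taken from any variant-caller field; they follow from the assumed genotypes. A minimal sketch of that arithmetic is below. The function name, its parameters, and the default error rate of 0.001 are illustrative assumptions, not part of PhyloSub or MuTect.

```python
def expected_ref_fractions(ref_copies, total_copies, error_rate=0.001):
    """Return (mu_r, mu_v) as described in the PhyloSub README.

    ref_copies / total_copies describe the assumed genotype of the
    variant population (e.g. AT at copy number 2 -> 1 of 2 copies).
    error_rate is an assumed sequencing error rate, not a measured one.
    """
    if total_copies <= 0 or not 0 <= ref_copies <= total_copies:
        raise ValueError("need 0 <= ref_copies <= total_copies and total_copies > 0")
    # Reference population is homozygous reference (e.g. AA), so almost
    # every read carries the reference allele, minus sequencing errors.
    mu_r = 1.0 - error_rate
    # Variant population: fraction of allele copies that are reference.
    mu_v = ref_copies / total_copies
    return mu_r, mu_v
```

For the README's own example (A->T, copy number 2, genotype AT), `expected_ref_fractions(1, 2)` gives a mu_v of 0.5. Loci with copy number changes would need a different `ref_copies` / `total_copies` pair, which has to come from a separate copy number analysis.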

As a newcomer to NGS data handling and tumor progression, I was wondering if somebody could tell me which MuTect fields map to the input columns above. From my (very basic) understanding, and please correct me if I am wrong, I would use t_ref_count for (2) and t_ref_count + t_alt_count for (3). For (4) mu_r and (5) mu_v, I am not sure which field(s) to use. For (6), I plan to start with 1 for both Dirichlet pseudo-counts (please advise if you have a better suggestion).

For quick reference, here are the most prominent statistics/fields given in MuTect output.

  • contig: the contig location of this candidate
  • position: the 1-based position of this candidate on the given contig
  • ref_allele: the reference allele for this candidate
  • alt_allele: the mutant (alternate) allele for this candidate
  • covered: was the site powered to detect a mutation (80% power for a 0.3 allelic fraction mutation)
  • power: tumor_power * normal_power
  • tumor_power: given the tumor sequencing depth, what is the power to detect a mutation at 0.3 allelic fraction
  • normal_power: given the normal sequencing depth, what power did we have to detect (and reject) this as a germline variant
  • total_pairs: total tumor and normal read depth which come from paired reads
  • improper_pairs: number of reads which have abnormal pairing (orientation and distance)
  • map_Q0_reads: total number of mapping quality zero reads in the tumor and normal at this locus
  • init_t_lod: deprecated
  • t_lod_fstar: CORE STATISTIC: Log of (likelihood tumor event is real / likelihood event is sequencing error )
  • tumor_f: allelic fraction of this candidate based on read counts
  • contaminant_fraction: estimate of contamination fraction used (supplied or defaulted)
  • contaminant_lod: log likelihood of ( event is contamination / event is sequencing error )
  • t_ref_count: count of reference alleles in tumor
  • t_alt_count: count of alternate alleles in tumor
  • t_ref_sum: sum of quality scores of reference alleles in tumor
  • t_alt_sum: sum of quality scores of alternate alleles in tumor
  • t_ins_count: count of insertion events at this locus in tumor
  • t_del_count: count of deletion events at this locus in tumor
  • normal_best_gt: most likely genotype in the normal
  • init_n_lod: log likelihood of ( normal being reference / normal being altered )
  • n_ref_count: count of reference alleles in normal
  • n_alt_count: count of alternate alleles in normal
  • n_ref_sum: sum of quality scores of reference alleles in normal
  • n_alt_sum: sum of quality scores of alternate alleles in normal
  • judgement: final judgement of site KEEP or REJECT (not enough evidence or artifact)
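The mapping proposed in the question (a = t_ref_count, d = t_ref_count + t_alt_count) can be sketched as follows for a single tumor sample. This is an untested-against-PhyloSub illustration: the function name, the identifier format, the KEEP filter, the skipping of `#` comment lines, and the default mu_r / mu_v values (diploid heterozygous variant, error rate 0.001) are all assumptions.

```python
import csv


def mutect_to_phylosub(lines, error_rate=0.001, mu_v=0.5, delta=1):
    """Yield PhyloSub-style rows from MuTect call-stats lines.

    lines: iterable of tab-separated text lines, header included.
    Assumes one tumor sample and a diploid heterozygous variant (mu_v=0.5).
    """
    # Skip comment lines (assumed to start with '#') before the header.
    data = (ln for ln in lines if not ln.startswith("#"))
    for row in csv.DictReader(data, delimiter="\t"):
        # Keep only calls MuTect judged to be real somatic mutations.
        if row["judgement"] != "KEEP":
            continue
        ref = int(row["t_ref_count"])
        alt = int(row["t_alt_count"])
        yield {
            # Hypothetical unique identifier built from the locus and alleles.
            "gene": "{}_{}_{}_{}".format(
                row["contig"], row["position"],
                row["ref_allele"], row["alt_allele"]),
            "a": ref,            # reference-allele read count
            "d": ref + alt,      # total reads at the locus
            "mu_r": 1.0 - error_rate,
            "mu_v": mu_v,
            "delta_r": delta,
            "delta_v": delta,
        }
```

The rows could then be written out with `csv.DictWriter`. For multiple tumor samples, the README asks for the per-sample `a` and `d` values at the same locus to be joined with commas, which this single-sample sketch does not do.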

Thanks in advance,
Ikram

next-gen genome phylosub tumor-evolution • 2.5k views