Dear all,
I have WES BAM files from several tumour samples and one normal sample from a single patient, and I would like to estimate the purity of the tumour samples with ABSOLUTE. Its documentation says that "you can supply a tab delimited segmentation file (e.g. from array CGH or massively parralel sequencing experiments) - this file must contain the columns "Chromosome", "Start", "End", "Num_Probes" and "Segment_Mean"."
I have read similar questions here and here, but I still haven't understood how to create the segmentation file.
I have already analysed copy number alterations (CNAs) using CopywriteR, which gives me a read_counts file with "Chromosomes", "Start", "End" and the read counts for each sample, but I don't see where to get the "Num_Probes" and "Segment_Mean" information.
I would highly appreciate your help.
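For reference, here is a minimal sketch of how the two missing columns relate to binned data. It assumes that per-bin log2 ratios (tumour vs. normal) are already available and that segment boundaries have already been called by a segmentation algorithm; the function names and the toy values are hypothetical, not part of CopywriteR or ABSOLUTE. Under those assumptions, "Num_Probes" is the number of bins inside a segment and "Segment_Mean" is their mean log2 ratio.

```python
import csv
import io
from statistics import mean

# Column names required by the ABSOLUTE segmentation file, as quoted above.
COLUMNS = ["Chromosome", "Start", "End", "Num_Probes", "Segment_Mean"]


def build_segments(bins, boundaries):
    """Summarise bins into segments.

    bins:       list of (chrom, start, end, log2_ratio) tuples (assumed input)
    boundaries: list of (chrom, start, end) segment coordinates (assumed input)
    """
    rows = []
    for chrom, seg_start, seg_end in boundaries:
        # Keep the bins that lie entirely inside this segment.
        values = [lr for c, s, e, lr in bins
                  if c == chrom and s >= seg_start and e <= seg_end]
        if not values:
            continue  # no data for this segment, skip it
        rows.append({
            "Chromosome": chrom,
            "Start": seg_start,
            "End": seg_end,
            "Num_Probes": len(values),                # bins in the segment
            "Segment_Mean": round(mean(values), 4),   # mean log2 ratio
        })
    return rows


def to_tsv(rows):
    """Render the segment rows as tab-delimited text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Toy data: four 20 kb bins, two segments (values are made up).
example_bins = [
    ("1", 1, 20000, 0.02),
    ("1", 20001, 40000, -0.04),
    ("1", 40001, 60000, 0.55),
    ("1", 60001, 80000, 0.61),
]
example_boundaries = [("1", 1, 40000), ("1", 40001, 80000)]
example_rows = build_segments(example_bins, example_boundaries)
print(to_tsv(example_rows))
```

In practice the segmentation step usually reports these two values itself, so a conversion like this is mostly a matter of renaming columns.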
Thank you very much for your answer and the script. I now have ABSOLUTE running, but I still have a number of questions:
1) My CopywriteR results seem to be affected by the differing tumour purity of the samples, i.e. samples with more normal tissue show fewer CNAs. Does that bias the results of ABSOLUTE, since we are using `segment.Rdata` as input?
2) I used the base `RunAbsolute` command without point mutation information; however, the paper and the manual say that somatic point mutations in MAF files may be used if available. I have the MAF files (I used MuTect), but I noticed that the variant allele frequencies (VAFs) of the mutations are also influenced by the tumour purity of the sample, i.e. samples with more normal tissue have lower VAFs. So I wonder how I could use the MAF files if the VAFs are influenced by the unknown tumour purity. Actually, I was planning to use the ABSOLUTE results to re-call mutations with MuTect while explicitly specifying the tumour purity, if that's possible...
3) For the `RunAbsolute` command I used the parameters from the example in the manual page, so I was wondering whether those are considered default values or whether they should differ for every sample or type of data.
4) My last question relates to the answer that you wrote here some time ago. I was trying THetA2 as well, and it worked well with the example data, but when I tried my own files I got an error. I posted a question in their THetA users group, but the forum seems to be very passive. Maybe I will post the question in this forum.
Thank you very much!
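The dependence of VAF on purity described in question 2 can be illustrated with the standard mixture formula. This is only a sketch under simplifying assumptions (a clonal somatic mutation, a diploid normal genome, no sequencing noise); the function name and the default values are hypothetical and are not taken from ABSOLUTE or MuTect.

```python
def expected_vaf(purity, tumour_cn=2, multiplicity=1, normal_cn=2):
    """Expected VAF of a clonal somatic mutation in a tumour/normal mixture.

    purity:       fraction of tumour cells in the sample (0..1)
    tumour_cn:    total copy number at the locus in tumour cells
    multiplicity: number of tumour copies that carry the mutation
    normal_cn:    copy number in normal cells (assumed diploid)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 1 <= multiplicity <= tumour_cn:
        raise ValueError("multiplicity must be between 1 and tumour_cn")
    mutant_copies = purity * multiplicity
    total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


# A heterozygous mutation in a copy-neutral region at decreasing purity.
for p in (0.9, 0.6, 0.3):
    print(f"purity={p:.1f}  expected VAF={expected_vaf(p):.3f}")
```

The same heterozygous mutation therefore shows a lower VAF in samples with more normal tissue, which is the pattern described above.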
1) This happens because the CNV signal is weaker in samples with more normal tissue, and it is expected with any caller. However, ABSOLUTE and its competitors don't need a perfect segmentation to work effectively, as long as most of the larger CNVs are detected.
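As a rough illustration of how normal contamination weakens the CNV signal, the sketch below computes the log2 ratio one would expect for a given tumour copy number at different purities. It assumes that the ratio is normalised against a diploid baseline and ignores overall tumour ploidy and noise; the function name is hypothetical.

```python
import math


def observed_log2_ratio(purity, tumour_cn, normal_cn=2):
    """Expected log2 copy ratio of a segment in a tumour/normal mixture.

    Assumes normalisation against a diploid baseline (normal_cn).
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    # Average copy number across tumour and normal cells in the sample.
    mixed_cn = purity * tumour_cn + (1.0 - purity) * normal_cn
    if mixed_cn <= 0:
        return float("-inf")  # homozygous deletion in a pure tumour
    return math.log2(mixed_cn / normal_cn)


# Single-copy gain (3 copies) and single-copy loss (1 copy).
for p in (1.0, 0.6, 0.3):
    gain = observed_log2_ratio(p, 3)
    loss = observed_log2_ratio(p, 1)
    print(f"purity={p:.1f}  gain={gain:+.3f}  loss={loss:+.3f}")
```

The shift shrinks towards zero as purity drops, so small or subclonal events fall below the caller's threshold first, while large events usually remain detectable.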
2,3) ABSOLUTE is not particularly easy to use, so many users just run it via GenePattern or GenomeSpace instead. Other, more recent tools, including THetA2, PyClone, and BubbleTree, appear to perform better now, so you might want to just use one of those instead; they all take a CNV segmentation and SNP calls as input.
4) I'd give them a few more days to triage your bug report and post a reply. They do have a staff scientific programmer (last I checked) and are fairly dutiful in maintaining their software (for an academic lab). You could also try posting the issue on their GitHub page.