I am using PICNIC for copy number analysis in cell lines from Affymetrix 6.0 data (Greenman et al., Biostatistics 2010). PICNIC generates an output file documented in this PDF file. The documentation and the Matlab source (specifically HMM_RunB.m) indicate the output file should have 17 columns. I do not have Matlab, so I am calling PICNIC from the binary build that does not require Matlab. This build generates an output file with 16 columns, not 17, and I am having a hard time figuring out what these columns are. Does anyone know with certainty the column headers for this build? They are not written into the output file itself, which is less helpful than it could be. Sample output for two rows:
3414949,1,1485718,0.801,2.22e-16,2,1,0,1,1,-1.77e-12,2,0.5,2,1,1
3186957,1,1488015,0.588,0.337,2,0,1,0,2,-8.08e-13,0.999,0.999,7.342e-27,0.0005,0.999
The headers listed in the documentation are:
- SNP Identifier
- Raw Intensity Ratio
- Allelic Angle
- Actual Copy Number
- Segmented Total Copy Number
- Segmented Minor Copy Number
- Middle Fitted Angle Height (above 0.5)
- Outer Fitted Angle Height (above 0.5)
- LOH index
- No. A copies (genotype)
- No. B copies (genotype)
- State Change Probability
- Genotyping Confidence
- Genotyping Confidence Conditional Upon State Classification
- Heterozygous Probability
- Allele A LOH probability
- Allele B LOH probability
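To double check the column count on a real output file, something like the following works as a minimal sketch. The function name count_columns and the file name in the example comment are placeholders of mine, not part of PICNIC, and it assumes plain comma-separated rows with no quoted fields.

```shell
# Hypothetical helper (not part of PICNIC): tally how many comma-separated
# fields each row of a file has. Assumes no quoted fields containing commas.
count_columns() {
    # $1: path to a comma-separated output file
    awk -F',' '{ print NF }' "$1" | sort -n | uniq -c
}

# Example call; OUTPUT_FILE.csv is a placeholder name:
# count_columns OUTPUT_FILE.csv
```

Each line printed is a row count followed by a field count, so a file in which every row has 16 fields prints a single line ending in 16.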
EDIT: Added a typical call in response to a comment posted below. There is surely a cleaner way to extract the PI, ploidy, and alpha but this worked. Note the trailing slashes in directory names in the call to HMM, which I found to be required. PICNIC has undocumented expectations about the file names for input; it expects the original CEL file name to start with "CGP_", and gives uninformative output names if that is not present.
/PICNIC_DIR/preprocessing CELFILE.feature_intensity \
/PICNIC_DIR/info/ \
/PICNIC_OUTPUT_DIR/raw/ \
/PICNIC_OUTPUT_DIR/output/ \
/PICNIC_OUTPUT_DIR/
IN_PI=$(cat /PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv | cut -f 1 -d ',')
PLOIDY=$(cat /PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv | cut -f 2 -d ',')
ALPHA=$(cat /PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv | cut -f 3 -d ',')
/PICNIC_DIR/HMM \
CELFILE.feature_intensity \
/PICNIC_DIR/info/ \
/PICNIC_OUTPUT_DIR/output/ \
/PICNIC_OUTPUT_DIR/ 8 $IN_PI $PLOIDY $ALPHA
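A somewhat cleaner way to pull the three values is to read them in one pass. This is only a sketch: read_ploidy is a made-up helper name, and it assumes the ploidy CSV holds PI, ploidy, and alpha as the first three comma-separated fields of its first line, which is what the cut calls above imply.

```shell
# Hypothetical helper (not part of PICNIC): read the first three
# comma-separated fields of the first line of a ploidy CSV into
# IN_PI, PLOIDY and ALPHA. Field order is assumed from the cut calls.
read_ploidy() {
    # $1: path to the ploidy CSV
    IN_PI= PLOIDY= ALPHA=
    # Extra fields, if any, are discarded into _rest. The fallback test
    # accepts a final line that lacks a trailing newline.
    IFS=, read -r IN_PI PLOIDY ALPHA _rest < "$1" || [ -n "$IN_PI" ]
}

# Example call, using the same placeholder path as above:
# read_ploidy /PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv
```

After the call, $IN_PI, $PLOIDY, and $ALPHA can be passed to HMM exactly as before. Unlike the three cut calls, this only looks at the first line of the file.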
Apologies for what may be a misuse of the comments section, but would you mind sharing the input code for your run? I have tried running the Linux executable version of PICNIC as well but could not get past this error message:
My input was
with a contamination estimate of 0, as indicated in the manual for cell line samples.
There is a grave lack of online discussion on this tool despite its purported utility...
I added my call. The source for preprocessing.m indicates it's tripping over the line:
pr_pi is supposed to be set at the top of the script, but if you read the code, passing the CELL_LINE parameter and not passing 5 parameters results in failure to set pr_pi. It's a bug in the PICNIC code:
I had a suspicion that it was a bug! Thank you, and thanks very much for the CGP-prepend tip, it works now. I am going through this output with a colleague who has run the Matlab version of PICNIC previously; if he has any thoughts on the 16-column output (which is what I got as well), I will post them here.
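For anyone else hitting the naming issue, the CGP_ prepend can be scripted. A minimal sketch, assuming that copying the CEL file under a name starting with CGP_ is enough (this thread only establishes that the name must start with that prefix); cgp_name and SAMPLE.CEL are made-up names.

```shell
# Hypothetical helper (not part of PICNIC): print a file path with "CGP_"
# prepended to the base name, unless it already starts with that prefix.
cgp_name() {
    # $1: path to a CEL file
    dir=$(dirname "$1")
    base=$(basename "$1")
    case "$base" in
        CGP_*) printf '%s\n' "$1" ;;
        *)     printf '%s/CGP_%s\n' "$dir" "$base" ;;
    esac
}

# Example: copy under the prefixed name before running PICNIC.
# SAMPLE.CEL is a placeholder; skip the copy if the name is already prefixed.
# cp SAMPLE.CEL "$(cgp_name SAMPLE.CEL)"
```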