Question

What are the output values for PICNIC when called from the binary build?

10.5 years ago

David Quigley 11k

I am using PICNIC for copy number analysis in cell lines from Affymetrix 6.0 data (Greenman, Biostatistics 2010). PICNIC generates an output file documented in this PDF file. The documentation and Matlab source (specifically HMM_RunB.m) indicate the output file should have 17 columns. I do not have Matlab, so I am calling PICNIC from the binary build that does not require Matlab. This build generates an output file that has 16 columns, not 17, and I am having a hard time figuring out what these columns are. Does anyone know with certainty the column headers for this build? They are not written into the output file itself, which is less helpful than it could be. Sample output for two rows:

3414949,1,1485718,0.801,2.22e-16,2,1,0,1,1,-1.77e-12,2,0.5,2,1,1
3186957,1,1488015,0.588,0.337,2,0,1,0,2,-8.08e-13,0.999,0.999,7.342e-27,0.0005,0.999

The headers listed in the documentation are:

SNP Identifier
Raw Intensity Ratio
Allelic Angle
Actual Copy Number
Segmented Total Copy Number
Segmented Minor Copy Number
Middle Fitted Angle Height (above 0.5)
Outer Fitted Angle Height (above 0.5)
LOH index
No. A copies (genotype)
No. B copies (genotype)
State Change Probability
Genotyping Confidence
Genotyping Confidence Conditional Upon State Classification
Heterozygous Probability
Allele A LOH probability
Allele B LOH probability
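
To check the column count over the whole output file rather than by eye, the fields per line can be tallied. This is a minimal sketch, assuming the output is plain comma-separated text with no quoted fields; `count_fields` is a made-up helper name, not part of PICNIC, and the demo file holds only the two sample rows quoted above.

```shell
#!/bin/sh
# count_fields FILE
# Hypothetical helper: report how many rows have each comma-separated field count.
count_fields() {
    awk -F',' '{ n[NF]++ } END { for (k in n) print k " columns: " n[k] " rows" }' "$1" | sort -n
}

# Demo on the two sample rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
3414949,1,1485718,0.801,2.22e-16,2,1,0,1,1,-1.77e-12,2,0.5,2,1,1
3186957,1,1488015,0.588,0.337,2,0,1,0,2,-8.08e-13,0.999,0.999,7.342e-27,0.0005,0.999
EOF
count_fields "$sample"   # prints: 16 columns: 2 rows
rm -f "$sample"
```

If more than one count is reported, some rows differ in width, which would be worth knowing before trying to map columns to the documented headers.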

EDIT: Added a typical call in response to a comment posted below. There is surely a cleaner way to extract the PI, ploidy, and alpha but this worked. Note the trailing slashes in directory names in the call to HMM, which I found to be required. PICNIC has undocumented expectations about the file names for input; it expects the original CEL file name to start with "CGP_", and gives uninformative output names if that is not present.

/PICNIC_DIR/preprocessing CELFILE.feature_intensity \
  /PICNIC_DIR/info/ \
  /PICNIC_OUTPUT_DIR/raw/ \
  /PICNIC_OUTPUT_DIR/output/ \
  /PICNIC_OUTPUT_DIR/

PLOIDY_CSV=/PICNIC_OUTPUT_DIR/output2/CELFILE_feature.TXT/ploidy_CELFILE_feature.TXT.csv
IN_PI=$(cut -d ',' -f 1 "$PLOIDY_CSV")
PLOIDY=$(cut -d ',' -f 2 "$PLOIDY_CSV")
ALPHA=$(cut -d ',' -f 3 "$PLOIDY_CSV")

/PICNIC_DIR/HMM \
  CELFILE.feature_intensity \
  /PICNIC_DIR/info/ \
  /PICNIC_OUTPUT_DIR/output/ \
  /PICNIC_OUTPUT_DIR/ 8 $IN_PI $PLOIDY $ALPHA
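
The three `cut` calls above can be collapsed into a single read. This is a minimal sketch, assuming the ploidy CSV holds the three values on its first line in the same order as above (fields 1, 2, 3); `read_ploidy` is a made-up helper name and the demo values are invented purely for illustration.

```shell
#!/bin/sh
# read_ploidy FILE
# Hypothetical helper: split the first line of FILE on commas and store the
# first three fields in IN_PI, PLOIDY and ALPHA (any extra fields go to _rest).
read_ploidy() {
    # The fallback test keeps the call successful when the line has no trailing newline.
    IFS=',' read -r IN_PI PLOIDY ALPHA _rest < "$1" || [ -n "$IN_PI" ]
}

# Demo with invented values; a real run would point at the ploidy CSV instead.
demo=$(mktemp)
printf '0.25,2.7,0.9\n' > "$demo"
read_ploidy "$demo"
echo "IN_PI=$IN_PI PLOIDY=$PLOIDY ALPHA=$ALPHA"
rm -f "$demo"
```

Unlike `cut`, which emits the chosen field from every line, `read` consumes only the first line, so the two approaches differ if the file ever contains more than one row.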

Apologies for what is probably a misuse of the comments section, but would you mind sharing the input code of your run? I have tried running the Linux executable version of PICNIC as well but could not get past this error message:

Undefined function or variable "pr_pi". Error in ==> preprocessing at 107

My input was

sh run_preprocessing.sh rootDir/Matlab_Compiler_Runtime/v710/ 080122_SNP6.0_184B5_B01.feature_intensity ../info/ ../outdir/raw/ ../outdir/output ../outdir/ 'CELL_LINE' 0.0

with a contamination estimate of 0 as was indicated in the manual for cell line samples.

There is a grave lack of online discussion on this tool despite its purported utility...

updated 3.3 years ago by Ram 45k • written 10.4 years ago by a1249m

I added my call. The source for preprocessing.m indicates it's tripping over the line

ploidy=find_ploidy(seg_info_update,sample_type,pr_pi);

pr_pi is supposed to be set at the top of the script, but if you read the code, passing more than 5 parameters with the CELL_LINE sample type never sets pr_pi: the CELL_LINE branch only checks the parameter count. It's a bug in the PICNIC code:

elseif nargin > 5
    if (strcmp(sample_type, 'PRIMARY') )
        if (nargin == 6)
            disp('not enough parameters');
            usage();
            exit(0);
        else
            pr_pi=str2num(in_pi);
        end;
    elseif (strcmp(sample_type, 'CELL_LINE') )
        if (nargin > 7)
            disp('too many parameters');
            usage();
            exit(0);
        end;
    else
        pr_pi=0;
    end;    
end;
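
To make the failing path concrete, the quoted branch can be traced by hand. The sketch below is a shell transliteration of only the `elseif nargin > 5` block above (the preceding `if` is not shown in this thread, so it is not modeled). `pr_pi_status` is a made-up helper, and treating the call in the earlier comment as 7 arguments to `preprocessing` assumes the runtime path given to `run_preprocessing.sh` is not counted.

```shell
#!/bin/sh
# pr_pi_status NARGIN SAMPLE_TYPE [IN_PI]
# Hypothetical helper: reports what the quoted MATLAB branch would do with pr_pi.
pr_pi_status() {
    nargin=$1
    sample_type=$2
    in_pi=${3:-}
    # Only the nargin > 5 branch is quoted in the thread.
    [ "$nargin" -gt 5 ] || { echo "branch not shown"; return; }
    case $sample_type in
        PRIMARY)
            if [ "$nargin" -eq 6 ]; then
                echo "exit: not enough parameters"
            else
                echo "pr_pi=$in_pi"
            fi ;;
        CELL_LINE)
            if [ "$nargin" -gt 7 ]; then
                echo "exit: too many parameters"
            else
                echo "pr_pi unset"   # no assignment on this path
            fi ;;
        *)
            echo "pr_pi=0" ;;
    esac
}

# Assumed argument count for the CELL_LINE call with a contamination estimate.
pr_pi_status 7 CELL_LINE 0.0   # prints: pr_pi unset
```

Under these assumptions, a CELL_LINE run with 6 or 7 arguments passes the count check but never assigns `pr_pi`, which matches the "Undefined function or variable" error reported above.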

updated 3.3 years ago by Ram 45k • written 10.4 years ago by David Quigley 11k

I had a suspicion that it was a bug! Thank you, and thanks very much for the CGP-prepend tip, it works now. I am going through this output with a colleague who has run the Matlab version of PICNIC previously; if he has any thoughts on the 16-column output (which is what I got as well) I will post them here.

updated 3.3 years ago by Ram 45k • written 10.4 years ago by a1249m