Does mageck-vispr calculate differently than mageck?
1 day ago
Assa Yeroslaviz

Hi,

I'm testing my CRISPR data using the mageck tool.

I have tried both mageck and mageck-vispr with the MLE algorithm. Even though the raw counts are identical, I get different normalised counts, and I can see that the size factors used for the normalisation differ between the two tools.

[Image: raw vs. normalised counts]

Is mageck-vispr doing something different in the normalisation process, or am I missing something?

Assa Yeroslaviz, Oct 10, 2024, 5:34:35 PM (4 days ago), to mageck:

I was running a comparison of mageck-mle and mageck-vispr to compare their results. Even though the count numbers are identical, I get different results in the MLE test with the exact same design matrix.

I can also see different size factors, even though both runs used median normalization.

Is that something I should expect?
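To see why slightly different count tables can give slightly different size factors, here is a minimal sketch of generic median-of-ratios (DESeq-style) normalization. It assumes MAGeCK's median method works along these lines; it is not MAGeCK's code, and the real implementation may treat zeros, pseudocounts and the direction of the factor differently (the logs below print both a "Final size factor" and its reciprocal). The function name median_ratio_factors and the toy tables are hypothetical.

```python
import math
from statistics import median


def median_ratio_factors(counts):
    """Median-of-ratios size factors (generic DESeq-style sketch, not MAGeCK's code).

    counts: one row per sgRNA, each row holding one raw count per sample.
    Returns one factor per sample; dividing a sample's counts by it normalizes them.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0; this sketch simply skips such sgRNAs
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]


# Two toy tables that differ in a single count of the second sample (40 -> 50).
table_a = [[10, 10], [20, 40], [30, 90]]
table_b = [[10, 10], [20, 50], [30, 90]]

factors_a = median_ratio_factors(table_a)
factors_b = median_ratio_factors(table_b)
print(factors_a)
print(factors_b)  # the first sample's factor changes too
```

Changing one count in the second sample also moves the first sample's factor, because every sample is compared against a per-sgRNA geometric mean taken across all samples.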

This is the top of the log file from mageck mle:

INFO  @ Thu, 10 Oct 2024 16:16:04: Parameters: /fs/home/yeroslaviz/miniconda3/envs/mageck/bin/mageck mle -k mageck_count_raw/P842_rawCounts.count.txt --norm-method median -d desgin_mat_ctrlt0-16.txt -n mageck_mle_raw_ctrlt16_vs_ctrl-t0_median/mageck_mle_ctrlt16_vs_ctrl-t0_median
INFO  @ Thu, 10 Oct 2024 16:16:05: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO  @ Thu, 10 Oct 2024 16:16:05: Design matrix:
INFO  @ Thu, 10 Oct 2024 16:16:05: [[1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]]
INFO  @ Thu, 10 Oct 2024 16:16:05: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO  @ Thu, 10 Oct 2024 16:16:05: Included samples:ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 16:16:05: Loaded samples:ctrl_t0_1;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 16:16:05: Sample index: 7;19;21;11;18;15
INFO  @ Thu, 10 Oct 2024 16:16:05: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 16:16:05: Initial (total) size factor: 2.9902654067986982 1.4024448242843242 0.8543356307179322 0.5446869498697177 1.0944611279354277 0.96858658018199
DEBUG @ Thu, 10 Oct 2024 16:16:05: Median factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO  @ Thu, 10 Oct 2024 16:16:05: Final size factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO  @ Thu, 10 Oct 2024 16:16:05: size factor: 0.3797299745674892,0.820768553566126,1.3320044828100894,2.0840659292273678,1.0361144940021507,1.173477802845027 

And this is from the mageck-vispr run:

INFO  @ Thu, 10 Oct 2024 15:32:13: Parameters: /fs/home/yeroslaviz/miniconda3/envs/mageck/bin/mageck mle --norm-method median --output-prefix results/test/mle --genes-var 0 --count-table results/count/all.count.txt --threads 24 --design-matrix /fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P842/mageck/desgin_mat_ctrlt0-16.txt
INFO  @ Thu, 10 Oct 2024 15:32:15: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO  @ Thu, 10 Oct 2024 15:32:15: Design matrix:
INFO  @ Thu, 10 Oct 2024 15:32:15: [[1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]]
INFO  @ Thu, 10 Oct 2024 15:32:15: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO  @ Thu, 10 Oct 2024 15:32:15: Included samples:ctrl_t0_1_R2,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 15:32:15: Loaded samples:ctrl_t0_1_R2;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 15:32:15: Sample index: 0;1;2;3;4;5
INFO  @ Thu, 10 Oct 2024 15:32:15: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 15:32:15: Initial (total) size factor: 2.967156629643336 1.3717737687541605 0.8517161762082943 0.5397213354693214 1.0957860471056007 1.005531533258309
DEBUG @ Thu, 10 Oct 2024 15:32:15: Median factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO  @ Thu, 10 Oct 2024 15:32:15: Final size factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO  @ Thu, 10 Oct 2024 15:32:15: size factor: 0.38209002385862434,0.8389961940670922,1.3356513780633026,2.0988990809511017,1.0374240335467761,1.1290631696477442 

I would appreciate any explanation of this behavior.

Thanks,

Assa


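Make sure you are running both tools on the same input. The two runs use count tables with different names (mageck_count_raw/P842_rawCounts.count.txt vs. results/count/all.count.txt), and the first sample is labelled differently (ctrl_t0_1 vs. ctrl_t0_1_R2). I see no difference in the command lines that would explain the discrepancy, which suggests the raw counts are slightly different.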
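One way to check this from the posted logs alone: assuming the "Initial (total) size factor" is computed only from per-sample read totals (as its label suggests), any difference there has to come from the counts themselves, before median normalization is applied. The short script below uses the numbers copied from the two logs; it compares those factors and checks that the last "size factor" line in the mageck log is just the reciprocal of the "Final size factor" line. Sample labels are taken from the mageck log.

```python
# Factors copied verbatim from the two logs posted above (same sample order in both;
# labels from the mageck log, VISPR calls the first sample ctrl_t0_1_R2).
samples = ["ctrl_t0_1", "ctrl_t0_2", "ctrl_t16_1", "ctrl_t16_2", "ctrl_t16_3", "ctrl_t16_4"]
mle_total = [2.9902654067986982, 1.4024448242843242, 0.8543356307179322,
             0.5446869498697177, 1.0944611279354277, 0.96858658018199]
vispr_total = [2.967156629643336, 1.3717737687541605, 0.8517161762082943,
               0.5397213354693214, 1.0957860471056007, 1.005531533258309]
mle_final = [2.633450259329661, 1.2183702648635089, 0.7507482241278426,
             0.47983126923951636, 0.9651443018978983, 0.852167802045816]
mle_printed = [0.3797299745674892, 0.820768553566126, 1.3320044828100894,
               2.0840659292273678, 1.0361144940021507, 1.173477802845027]

# The last "size factor" line appears to be 1 / "Final size factor".
reciprocal_ok = all(abs(1.0 / f - p) < 1e-9 for f, p in zip(mle_final, mle_printed))
print("printed factor == 1 / final factor:", reciprocal_ok)

# Relative difference (in %) of the VISPR total-count factors versus the mageck ones.
rel_diff = [100.0 * (v - m) / m for m, v in zip(mle_total, vispr_total)]
for name, d in zip(samples, rel_diff):
    print(f"{name}: {d:+.2f}%")
```

The total-count factors already differ by up to a few percent, so the median factors downstream cannot be expected to match.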


I can't control the file name in VISPR.

Both approaches start from the FASTQ files. With mageck I need two steps to run the MLE algorithm: first the quantification and then the test. In VISPR it runs in one go, and all.count.txt is the name VISPR automatically gives the count table it creates during the run.
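Since both workflows end up with a count table, one way to follow up on the suggestion above is to compare the two tables directly. The sketch below assumes the usual MAGeCK count-table layout (tab-separated, columns sgRNA, Gene, then one column per sample). The function names read_count_table and compare_tables, and the renaming of ctrl_t0_1_R2 to ctrl_t0_1, are illustrative assumptions; the toy strings stand in for the real files.

```python
import csv
import io


def read_count_table(handle, rename=None):
    """Read a tab-separated count table: sgRNA, Gene, then one column per sample.

    `rename` optionally maps sample column names onto a common label.
    Returns (sample_names, {sgRNA: {sample: count}}).
    """
    rename = rename or {}
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = [rename.get(name, name) for name in header[2:]]
    table = {}
    for row in reader:
        if row:  # skip blank lines
            table[row[0]] = {s: float(v) for s, v in zip(samples, row[2:])}
    return samples, table


def compare_tables(a, b, samples):
    """Per-sample totals of both tables and the sgRNAs whose counts differ."""
    totals_a = {s: sum(r.get(s, 0.0) for r in a.values()) for s in samples}
    totals_b = {s: sum(r.get(s, 0.0) for r in b.values()) for s in samples}
    differing = sorted(
        sg for sg in set(a) | set(b)
        if any(a.get(sg, {}).get(s) != b.get(sg, {}).get(s) for s in samples)
    )
    return totals_a, totals_b, differing


# Toy stand-ins for the two count tables (one count differs: sg2 in the first sample).
toy_mageck = "sgRNA\tGene\tctrl_t0_1\tctrl_t0_2\nsg1\tA\t10\t12\nsg2\tA\t5\t7\n"
toy_vispr = "sgRNA\tGene\tctrl_t0_1_R2\tctrl_t0_2\nsg1\tA\t10\t12\nsg2\tA\t6\t7\n"

names_a, table_a = read_count_table(io.StringIO(toy_mageck))
names_b, table_b = read_count_table(io.StringIO(toy_vispr),
                                    rename={"ctrl_t0_1_R2": "ctrl_t0_1"})
shared = [s for s in names_a if s in names_b]
totals_a, totals_b, differing = compare_tables(table_a, table_b, shared)
print(totals_a, totals_b, differing)
```

On the real data you would pass open file handles for mageck_count_raw/P842_rawCounts.count.txt and results/count/all.count.txt instead of the toy strings. A non-empty list of differing sgRNAs would point to the counting step rather than the normalisation.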

1 day ago

This probably stems from differences in the gene mean-variance modeling, which is set to --genes-var 0 in the VISPR run. It's 1000 by default in a typical MLE run.

See the docs:

--genes-varmodeling GENES_VARMODELING The number of genes for mean-variance modeling. Default 1000.
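The VISPR run passes --genes-var rather than the documented --genes-varmodeling. Assuming mageck builds its command line with Python's argparse, whose default (allow_abbrev=True) accepts any unambiguous prefix of a long option, --genes-var 0 would be read as --genes-varmodeling 0. The toy parser below is hypothetical, defines only this one option, and is not MAGeCK's actual parser; it just shows the prefix matching.

```python
import argparse

# Toy parser defining only the documented option; it is not MAGeCK's real parser.
parser = argparse.ArgumentParser(prog="mle-sketch")
parser.add_argument("--genes-varmodeling", type=int, default=1000,
                    help="The number of genes for mean-variance modeling. Default 1000.")

default_run = parser.parse_args([])                  # like a plain mageck mle call
vispr_run = parser.parse_args(["--genes-var", "0"])  # abbreviated, as in the VISPR log
print(default_run.genes_varmodeling, vispr_run.genes_varmodeling)  # 1000 0
```

argparse rejects a prefix that matches more than one option, so the abbreviation only works as long as no other long option starts with --genes-var.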

As for why VISPR does that, I am not sure. Perhaps it can be controlled at the top level of the VISPR command.


But the mean-variance modelling is based on the normalized data (actually raw data corrected by size factors), so it comes after the normalization, no?


Presumably, but I haven't dug through the source code to verify. OP, can you post the full logs from your VISPR call, and the commands you used to generate the count table that was fed into mle? It would be nice to see whether the count commands used by each differ. Your VISPR config file might also be helpful.
