Hi,
I'm testing my CRISPR data using the mageck tool.
I have tried both mageck and mageck-vispr with the mle algorithm. For some reason I get different normalised counts even though the raw count numbers are identical.
I can also see that the size factors used for the normalisation differ between the two tools, even though the counts are similar.
Is it because mageck-vispr is doing something different in the normalisation process, or am I missing something?
Assa Yeroslaviz Oct 10, 2024, 5:34:35 PM (4 days ago) to mageck
I was running a comparison of mageck-mle and mageck-vispr to see the results. Even though the count numbers are identical, I get different results in the mle test with the exact same design matrix.
I can also see different size factors, even though both runs used median normalization.
Is that something I should expect?
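For context, median normalization is commonly described as a median-of-ratios calculation: each sgRNA count is divided by that sgRNA's geometric mean across samples, and the median of those ratios per sample is the factor. The sketch below is a minimal illustration under that assumption; it is not taken from the MAGeCK source, and the function name and the handling of zero counts are hypothetical choices made for the example.

```python
import math
from statistics import median


def median_ratio_size_factors(counts):
    """Median-of-ratios factors for a count matrix (illustrative sketch only).

    counts: list of rows, one row per sgRNA, one value per sample.
    Returns one factor per sample.
    """
    n_samples = len(counts[0])
    # Assumption for this sketch: skip sgRNAs with a zero count in any sample,
    # so that the logarithm is always defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each sgRNA across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Because such factors are a deterministic function of the count matrix, two runs on truly identical counts with the same method would be expected to give identical factors.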
This is the top of the log file from mageck mle:
INFO @ Thu, 10 Oct 2024 16:16:04: Parameters: mageck mle -k mageck_count_raw/P842_rawCounts.count.txt --norm-method median -d desgin_mat_ctrlt0-16.txt -n mageck_mle_raw_ctrlt16_vs_ctrl-t0_median/mageck_mle_ctrlt16_vs_ctrl-t0_median
INFO @ Thu, 10 Oct 2024 16:16:05: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO @ Thu, 10 Oct 2024 16:16:05: Design matrix:
INFO @ Thu, 10 Oct 2024 16:16:05: [[1. 0. 0.]
INFO @ Thu, 10 Oct 2024 16:16:05: [1. 0. 0.]
INFO @ Thu, 10 Oct 2024 16:16:05: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 16:16:05: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 16:16:05: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 16:16:05: [1. 0. 1.]]
INFO @ Thu, 10 Oct 2024 16:16:05: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO @ Thu, 10 Oct 2024 16:16:05: Included samples:ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO @ Thu, 10 Oct 2024 16:16:05: Loaded samples:ctrl_t0_1;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO @ Thu, 10 Oct 2024 16:16:05: Sample index: 7;19;21;11;18;15
INFO @ Thu, 10 Oct 2024 16:16:05: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 16:16:05: Initial (total) size factor: 2.9902654067986982 1.4024448242843242 0.8543356307179322 0.5446869498697177 1.0944611279354277 0.96858658018199
DEBUG @ Thu, 10 Oct 2024 16:16:05: Median factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO @ Thu, 10 Oct 2024 16:16:05: Final size factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO @ Thu, 10 Oct 2024 16:16:05: size factor: 0.3797299745674892,0.820768553566126,1.3320044828100894,2.0840659292273678,1.0361144940021507,1.173477802845027
and this is from the mageck-vispr run:
INFO @ Thu, 10 Oct 2024 15:32:13: Parameters: /fs/home/yeroslaviz/miniconda3/envs/mageck/bin/mageck mle --norm-method median --output-prefix results/test/mle --genes-var 0 --count-table results/count/all.count.txt --threads 24 --design-matrix mageck/desgin_mat_ctrlt0-16.txt
INFO @ Thu, 10 Oct 2024 15:32:15: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO @ Thu, 10 Oct 2024 15:32:15: Design matrix:
INFO @ Thu, 10 Oct 2024 15:32:15: [[1. 0. 0.]
INFO @ Thu, 10 Oct 2024 15:32:15: [1. 0. 0.]
INFO @ Thu, 10 Oct 2024 15:32:15: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 15:32:15: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 15:32:15: [1. 0. 1.]
INFO @ Thu, 10 Oct 2024 15:32:15: [1. 0. 1.]]
INFO @ Thu, 10 Oct 2024 15:32:15: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO @ Thu, 10 Oct 2024 15:32:15: Included samples:ctrl_t0_1_R2,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO @ Thu, 10 Oct 2024 15:32:15: Loaded samples:ctrl_t0_1_R2;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO @ Thu, 10 Oct 2024 15:32:15: Sample index: 0;1;2;3;4;5
INFO @ Thu, 10 Oct 2024 15:32:15: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 15:32:15: Initial (total) size factor: 2.967156629643336 1.3717737687541605 0.8517161762082943 0.5397213354693214 1.0957860471056007 1.005531533258309
DEBUG @ Thu, 10 Oct 2024 15:32:15: Median factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO @ Thu, 10 Oct 2024 15:32:15: Final size factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO @ Thu, 10 Oct 2024 15:32:15: size factor: 0.38209002385862434,0.8389961940670922,1.3356513780633026,2.0988990809511017,1.0374240335467761,1.1290631696477442
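The numbers in the two logs can be compared directly. The short sketch below copies the values from the log lines above and checks two things: that the last "size factor" line is the reciprocal of the "Final size factor" line, and how much the "Initial (total) size factor" values differ between the two runs. Reading the initial factor as a reflection of the total read count per sample is an assumption based on its label, not something the logs state.

```python
# Values copied verbatim from the two logs above.
# Note: the first sample is named ctrl_t0_1 in the mageck run and
# ctrl_t0_1_R2 in the mageck-vispr run.
samples = ["ctrl_t0_1", "ctrl_t0_2", "ctrl_t16_1",
           "ctrl_t16_2", "ctrl_t16_3", "ctrl_t16_4"]

initial_mageck = [2.9902654067986982, 1.4024448242843242, 0.8543356307179322,
                  0.5446869498697177, 1.0944611279354277, 0.96858658018199]
initial_vispr = [2.967156629643336, 1.3717737687541605, 0.8517161762082943,
                 0.5397213354693214, 1.0957860471056007, 1.005531533258309]

final_mageck = [2.633450259329661, 1.2183702648635089, 0.7507482241278426,
                0.47983126923951636, 0.9651443018978983, 0.852167802045816]
inverse_mageck = [0.3797299745674892, 0.820768553566126, 1.3320044828100894,
                  2.0840659292273678, 1.0361144940021507, 1.173477802845027]

final_vispr = [2.6171842695636727, 1.1919005200159856, 0.748698362779367,
               0.4764402486406621, 0.9639260010018953, 0.8856900365565812]
inverse_vispr = [0.38209002385862434, 0.8389961940670922, 1.3356513780633026,
                 2.0988990809511017, 1.0374240335467761, 1.1290631696477442]


def max_reciprocal_error(final, inverse):
    """Largest deviation of final * inverse from 1 across samples."""
    return max(abs(f * s - 1.0) for f, s in zip(final, inverse))


def relative_differences(a, b):
    """Per-sample relative difference (b - a) / a."""
    return [(y - x) / x for x, y in zip(a, b)]


print("reciprocal check (mageck):", max_reciprocal_error(final_mageck, inverse_mageck))
print("reciprocal check (vispr): ", max_reciprocal_error(final_vispr, inverse_vispr))
for name, diff in zip(samples, relative_differences(initial_mageck, initial_vispr)):
    print(f"{name}: initial factor differs by {diff:+.2%}")
```

The initial factors already differ between the runs, by roughly 4% for ctrl_t16_4, before any median normalization is applied.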
I would appreciate any explanation of this behaviour.
Thanks,
Assa
Make sure you run this on the same input. I see two count tables with different names. I see no difference in the command lines, which suggests the raw counts are slightly different.
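One way to check this is to compare the two count tables column by column. The sketch below is a minimal example that assumes both files are tab-separated with the columns sgRNA, Gene, followed by one column per sample; the function names are hypothetical and not part of mageck.

```python
import csv


def read_count_table(handle):
    """Read a tab-separated count table: sgRNA, Gene, then one column per sample.

    Returns (sample_names, {sgRNA: [counts per sample]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[2:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [float(value) for value in row[2:]]
    return sample_names, counts


def compare_tables(table_a, table_b, column_pairs):
    """List every sgRNA whose count differs between matched sample columns.

    column_pairs: list of (sample name in table_a, sample name in table_b).
    Returns a list of (sgRNA, sample name in table_a, count_a, count_b).
    Only sgRNAs present in both tables are compared.
    """
    samples_a, counts_a = table_a
    samples_b, counts_b = table_b
    shared = sorted(set(counts_a) & set(counts_b))
    differences = []
    for name_a, name_b in column_pairs:
        index_a = samples_a.index(name_a)
        index_b = samples_b.index(name_b)
        for sgrna in shared:
            value_a = counts_a[sgrna][index_a]
            value_b = counts_b[sgrna][index_b]
            if value_a != value_b:
                differences.append((sgrna, name_a, value_a, value_b))
    return differences
```

To use it on the two runs above, open mageck_count_raw/P842_rawCounts.count.txt and results/count/all.count.txt, pass each handle to read_count_table, and pair the sample names from the logs, for example ("ctrl_t0_1", "ctrl_t0_1_R2") and ("ctrl_t0_2", "ctrl_t0_2"). An empty result would mean the compared columns are identical.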
I can't influence the name in vispr. Both methods start from the fastq files; only in mageck do I need two steps to run the mle algorithm: first the quantification and then the test. In vispr it goes in one step, and all.count is the automated name it gives the count table. It is created automatically when vispr is running.
The mageck count command (the top of the log file lists the parameters; the rest is just reading the files and calculating data for trimming and Gini indices). I can upload the complete log if needed.
Do you have the count command log from vispr as well?
Sorry, I thought it was uploaded. Here it is.
The log file is unfortunately too long to post here completely, but you can download the complete file here.
Thanks for looking into it.
This is the complete mageck mle log file, though it was used only to compare specific samples within the complete subset.