Does mageck-vispr calculate differently than mageck?
1
0
Entering edit mode
5 weeks ago
Assa Yeroslaviz ★ 1.9k

Hi,

I'm testing my CRIPSR data using the mageck tool.

I have tried both mageck and mageck-vispr with the mle algorithm. For some reason I get different normalised counts even though the raw count numbers are identical. For some reason I can see that the size factors used for the normalisation are different in both tools, even though the counts are similar.

raw-normalised-counts

Is it because mageck-vispr doing something different in the normalisation process, or am I missing something?

Assa Yeroslaviz Oct 10, 2024, 5:34:35 PM (4 days ago) to mageck I was running a comparison of mageck-mle and mageck-vispr to see the results. Even though the count numbers are identical, I get dfferent results in the mle test with the exact same design matrix.

I can also see different size factors, even though both were done with the median normalization

Is that something I should expect?

this is the top of log file from mageck mle

INFO  @ Thu, 10 Oct 2024 16:16:04: Parameters: mageck mle -k mageck_count_raw/P842_rawCounts.count.txt --norm-method median -d desgin_mat_ctrlt0-16.txt -n mageck_mle_raw_ctrlt16_vs_ctrl-t0_median/mageck_mle_ctrlt16_vs_ctrl-t0_median
INFO  @ Thu, 10 Oct 2024 16:16:05: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO  @ Thu, 10 Oct 2024 16:16:05: Design matrix:
INFO  @ Thu, 10 Oct 2024 16:16:05: [[1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 16:16:05:  [1. 0. 1.]]
INFO  @ Thu, 10 Oct 2024 16:16:05: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO  @ Thu, 10 Oct 2024 16:16:05: Included samples:ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 16:16:05: Loaded samples:ctrl_t0_1;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 16:16:05: Sample index: 7;19;21;11;18;15
INFO  @ Thu, 10 Oct 2024 16:16:05: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 16:16:05: Initial (total) size factor: 2.9902654067986982 1.4024448242843242 0.8543356307179322 0.5446869498697177 1.0944611279354277 0.96858658018199
DEBUG @ Thu, 10 Oct 2024 16:16:05: Median factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO  @ Thu, 10 Oct 2024 16:16:05: Final size factor: 2.633450259329661 1.2183702648635089 0.7507482241278426 0.47983126923951636 0.9651443018978983 0.852167802045816
INFO  @ Thu, 10 Oct 2024 16:16:05: size factor: 0.3797299745674892,0.820768553566126,1.3320044828100894,2.0840659292273678,1.0361144940021507,1.173477802845027 

and this is from the mageck-vispr run:

INFO  @ Thu, 10 Oct 2024 15:32:13: Parameters: /fs/home/yeroslaviz/miniconda3/envs/mageck/bin/mageck mle --norm-method median --output-prefix results/test/mle --genes-var 0 --count-table results/count/all.count.txt --threads 24 --design-matrix mageck/desgin_mat_ctrlt0-16.txt
INFO  @ Thu, 10 Oct 2024 15:32:15: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO  @ Thu, 10 Oct 2024 15:32:15: Design matrix:
INFO  @ Thu, 10 Oct 2024 15:32:15: [[1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 0.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]
INFO  @ Thu, 10 Oct 2024 15:32:15:  [1. 0. 1.]]
INFO  @ Thu, 10 Oct 2024 15:32:15: Beta labels:baseline,ctrl_t0,ctrl_t16
INFO  @ Thu, 10 Oct 2024 15:32:15: Included samples:ctrl_t0_1_R2,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 15:32:15: Loaded samples:ctrl_t0_1_R2;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4
INFO  @ Thu, 10 Oct 2024 15:32:15: Sample index: 0;1;2;3;4;5
INFO  @ Thu, 10 Oct 2024 15:32:15: Loaded 608 genes.
DEBUG @ Thu, 10 Oct 2024 15:32:15: Initial (total) size factor: 2.967156629643336 1.3717737687541605 0.8517161762082943 0.5397213354693214 1.0957860471056007 1.005531533258309
DEBUG @ Thu, 10 Oct 2024 15:32:15: Median factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO  @ Thu, 10 Oct 2024 15:32:15: Final size factor: 2.6171842695636727 1.1919005200159856 0.748698362779367 0.4764402486406621 0.9639260010018953 0.8856900365565812
INFO  @ Thu, 10 Oct 2024 15:32:15: size factor: 0.38209002385862434,0.8389961940670922,1.3356513780633026,2.0988990809511017,1.0374240335467761,1.1290631696477442 

I would appreciate any explanation to this behavoir.

thanks

Assa

mageck mageck-vispr pooled-screen cripsr • 749 views
ADD COMMENT
0
Entering edit mode

Make sure you run this on the same input. I see two different count tables in terms of different names. I see no difference in terms of the command line suggesting raw counts are slightly different.

ADD REPLY
0
Entering edit mode

I can't influence the name in vispr.

Both methods starts with fastq files, only in mageck I need to do two steps to runs the mle algorithm, first the quantification and then the test. in vispr it goes in one step and the all.count is the automated name it gives the count table. it is created automatically when vispr is running.

ADD REPLY
0
Entering edit mode

The mageck count command (the top of the log files lists the parameters

INFO  @ Mon, 14 Oct 2024 15:41:44: Parameters: mageck count -l mageck_NonHuman_Trimmed.txt -n mageck_count_raw/P842_rawCounts --pdf-report --trim-5 AUTO --sample-label ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4,ctrl_t8_1,ctrl_t8_2,ctrl_t8_3,ctrl_t8_4,ctrl_thawed_t0_1,ctrl_thawed_t0_2,ctrl_thawed_t0_3,ctrl_thawed_t9_1,Palbo_1x_t16_1,Palbo_1x_t16_2,Palbo_1x_t16_3,Palbo_1x_t16_4,Palbo_1x_t8_1,Palbo_1x_t8_2,Palbo_1x_t8_3,Palbo_1x_t8_4 --control-sgrna CTRL_sgRNA.txt --norm-method control --keep-tmp --fastq ../rawdata/ctrl_t0_1_R2.fastq.gz ../rawdata/ctrl_t0_2_R2.fastq.gz ../rawdata/ctrl_t16_1_R2.fastq.gz ../rawdata/ctrl_t16_2_R2.fastq.gz ../rawdata/ctrl_t16_3_R2.fastq.gz ../rawdata/ctrl_t16_4_R2.fastq.gz ../rawdata/ctrl_t8_1_R2.fastq.gz ../rawdata/ctrl_t8_2_R2.fastq.gz ../rawdata/ctrl_t8_3_R2.fastq.gz ../rawdata/ctrl_t8_4_R2.fastq.gz ../rawdata/ctrl_thawed_t0_1_R2.fastq.gz ../rawdata/ctrl_thawed_t0_2_R2.fastq.gz ../rawdata/ctrl_thawed_t0_3_R2.fastq.gz ../rawdata/ctrl_thawed_t9_1_R2.fastq.gz ../rawdata/Palbo_1x_t16_1_R2.fastq.gz ../rawdata/Palbo_1x_t16_2_R2.fastq.gz ../rawdata/Palbo_1x_t16_3_R2.fastq.gz ../rawdata/Palbo_1x_t16_4_R2.fastq.gz ../rawdata/Palbo_1x_t8_1_R2.fastq.gz ../rawdata/Palbo_1x_t8_2_R2.fastq.gz ../rawdata/Palbo_1x_t8_3_R2.fastq.gz ../rawdata/Palbo_1x_t8_4_R2.fastq.gz 
INFO  @ Mon, 14 Oct 2024 15:41:44: Welcome to MAGeCK v0.5.9.5. Command: count 
INFO  @ Mon, 14 Oct 2024 15:41:44: Header line of the library file detected; skip the first line ... 
INFO  @ Mon, 14 Oct 2024 15:41:44: Loading 1868 predefined sgRNAs. 
WARNING @ Mon, 14 Oct 2024 15:41:44: There are 0 sgRNAs with duplicated sequences. 
INFO  @ Mon, 14 Oct 2024 15:41:44: Parsing FASTQ file ../rawdata/ctrl_t0_1_R2.fastq.gz... 
INFO  @ Mon, 14 Oct 2024 15:41:44: Determining the trim-5 length of FASTQ file ../rawdata/ctrl_t0_1_R2.fastq.gz... 
INFO  @ Mon, 14 Oct 2024 15:41:44: Possible gRNA lengths:20 
INFO  @ Mon, 14 Oct 2024 15:41:44: Processing 0M reads ... 
INFO  @ Mon, 14 Oct 2024 15:41:46: Read length:74 
...
INFO  @ Mon, 14 Oct 2024 16:07:59: 168 out of 168 control sgRNAs are found in count table. 
INFO  @ Mon, 14 Oct 2024 16:07:59: Loaded 168 control sgRNAs from CTRL_sgRNA.txt 
INFO  @ Mon, 14 Oct 2024 16:07:59: control sgRNAs for normalization:168 
DEBUG @ Mon, 14 Oct 2024 16:07:59: Initial (total) size factor: 3.643972196396157 1.7472148631207118 0.7579672463278744 0.4781876991567586 0.9532875245252658 0.8511209173329068 0.43338135436148884 0.3941755316007135 0.6837632020558377 1.1978681642169442 2.018128231492122 1.1524666188726589 1.3151082416628408 0.8211015672542623 1.5463376753507014 1.7358262424709383 1.544438078415576 1.5159484531651943 1.574482737512243 1.947991334518815 1.2742822462780847 1.7469132483409506 
DEBUG @ Mon, 14 Oct 2024 16:07:59: Median factor: 3.0527497173776807 1.5242708534353389 0.6468475415015553 0.40699281810206 0.809509835365955 0.7281082881694585 0.36797994688584507 0.33624250508021486 0.582555022377743 1.0223511505537806 1.7258722732525489 0.9846275491760113 1.1164294624720132 0.7027042494738289 1.3205284061215297 1.4846412002347134 1.3217715484016164 1.3033681340674552 1.3460162501605921 1.6616955951544006 1.0946697575596858 1.5002416212364889 
INFO  @ Mon, 14 Oct 2024 16:07:59: Final size factor: 3.0527497173776807 1.5242708534353389 0.6468475415015553 0.40699281810206 0.809509835365955 0.7281082881694585 0.36797994688584507 0.33624250508021486 0.582555022377743 1.0223511505537806 1.7258722732525489 0.9846275491760113 1.1164294624720132 0.7027042494738289 1.3205284061215297 1.4846412002347134 1.3217715484016164 1.3033681340674552 1.3460162501605921 1.6616955951544006 1.0946697575596858 1.5002416212364889 

the rest is just reading the files, calculating data fro trimming and gini indices. I can upload the complete log if needed.

ADD REPLY
0
Entering edit mode

Do you have the count command log from vispr as well?

ADD REPLY
0
Entering edit mode

sorry, I thought it was loaded. here it is.

INFO  @ Tue, 15 Oct 2024 07:31:09: Parameters: /fs/home/yeroslaviz/miniconda3/envs/mageck/bin/mageck count --output-prefix results/count/all --norm-method control --list-seq /fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P842/mageck/mageck_NonHuman_Trimmed.txt --fastq ../../rawdata/ctrl_t0_1_R2.fastq.gz ../../rawdata/ctrl_t0_2_R2.fastq.gz ../../rawdata/ctrl_t16_1_R2.fastq.gz ../../rawdata/ctrl_t16_2_R2.fastq.gz ../../rawdata/ctrl_t16_3_R2.fastq.gz ../../rawdata/ctrl_t16_4_R2.fastq.gz --sample-label ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4 --count-pair False --control-sgrna /fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P842/mageck/CTRL_sgRNA.txt --trim-5 AUTO 
INFO  @ Tue, 15 Oct 2024 07:31:09: Welcome to MAGeCK v0.5.9.5. Command: count 
INFO  @ Tue, 15 Oct 2024 07:31:09: Header line of the library file detected; skip the first line ... 
INFO  @ Tue, 15 Oct 2024 07:31:09: Loading 1868 predefined sgRNAs. 
WARNING @ Tue, 15 Oct 2024 07:31:09: There are 0 sgRNAs with duplicated sequences. 
INFO  @ Tue, 15 Oct 2024 07:31:09: Parsing FASTQ file ../../rawdata/ctrl_t0_1_R2.fastq.gz... 
INFO  @ Tue, 15 Oct 2024 07:31:09: Determining the trim-5 length of FASTQ file ../../rawdata/ctrl_t0_1_R2.fastq.gz... 
INFO  @ Tue, 15 Oct 2024 07:31:09: Possible gRNA lengths:20 
INFO  @ Tue, 15 Oct 2024 07:31:09: Processing 0M reads ... 
INFO  @ Tue, 15 Oct 2024 07:31:11: Read length:74 
INFO  @ Tue, 15 Oct 2024 07:31:11: Total tested reads: 100001, mapped: 95972(0.9597104028959711) 
INFO  @ Tue, 15 Oct 2024 07:31:11: --trim-5 test data: (trim_length reads fraction) 
INFO  @ Tue, 15 Oct 2024 07:31:11: 46   12967   0.13511232442795815 
...

The log file is unfortunately too lomng to post here completely, but you can download the complete file here

thanks for looking into it.

ADD REPLY
0
Entering edit mode

This is complete mageck mle log file, though it was used only to compare specific samples within the complete subset.

INFO  @ Thu, 17 Oct 2024 07:45:15: Parameters: mageck mle -k mageck_count_raw/P842_rawCounts.count.txt --norm-method control --control-sgrna CTRL_sgRNA.txt -d desgin_mat_ctrlt0-16.txt -n mageck_mle_raw_ctrlt8_vs_ctrl-t0_sgRNA 
INFO  @ Thu, 17 Oct 2024 07:45:16: Cannot parse design matrix as a string; try to parse it as a file name ... 
INFO  @ Thu, 17 Oct 2024 07:45:16: Design matrix: 
INFO  @ Thu, 17 Oct 2024 07:45:16: [[1. 0. 0.] 
INFO  @ Thu, 17 Oct 2024 07:45:16:  [1. 0. 0.] 
INFO  @ Thu, 17 Oct 2024 07:45:16:  [1. 0. 1.] 
INFO  @ Thu, 17 Oct 2024 07:45:16:  [1. 0. 1.] 
INFO  @ Thu, 17 Oct 2024 07:45:16:  [1. 0. 1.] 
INFO  @ Thu, 17 Oct 2024 07:45:16:  [1. 0. 1.]] 
INFO  @ Thu, 17 Oct 2024 07:45:16: Beta labels:baseline,ctrl_t0,ctrl_t16 
INFO  @ Thu, 17 Oct 2024 07:45:16: Included samples:ctrl_t0_1,ctrl_t0_2,ctrl_t16_1,ctrl_t16_2,ctrl_t16_3,ctrl_t16_4 
INFO  @ Thu, 17 Oct 2024 07:45:16: Loaded samples:ctrl_t0_1;ctrl_t0_2;ctrl_t16_1;ctrl_t16_2;ctrl_t16_3;ctrl_t16_4 
INFO  @ Thu, 17 Oct 2024 07:45:16: Sample index: 0;1;2;3;4;5 
INFO  @ Thu, 17 Oct 2024 07:45:16: Loaded 593 genes. 
INFO  @ Thu, 17 Oct 2024 07:45:16: 168 out of 168 control sgRNAs are found in count table. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Loaded 168 control sgRNAs from CTRL_sgRNA.txt 
INFO  @ Thu, 17 Oct 2024 07:45:16: control sgRNAs for normalization:168 
DEBUG @ Thu, 17 Oct 2024 07:45:16: Initial (total) size factor: 3.934784874704868 1.8871408102897287 0.8187793522337624 0.5165726113426938 1.0297429115316012 0.919395137544251 
DEBUG @ Thu, 17 Oct 2024 07:45:16: Median factor: 3.253323892655041 1.5813512621704844 0.6806890737058581 0.42498067456187144 0.8531850224927363 0.7618719235685456 
INFO  @ Thu, 17 Oct 2024 07:45:16: Final size factor: 3.253323892655041 1.5813512621704844 0.6806890737058581 0.42498067456187144 0.8531850224927363 0.7618719235685456 
INFO  @ Thu, 17 Oct 2024 07:45:16: size factor: 0.30737794114434114,0.6323705706140517,1.469099532560036,2.3530481733809134,1.1720787093499565,1.3125565716033767 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0: total 1 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:16: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Modeling the mean and variance ... 
INFO  @ Thu, 17 Oct 2024 07:45:16: Run the algorithm for the second time ... 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0: total 593 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:16: Thread 0: Calculating SLC22A7 (1) ...  
INFO  @ Thu, 17 Oct 2024 07:45:19: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:19: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:19: Using 168 genes and 168 sgRNAs as negative controls for permutation... 
INFO  @ Thu, 17 Oct 2024 07:45:19: Permuting groups of gene with 1 sgRNAs per gene. Group progress: 1/2 
INFO  @ Thu, 17 Oct 2024 07:45:19: Start permuting 2 rounds ... 
INFO  @ Thu, 17 Oct 2024 07:45:19: Collecting 168 sgRNAs from 168 genes. 
INFO  @ Thu, 17 Oct 2024 07:45:20: Permuting round 0 ... 
INFO  @ Thu, 17 Oct 2024 07:45:20: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:20: Thread 0: total 593 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:20: Thread 0: Calculating SLC22A7 (1) ...  
INFO  @ Thu, 17 Oct 2024 07:45:22: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:22: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:22: Permuting round 1 ... 
INFO  @ Thu, 17 Oct 2024 07:45:22: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:22: Thread 0: total 593 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:23: Thread 0: Calculating SLC22A7 (1) ...  
INFO  @ Thu, 17 Oct 2024 07:45:25: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:25: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:25: Assigning p values... 
INFO  @ Thu, 17 Oct 2024 07:45:25: Permuting groups of gene with 4 sgRNAs per gene. Group progress: 2/2 
INFO  @ Thu, 17 Oct 2024 07:45:25: Start permuting 2 rounds ... 
INFO  @ Thu, 17 Oct 2024 07:45:25: Collecting 168 sgRNAs from 168 genes. 
INFO  @ Thu, 17 Oct 2024 07:45:25: Permuting round 0 ... 
INFO  @ Thu, 17 Oct 2024 07:45:25: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:25: Thread 0: total 593 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:25: Thread 0: Calculating SLC22A7 (1) ...  
INFO  @ Thu, 17 Oct 2024 07:45:29: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:29: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:29: Permuting round 1 ... 
INFO  @ Thu, 17 Oct 2024 07:45:30: Thread 0 started. 
INFO  @ Thu, 17 Oct 2024 07:45:30: Thread 0: total 593 instances. 
INFO  @ Thu, 17 Oct 2024 07:45:30: Thread 0: Calculating SLC22A7 (1) ...  
INFO  @ Thu, 17 Oct 2024 07:45:33: Thread 0 completed. 
INFO  @ Thu, 17 Oct 2024 07:45:33: All threads completed. 
INFO  @ Thu, 17 Oct 2024 07:45:34: Assigning p values... 
INFO  @ Thu, 17 Oct 2024 07:45:34: Writing gene results to mageck_mle_raw_ctrlt8_vs_ctrl-t0_sgRNA.gene_summary.txt 
INFO  @ Thu, 17 Oct 2024 07:45:34: Writing sgRNA results to mageck_mle_raw_ctrlt8_vs_ctrl-t0_sgRNA.sgrna_summary.txt 
ADD REPLY
1
Entering edit mode
5 weeks ago

This probably stems from differences in the gene variance modeling (which is set to --genes-var 0 in the VISPR run. It's 1000 by default in a typical MLE run.

See the docs:

--genes-varmodeling GENES_VARMODELING The number of genes for mean-variance modeling. Default 1000.

As for why that's the case, I am not sure. Perhaps it can be controlled at the top level of the VISPR command.

ADD COMMENT
0
Entering edit mode

But the mean-variance modelling is based on the normalized data (actually raw data corrected by size factors), so that comes after, no?

ADD REPLY
0
Entering edit mode

Presumably, but I haven't dug through the source code to verify. OP, can you post the full logs from your vispr call? And the commands you used for the count command fed into mle? It'd be nice to see if there are differences in the count command used by each. Your config file for vispr might also be helpful.

ADD REPLY
0
Entering edit mode

thanks for that. I thought it has something to do with the modelling, but wanted to make sure. I'll post the log files here separately for convenience

ADD REPLY

Login before adding your answer.

Traffic: 1826 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6