MAGeCK pipeline doesn't find my controls. What am I doing wrong?
2.8 years ago
msn ▴ 130

Afternoon all.

The MAGeCK pipeline works for me as long as I don't supply a control list. But when I do, it tells me it can't find any of my non-targeting guides. Here is the command I run:

mageck count --output-prefix results/count/all --norm-method control --list-seq LibraryFixed.csv --fastq ../folder/1.fastq.gz ../folder/2.fastq.gz ../folder/3.fastq.gz ../folder/4.fastq.gz ../folder/5.fastq.gz ../folder/6.fastq.gz  --sample-label 1,2,3,4,5,6  --count-pair False --control-sgrna Controls.txt

and this is the error I get:

0 out of 100 control sgRNAs are found in count table. 
Not enough control sgRNAs found in the count table. Please check your control sgRNA list. 

My Controls.txt is just a text file with the "ID" of one control on each line, matching the ID name in LibraryFixed.csv, like this:

Non-TargetingControl17
Non-TargetingControl31
Non-TargetingControl34
Non-TargetingControl51

When I look at the counts from the MLE and RRA runs that work without the control file, I do see the non-targeting controls there. So the message that the list can't be found in the count table might help me track down the error, but it doesn't let me fix it. Apparently older versions had a known bug that could read the library IDs incorrectly when a control file was supplied, which would explain why they wouldn't match. But according to the dev log, that was fixed.

Running MAGeCK 0.5.9.5 (the newest version, as far as I can tell).

Any help is much appreciated, or even ideas on how I can troubleshoot the problem. Thank you.

knockout screen Python CRISPR MAGeCK R • 2.7k views

If you grep for these sgRNA names in LibraryFixed.csv, what is the output?


Thank you ATpoint for the reply and coming to my aid. The output from:

grep -E 'Non-TargetingControl92' LibraryFixed.csv

is:

810,ACGTGGGGACATATACGTGT,Non-TargetingControl92

and grep -E 'Non-TargetingControl' LibraryFixed.csv returns all 100 of them. I don't see anything obviously wrong at first look: no spaces anywhere that shouldn't be. Unless the hyphen is messing it up? I could do a find-and-replace and swap them to underscores, do you think?
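One thing the grep output above does show is where the name sits: in 810,ACGTGGGGACATATACGTGT,Non-TargetingControl92 the control name is the third field, and the first field is 810. Whether that matters depends on which column MAGeCK compares the control list against, which is an assumption here and not something confirmed in this thread. A minimal sketch for checking the column, and for listing the first-column values of the control rows, assuming LibraryFixed.csv is a headerless three-column CSV as in that grep line (the function names are made up for illustration):

```shell
#!/usr/bin/env bash
# Assumption: the library is a headerless CSV with three comma-separated fields,
# like the grep output above: 810,ACGTGGGGACATATACGTGT,Non-TargetingControl92

# Print the number of the first column that holds an exact match for a name.
which_column() {
    local name="$1" library="$2"
    awk -F',' -v name="$name" '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        { for (i = 1; i <= NF; i++) if ($i == name) { print i; exit } }
    ' "$library"
}

# Print the first-column value of every row whose third column starts with a prefix.
first_column_ids() {
    local prefix="$1" library="$2"
    awk -F',' -v prefix="$prefix" '
        { sub(/\r$/, "") }
        index($3, prefix) == 1 { print $1 }
    ' "$library"
}
```

For example, `which_column Non-TargetingControl92 LibraryFixed.csv` prints the column number, and `first_column_ids Non-TargetingControl LibraryFixed.csv > ControlsByFirstColumn.txt` writes a candidate control list built from the first column (ControlsByFirstColumn.txt is a hypothetical file name).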


I'd try that, yeah. I don't quite remember my issue, but I ran into something similar with the guide names getting changed somehow, there were certain characters it didn't like. It may have been hyphens.


I removed the hyphens completely from the Library file and the Control file... same error =(


Alternatively, try to skip the normalization in the counting and do it in the run step. I used to count using a custom strategy and then use run, and in run the control option worked well for me.


Thanks to both of you for trying to help. Much appreciated.

I started researching this idea this morning thanks to ATpoint's comment. While the run command has been disabled since 0.5.4 (apparently), it looks like the test command will also accept the control guides as a replacement. I'm putting this here more for anyone who googles this error and finds this thread, not so much for ATpoint or Jared.

I'm trying count now and will run test afterwards with the controls. I will update here whether it works or crashes.

EDIT: I'm running mle, but it's essentially like test, only not RRA.


Ah sorry, I meant to say test, not run. The one that runs the RRA testing.


All good. Okay, so same error, BUT now with a little more info, which may or may not be helpful.

So I ran:

mageck mle -k results/all.count.txt -d designmatrix_mle_all.csv --control-sgrna Controls.txt  --norm-method control --threads 25

and while I got the same error of "0 out of 100 control sgRNAs are found in count table", I also got some more information from the log files, including this one line: "Loaded 263 genes." That, with controls, is the right number. I double-checked using R as a sanity check:

library <- read.csv("LibraryFixed.csv", header = FALSE)
listGenes <- library$V3
length(listGenes[!duplicated(listGenes)])

and sure enough I get 263.

I think we can be fairly sure now that when the counts file is being read, the gene names are being read from the correct column. Otherwise the multiple guides per gene would not be grouped correctly, since no other column has the same pattern of duplicates.

When you add the fact that, without the controls, gene_summary.txt has all the gene names and controls exactly as in the library and count files, the only explanation I can see for the error is that Controls.txt is not being matched correctly, although it is obviously being read.
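A way to test that explanation outside MAGeCK is to count how many lines of Controls.txt match each column of the count table exactly. The sketch below assumes the count table (results/all.count.txt in the command above) is tab-separated with the sgRNA ID in column 1 and the gene name in column 2; the function name is made up for illustration, and the comparison is a plain string match, not MAGeCK's own lookup:

```shell
#!/usr/bin/env bash
# Count how many distinct entries of an ID list appear, as exact strings,
# in one column of a tab-separated table.
# Assumption: the ID list has one entry per line and is not empty.
count_matches() {
    local ids="$1" table="$2" column="$3"
    awk -F'\t' -v col="$column" '
        NR == FNR { if ($0 != "") wanted[$0] = 1; next }   # first file: the ID list
        ($col in wanted) && !seen[$col]++ { n++ }          # second file: the table
        END { print n + 0 }
    ' "$ids" "$table"
}
```

Running `count_matches Controls.txt results/all.count.txt 1` and then the same with `2` as the last argument shows which column, if either, the control names actually line up with. If the names only match column 2, the list holds gene names where sgRNA IDs may be expected.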

We know it sees the newline characters properly because it finds 100 controls, the correct number. I have tried three different encodings: UTF-8, Western, and UTF-16. UTF-8 and Western gave the same error, and UTF-16 could not be read at all, so I don't think it's an encoding issue.
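Getting the right line count does not by itself rule out invisible characters on each line, such as a byte-order mark at the start of the file, Windows carriage returns, or trailing spaces. A minimal sketch for checking those three things in a plain-text list like Controls.txt (the function name is made up for illustration, and whether MAGeCK would trip over any of them is not established here):

```shell
#!/usr/bin/env bash
# Report hidden characters in a plain-text ID list:
# a UTF-8 byte-order mark, carriage returns, and trailing spaces or tabs.
check_hidden_chars() {
    local file="$1"
    local bom cr trailing
    # A UTF-8 byte-order mark is the three bytes ef bb bf at the start of the file.
    if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        bom=yes
    else
        bom=no
    fi
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    cr=$(grep -c "$(printf '\r')" "$file" || true)
    trailing=$(grep -c '[[:blank:]]$' "$file" || true)
    echo "bom=$bom cr_lines=$cr trailing_blank_lines=$trailing"
}
```

`check_hidden_chars Controls.txt` prints one summary line; anything other than bom=no with both counts at 0 means the file contains characters that are not visible in a text editor.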

Is it possible they changed the format that the Controls.txt file should have and just didn't update the documentation? Maybe I need to add quotes around each entry or something?

2.8 years ago
msn ▴ 130

Maybe this pipeline is becoming defunct. It's super tricky with some of these pipelines, because once the postdocs and grad students leave, version skew of dependencies and updates to wet-lab technology become impossible to keep up with unless you hire someone to maintain it. But how on earth would you fund something like that?

I have been toying with the idea of forking it, writing a fix, and pushing it. But truthfully, what happens when I move on to something else next month, the code breaks for some other reason, and there is no one around to fix my code?
