Afternoon all.
The MAGeCK pipeline works for me as long as I don't supply a control list. But when I do, it tells me it can't find any of my non-targeting guides. Here is the command I run:
mageck count --output-prefix results/count/all --norm-method control --list-seq LibraryFixed.csv --fastq ../folder/1.fastq.gz ../folder/2.fastq.gz ../folder/3.fastq.gz ../folder/4.fastq.gz ../folder/5.fastq.gz ../folder/6.fastq.gz --sample-label 1,2,3,4,5,6 --count-pair False --control-sgrna Controls.txt
and this is the error I get:
0 out of 100 control sgRNAs are found in count table.
Not enough control sgRNAs found in the count table. Please check your control sgRNA list.
My Controls.txt is just a text file with the ID of one control on each line, matching the ID names in LibraryFixed.csv, like this:
Non-TargetingControl17
Non-TargetingControl31
Non-TargetingControl34
Non-TargetingControl51
When I look at the counts from the MLE and RRA runs that work without the control file, I do see the non-targeting controls there. So the message that the list can't be found in the count table might help me track down the problem, but it doesn't let me fix it. Apparently older versions had a known bug that could read the library IDs incorrectly if you supplied a control file, which would explain why they wouldn't match. But according to the dev log, that was fixed.
Running MAGeCK 0.5.9.5 (newest as far as I can tell).
Any help is much appreciated, or even ideas on how I can troubleshoot the problem. Thank you.
If you grep for these sgRNA names in LibraryFixed.csv, what is the output?
Thank you ATpoint for the reply and coming to my aid. The output from:
is:
and
grep -E 'Non-TargetingControl' LibraryFixed.csv
returns all 100 of them. I don't see anything obviously wrong at first look; no spaces anywhere that shouldn't be. Unless the hyphen is messing it up? I could do a find-and-replace and swap them to underscores, you think?
I'd try that, yeah. I don't quite remember my issue, but I ran into something similar with the guide names getting changed somehow; there were certain characters it didn't like. It may have been hyphens.
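One thing grep cannot show here: it matches substrings, so a control ID with a trailing carriage return or other invisible character would still look fine in grep output. A minimal Python sketch of a stricter check is below. It assumes the sgRNA ID is the first column of LibraryFixed.csv; the function names are made up for illustration and are not part of MAGeCK.

```python
import csv

def load_ids(lines):
    # Keep non-empty lines and remove only the trailing "\n",
    # so a stray "\r" or trailing space stays visible.
    return [ln.rstrip("\n") for ln in lines if ln.strip()]

def compare_ids(control_ids, library_ids):
    # Exact string comparison, unlike grep's substring match.
    library = set(library_ids)
    found = [c for c in control_ids if c in library]
    missing = [c for c in control_ids if c not in library]
    return found, missing

def check_files(control_path="Controls.txt", library_path="LibraryFixed.csv"):
    # newline="" turns off newline translation, so "\r\n" is not hidden.
    with open(control_path, newline="") as fh:
        controls = load_ids(fh)
    with open(library_path, newline="") as fh:
        # Assumption: the sgRNA ID is in the first CSV column.
        library_ids = [row[0] for row in csv.reader(fh) if row]
    found, missing = compare_ids(controls, library_ids)
    print(f"{len(found)} exact matches, {len(missing)} misses")
    for name in missing[:5]:
        print(repr(name))  # repr() exposes hidden characters
    return found, missing
```

Calling check_files() from the directory holding both files prints the first few IDs that fail an exact match. If repr() shows something like a trailing '\r', the two files differ in a way that is invisible in a text editor.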
I removed the hyphens completely from the Library file and the Control file... same error =(
Alternatively, try to skip the normalization in the counting and do it in the run step. I used to count using a custom strategy and then use run, and in run the control option worked well for me.
Thanks to both of you for trying to help. Much appreciated.
I started researching this idea this morning thanks to ATpoint's comment. While the run command has been disabled since 0.5.4 (apparently), it looks like the test command will also accept the control guides as a replacement. Putting this here more for anyone who googles this error and finds this thread, not so much for ATpoint or Jared.
Trying count now, then going to test afterwards with the controls. Will update here if it works or crashes.
EDIT: running mle, but it's essentially like test, only not RRA.
Ah sorry, I meant to say test, not run. The one that runs the RRA testing.
All good. Okay, so same error BUT now a little more info, which may or may not be helpful.
so I ran:
and while I got the same error of
0 out of 100 control sgRNAs are found in count table
I also got some more information from the log files now, including this one line:
Loaded 263 genes.
which, with controls, is the right number. I double-checked using R as a sanity check, and sure enough I get 263.
I think we can be fairly sure now that when the counts file is read, the gene names are taken from the correct column. Otherwise the multiple guides per gene wouldn't be combined correctly, as no other column has the same pattern of duplicated values.
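The R check itself isn't shown in the thread. A rough Python equivalent might look like the sketch below. It assumes the count table is tab-separated with a header row that includes a column named Gene; the function name and the file name in the usage note are assumptions, so adjust them to your output.

```python
import csv

def count_genes(handle, gene_column="Gene", delimiter="\t"):
    # Count distinct values in the gene column of a delimited table.
    reader = csv.DictReader(handle, delimiter=delimiter)
    return len({row[gene_column] for row in reader})
```

For example, with open("results/count/all.count.txt", newline="") as fh: print(count_genes(fh)) should print the same number as the "Loaded ... genes" log line if the gene column is being read as expected.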
Add in the fact that when you leave out the controls, gene_summary.txt has all the gene names and controls matching the library and count files, and the only explanation I can see for the error is that Controls.txt is not being read correctly, although it is obviously being read.
We know it handles the newline character properly because it sees 100 controls, the correct number. I have tried 3 different encodings: UTF-8, Western, and UTF-16. UTF-8 and Western gave the same error, and UTF-16 could not be read at all, so I don't think it's an encoding issue.
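Getting the right line count does not rule out a byte-order mark on the first line or a carriage return at the end of every line, since neither changes the number of lines. A small sketch for checking the raw bytes of Controls.txt is below. The helper names are hypothetical, and whether MAGeCK is actually sensitive to any of this is an assumption to test, not a confirmed cause.

```python
import codecs

def describe_bytes(data: bytes) -> dict:
    # Report byte-level features that are invisible in most editors.
    crlf = data.count(b"\r\n")
    return {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "utf16_bom": data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        "crlf_lines": crlf,
        "bare_cr": data.count(b"\r") - crlf,
        "non_ascii_bytes": sum(b > 127 for b in data),
    }

def clean_ids(data: bytes) -> list:
    # "utf-8-sig" drops a leading BOM if present; splitlines() handles
    # "\n", "\r\n" and "\r"; strip() removes surrounding whitespace.
    text = data.decode("utf-8-sig")
    return [ln.strip() for ln in text.splitlines() if ln.strip()]
```

Read the file with open("Controls.txt", "rb").read(), pass the bytes to describe_bytes(), and if anything is flagged, write "\n".join(clean_ids(data)) + "\n" to a new file and point --control-sgrna at that instead.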
Is it possible they changed the format of how Controls.txt should look on the inside and just didn't update the documentation? Maybe I need to add quotes around each one or something?