Question

Problem with mapping rate when using MAGeCK to process CRISPR screen data

1 day ago

Luwell • 0

Hi! I'm a beginner in bioinformatics analysis. I want to analyze the RAWDATA from a paired-end sequencing run of a CRISPR screen.

The library structure used by the company is: 5' adapter1-Index2(i5)-primer1-insert fragment-primer2-Index1(i7)-adapter1'.

I used fastp and cutadapt to remove the known adapters and primers from R1 and R2, which gave me the CLEANDATA. The reads are around 150 bp long, and most of the sgRNA sequences start at position 43 of R1 and are 20 bp long.
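One way to check that most guides really start at position 43, and to see what share of reads contains no exact library sequence at all, is to tally where an exact 20 bp library match begins in each R1 read. The sketch below uses hypothetical helpers, not part of MAGeCK. It assumes library.csv uses MAGeCK's three-column layout (sgRNA id, sequence, gene) and that all guides are 20 bp.

```python
import gzip
from collections import Counter
from itertools import islice


def parse_library(lines):
    """Map guide sequence -> (sgRNA id, gene) from MAGeCK-style library lines.

    Assumes comma- or tab-separated columns: sgRNA id, sequence, gene.
    """
    guides = {}
    for line in lines:
        fields = [f.strip() for f in line.replace("\t", ",").split(",")]
        if len(fields) < 3:
            continue
        seq = fields[1].upper()
        # Rows whose sequence column is not pure ACGT (e.g. a header row) are skipped.
        if seq and set(seq) <= set("ACGT"):
            guides[seq] = (fields[0], fields[2])
    return guides


def iter_fastq_seqs(path, max_reads=None):
    """Yield read sequences (2nd line of each 4-line record), optionally only the first max_reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        seqs = (line.strip().upper() for i, line in enumerate(fh) if i % 4 == 1)
        yield from islice(seqs, max_reads)


def start_positions(seqs, guides, sgrna_len=20):
    """Return (Counter of 1-based start positions of the first exact match, number of unmatched reads)."""
    positions = Counter()
    unmatched = 0
    for seq in seqs:
        for i in range(len(seq) - sgrna_len + 1):
            if seq[i:i + sgrna_len] in guides:
                positions[i + 1] += 1
                break
        else:  # no window of this read matched a library guide exactly
            unmatched += 1
    return positions, unmatched
```

For example, `start_positions(iter_fastq_seqs("L1_R1.fq.gz", max_reads=200000), guides)` on a subsample gives a quick picture. A wide spread of start positions would suggest the sgRNA offset varies between reads. A large unmatched count would point instead to reads without an intact library sequence, for example because of sequencing errors or non-library products.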

After that, I processed the CLEANDATA with MAGeCK using the following command:

mageck count -l library.csv -n countA --fastq L1_R1.fq.gz --fastq-2 L1_R2.fq.gz

MAGeCK can automatically detect the position of the sgRNA and set --trim-5 accordingly.

My problem is that the best mapping rate among the CLEANDATA samples (6 in total) is only around 70%, and I'm unable to improve it. Using cutadapt to trim the sequences on both sides of the sgRNA in the 150 bp reads might lower the mapping rate even further. Also, if I count directly from the RAWDATA, the mapping rate is around 69%.

I couldn't find an answer to this question anywhere.

Could this be a problem with the data provided by the company, or am I missing a crucial step in my processing? How can I improve it?

Thanks.

Tags: crispr screen, CRISPR-screen, mageck, mapping

Have you tried the following to see if they help?

a) use just the R1 file
b) hard-trim the reads (R1 only) so that the sgRNA starts within the first few bases

The R2 file is not going to add any information.
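A minimal sketch of the hard-trim suggestion applied to R1 only. The trim length would come from the question (a guide starting at position 43 needs 42 bases removed to start at base 1); trim_record and trim_fastq are hypothetical helpers. mageck count's own --trim-5 option performs an equivalent 5' trim without writing a new file, so a standalone trim is mainly useful for inspecting the trimmed reads.

```python
import gzip


def trim_record(seq, qual, trim5, keep=None):
    """Drop the first `trim5` bases; if `keep` is given, keep only the next `keep` bases."""
    end = None if keep is None else trim5 + keep
    return seq[trim5:end], qual[trim5:end]


def trim_fastq(src, dst, trim5, keep=None):
    """Stream a (optionally gzipped) FASTQ file, writing 5'-trimmed records to dst."""
    in_open = gzip.open if src.endswith(".gz") else open
    out_open = gzip.open if dst.endswith(".gz") else open
    with in_open(src, "rt") as fin, out_open(dst, "wt") as fout:
        while True:
            header = fin.readline()
            if not header:  # end of file
                break
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            seq, qual = trim_record(seq, qual, trim5, keep)
            # Sequence and quality are cut identically so they stay the same length.
            fout.write(f"{header}{seq}\n{plus}{qual}\n")
```

For example, `trim_fastq("L1_R1.fq.gz", "L1_R1.trim42.fq.gz", trim5=42, keep=20)` would keep only the 20 bp window starting at position 43; the output file name is hypothetical.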

1 day ago by GenoMax 146k

The results using only R1 are also around 70%. I also tried trimming about 40 bp from R1, since most sgRNAs begin around that position, so that the sgRNA would start at the 2nd, 3rd, or 4th base. However, the results didn't change much; in fact, the more I trimmed, the slightly lower the mapping rate became.

I suspect this is because a few sgRNAs start earlier in the read, and trimming cuts into them so they can no longer be detected. So I don't know what else I can do.
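Whether trimming loses reads in this way can be checked from a tally of sgRNA start positions. Assuming MAGeCK reads the guide from a single fixed window after the 5' trim, only reads whose guide starts exactly at that offset are counted, so any spread in start positions is lost. The hypothetical helpers below take a mapping from 1-based start position to read count and report the share of reads captured by a given set of positions. mageck count's help text states that --trim-5 accepts several comma-separated trim lengths, and trim_arg formats start positions in that form.

```python
from collections import Counter


def captured_fraction(start_counts, positions):
    """Fraction of tallied reads whose 1-based sgRNA start is one of `positions`."""
    total = sum(start_counts.values())
    if total == 0:
        return 0.0
    return sum(start_counts.get(p, 0) for p in positions) / total


def top_positions(start_counts, n=3):
    """The n most frequent 1-based start positions, most frequent first."""
    return [pos for pos, _ in Counter(start_counts).most_common(n)]


def trim_arg(positions):
    """Convert 1-based start positions to 5' trim lengths as a comma-separated string."""
    return ",".join(str(p - 1) for p in sorted(positions))
```

Comparing `captured_fraction(counts, [43])` with `captured_fraction(counts, top_positions(counts))` shows how much a single trim length leaves behind. If the gap is large, passing `trim_arg(top_positions(counts))` to --trim-5 may be worth a try.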

1 day ago by Luwell • 0

MAGeCK also discards reads with any mismatch to your library sequences, so you could consider doing the alignment yourself and feeding a count matrix into MAGeCK, as described in their tutorial. This may raise the numbers a bit.
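Here is a minimal sketch of that route. It assumes the guide sits at a fixed offset in R1 (0-based 42, i.e. position 43 as in the question) and is 20 bp long. Instead of a full aligner, it uses a simple lookup that tolerates a single substitution (no indels) and leaves reads with ambiguous one-mismatch hits unassigned. `guides` is assumed to map each library sequence to (sgRNA id, gene), and all function names are hypothetical. The output layout (sgRNA, Gene, then one column per sample) is the count-table format that mageck test accepts via -k.

```python
from collections import Counter, defaultdict


def build_mismatch_index(guides):
    """Map each guide with one position masked by '.' to the guides sharing that masked key."""
    index = defaultdict(set)
    for seq in guides:
        for i in range(len(seq)):
            index[seq[:i] + "." + seq[i + 1:]].add(seq)
    return index


def assign(window, guides, index):
    """Return the guide matching `window` exactly or with one substitution; None if absent or ambiguous."""
    if window in guides:
        return window
    hits = set()
    for i in range(len(window)):
        hits |= index.get(window[:i] + "." + window[i + 1:], set())
    return next(iter(hits)) if len(hits) == 1 else None


def count_reads(seqs, guides, offset=42, sgrna_len=20):
    """Count reads per guide, reading the sgRNA from a fixed 0-based offset (42 = position 43)."""
    index = build_mismatch_index(guides)
    counts = Counter()
    for seq in seqs:
        hit = assign(seq[offset:offset + sgrna_len].upper(), guides, index)
        if hit is not None:
            counts[hit] += 1
    return counts


def format_count_table(guides, sample_counts):
    """Tab-separated table: sgRNA, Gene, then one count column per sample label."""
    samples = list(sample_counts)
    lines = ["\t".join(["sgRNA", "Gene"] + samples)]
    for seq, (sgrna_id, gene) in sorted(guides.items(), key=lambda kv: kv[1][0]):
        counts = [str(sample_counts[s].get(seq, 0)) for s in samples]
        lines.append("\t".join([sgrna_id, gene] + counts))
    return "\n".join(lines) + "\n"
```

Writing `format_count_table(guides, {"L1": counts_l1, ...})` to a file and passing it to mageck test with -k would bypass mageck count; the sample labels and variable names here are illustrative.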

In our experience, mageck count typically maps 70-80% of reads, and the downstream analyses are fine.

1 day ago by jared.andrews07 ★ 17k