Question

I encountered a problem while using MAGeCK to analyze CRISPR screen data

9 months ago

Pallondyle • 0

The run fails with the following error:

INFO  @ Sun, 04 Aug 2024 06:38:54: Total: 61970171.
INFO  @ Sun, 04 Aug 2024 06:38:54: Mapped: 38138765.
INFO  @ Sun, 04 Aug 2024 06:38:54: Parsing FASTQ file rawdata/CRISPR_screen_1/MOI001-A549-cas9_R2.fq.gz...
INFO  @ Sun, 04 Aug 2024 06:38:54: Determining the trim-5 length of FASTQ file rawdata/CRISPR_screen_1/MOI001-A549-cas9_R2.fq.gz...
INFO  @ Sun, 04 Aug 2024 06:38:54: Possible gRNA lengths:20
INFO  @ Sun, 04 Aug 2024 06:38:54: Processing 0M reads ...
INFO  @ Sun, 04 Aug 2024 06:38:58: Read length:150
INFO  @ Sun, 04 Aug 2024 06:38:58: Total tested reads: 100001, mapped: 50(0.0004999950000499995)
ERROR @ Sun, 04 Aug 2024 06:38:58: Cannot automatically determine the --trim-5 length. Only 0.049999500004999954 % of the reads can be identified.

Mageck CRISPR-screen • 1.2k views

ADD COMMENT • link updated 5 months ago by GenoMax 151k • written 9 months ago by Pallondyle • 0

Where is the gRNA supposed to be in your reads? In the first 50 bp? You may want to trim your original reads down to that size and try again. Are you providing a file with the expected gRNA sequences as reference?
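
The trimming suggested here could be sketched in Python as follows. This is a minimal sketch, assuming a standard FASTQ file with exactly four lines per record; the function names `trim_fastq` and `trim_fastq_file` and the file paths are hypothetical, and the default of 50 bases simply mirrors the guess above.

```python
import gzip
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], keep: int = 50) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to the first `keep` bases."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Assumption: strict 4-line records, so index 1 is the sequence
        # and index 3 is the quality string within each record.
        if i % 4 in (1, 3):
            line = line[:keep]
        yield line + "\n"


def trim_fastq_file(src: str, dst: str, keep: int = 50) -> None:
    """Trim a gzip-compressed FASTQ file; `src` and `dst` are placeholder paths."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        fout.writelines(trim_fastq(fin, keep))
```

The header and separator lines are passed through unchanged, so the output stays a valid FASTQ file as long as the input was one.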

ADD REPLY • link 9 months ago by GenoMax 151k

This is an example of my reference data:

ID,sgRNA Target Sequence,Target Gene Symbol
1-NM_130786.3-sense,CATCTTCTTTCACCTGAACG,A1BG
2-NM_130786.3-antisense,CTCCGGGGAGAACTCCGGCG,A1BG
3-NM_130786.3-antisense,TCTCCATGGTGCATCAGCAC,A1BG
4-NM_130786.3-antisense,TGGAAGTCCACTCCACTCAG,A1BG
5-NM_000014.4-sense,ACTGCATCTGTGCAAACGGG,A2M
6-NM_000014.4-sense,ATGTCTCATGAACTACCCTG,A2M
7-NM_000014.4-sense,TGAAATGAAACTTCACACTG,A2M
8-NM_000014.4-sense,TTACTCATATAGGATCCCAA,A2M
9-NM_000662.7-antisense,CGGAAGACACAAGGCACCTG,NAT1
10-NM_000662.7-sense,GAACCTTAACATCCATTGTG,NAT1
11-NM_000662.7-sense,GTTGTGAGAAGAAATCGGGG,NAT1
12-NM_000662.7-sense,TTGGACGCTCATACCAGATG,NAT1
13-NM_000015.2-sense,ATTGTAAGAAGAAACCGGGG,NAT2
14-NM_000015.2-sense,TAACAAATACAGCACTGGCA,NAT2
15-NM_000015.2-antisense,TAGAGGCTGCCACATCTGGG,NAT2
16-NM_000015.2-antisense,TGTGGTCTGAAAACCGATTG,NAT2

I generated this file from the library content file. This is an example of my FASTQ data:

@LH00380:75:223CMKLT4:8:1101:32641:1028 1:N:0:TNGACAAT
GNAGACCCTTGTGGAAAGGACGAAACACCGCAGGCACCATAAGGATGAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTAAGCTTGGCGTAACTAGA
+
I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@LH00380:75:223CMKLT4:8:1101:33223:1028 1:N:0:TNGACAAT
GNTTGTGGAAAGGACGAAACACCGTTCTGGACAGTGCAGGGGAAAGAATAGTAGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTGACAATATCTAGGGGGGGGCGTTTTGTTTTTGAGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9--I-I9I--9--------9---9---99I999999I-9I99999999999

I'm not sure where the gRNA is supposed to be in my reads.

ADD REPLY • link 9 months ago by Pallondyle • 0

In CRISPR and shRNA screens you don't sequence the guide itself but the attached barcodes. Unless the design is highly custom, these are the first bases of every read, but you need to determine exactly how many, either by checking the protocol or by talking to the scientist who produced the library.

ADD REPLY • link 9 months ago by ATpoint 88k

The easiest way to find where they are located in the reads is to take a few of those sequences and grep for them in your FASTQ file.
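
If grep alone does not make the position obvious, the same check can be sketched in Python. This is only an illustration: the helper names `fastq_sequences` and `guide_offsets` are hypothetical, the FASTQ is assumed to be gzip-compressed with four lines per record, and only exact matches are counted. The most frequent offset would be a candidate value for the `--trim-5` option named in the error message.

```python
import gzip
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def fastq_sequences(path: str, limit: int = 100_000) -> Iterator[str]:
    """Yield the sequence line of the first `limit` records of a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        # Assumption: 4-line records, so sequences sit at line indices 1, 5, 9, ...
        for line in islice(handle, 1, 4 * limit, 4):
            yield line.strip()


def guide_offsets(reads: Iterable[str], guides: Iterable[str]) -> Counter:
    """Count the 0-based offset at which a guide first matches in each read."""
    guide_list = list(guides)
    offsets: Counter = Counter()
    for read in reads:
        for guide in guide_list:
            pos = read.find(guide)
            if pos != -1:
                offsets[pos] += 1
                break  # count each read at most once
    return offsets


# Synthetic reads built around two guides from the reference excerpt above.
guides = ["CATCTTCTTTCACCTGAACG", "CTCCGGGGAGAACTCCGGCG"]
reads = [
    "TTGTGGAAAGGACGAAACACCG" + guides[0] + "GTTTTAGAGC",
    "ACACCG" + guides[1] + "GTTTTAGAGC",
    "NNNNNNNN",
]
print(guide_offsets(reads, guides).most_common())
```

Checking every guide against every read is slow for a full library, so in practice it makes sense to pass only a handful of guides, as suggested above, and to run it separately on the R1 and R2 files.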

ADD REPLY • link 9 months ago by dsull ★ 7.4k

score 0 · Answer 1 · 2024-08-05

9 months ago

Pallondyle • 0

Thanks, everyone! I have found the cause of the problem. After I removed my R2 sequencing data from the input, it worked! It turns out that the gRNA is located at different positions in the R1 and R2 files.

ADD COMMENT • link 9 months ago by Pallondyle • 0

Hi Pallondyle,

How did you remove the R2 files? I’m also encountering a low mapping percentage, and I suspect it’s because the gRNA is located at different positions in the R1 and R2 files. My raw data contains both R1 and R2 sequences. Should I remove the R2 sequences and then proceed with the analysis, or do I need to extract the R2 reads, reverse complement them, and then merge the R1 and R2 files before analyzing?

Thanks!

ADD REPLY • link 5 months ago by Han • 0

Unless you have a reason to reverse complement the R2 reads (for example, a protocol in which the barcodes are sequenced in reverse-complement orientation, so that they appear at the beginning of the R2 read), you can simply omit them from the input.
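
To check whether the R2 reads carry the guides in reverse-complement orientation at all, something like the following could be run on a few R2 sequences. It is a rough sketch: `reverse_complement` and `guide_orientation` are hypothetical helper names, and only exact matches are considered.

```python
from typing import Iterable

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def guide_orientation(read: str, guides: Iterable[str]) -> str:
    """Return 'forward', 'reverse', or 'none' for how a guide occurs in `read`."""
    guide_list = list(guides)
    if any(guide in read for guide in guide_list):
        return "forward"
    if any(reverse_complement(guide) in read for guide in guide_list):
        return "reverse"
    return "none"
```

If most R2 reads come back as "reverse", reverse complementing them before counting may be worth trying; if they come back as "none", leaving R2 out of the input is the simpler route.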

ADD REPLY • link 5 months ago by GenoMax 151k