I encountered a problem while using MAGeCK to analyze CRISPR screen data
5 weeks ago
Pallondyle • 0

The error output is:

INFO  @ Sun, 04 Aug 2024 06:38:54: Total: 61970171.
INFO  @ Sun, 04 Aug 2024 06:38:54: Mapped: 38138765.
INFO  @ Sun, 04 Aug 2024 06:38:54: Parsing FASTQ file rawdata/CRISPR_screen_1/MOI001-A549-cas9_R2.fq.gz...
INFO  @ Sun, 04 Aug 2024 06:38:54: Determining the trim-5 length of FASTQ file rawdata/CRISPR_screen_1/MOI001-A549-cas9_R2.fq.gz...
INFO  @ Sun, 04 Aug 2024 06:38:54: Possible gRNA lengths:20
INFO  @ Sun, 04 Aug 2024 06:38:54: Processing 0M reads ...
INFO  @ Sun, 04 Aug 2024 06:38:58: Read length:150
INFO  @ Sun, 04 Aug 2024 06:38:58: Total tested reads: 100001, mapped: 50(0.0004999950000499995)
ERROR @ Sun, 04 Aug 2024 06:38:58: Cannot automatically determine the --trim-5 length. Only 0.049999500004999954 % of the reads can be identified.
Mageck CRISPR-screen • 392 views

Where is the gRNA supposed to be in your reads? In the first 50 bp? You may want to trim your original reads down to that size and try again. Are you providing a file with the expected gRNA sequences as a reference?
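If you want to try the trimming suggestion without extra tools, here is a minimal Python sketch. It assumes a standard four-line-per-record FASTQ file (optionally gzipped). The function name `trim_fastq` and the default of 50 bases are illustrative only and are not part of MAGeCK.

```python
import gzip
from itertools import islice


def _open(path, mode):
    """Open plain or gzipped text files based on the file extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def trim_fastq(in_path, out_path, keep=50):
    """Copy a FASTQ file, keeping only the first `keep` bases of each read.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    written = 0
    with _open(in_path, "rt") as src, _open(out_path, "wt") as dst:
        while True:
            record = list(islice(src, 4))
            if len(record) < 4:  # end of file (or a truncated last record)
                break
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            # Sequence and quality strings must stay the same length.
            dst.write(f"{header}\n{seq[:keep]}\n{plus}\n{qual[:keep]}\n")
            written += 1
    return written
```

Whether 50 bases is enough depends on where the guide actually sits in your reads, so check that first.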


This is an example of my reference data:

ID,sgRNA Target Sequence,Target Gene Symbol
1-NM_130786.3-sense,CATCTTCTTTCACCTGAACG,A1BG
2-NM_130786.3-antisense,CTCCGGGGAGAACTCCGGCG,A1BG
3-NM_130786.3-antisense,TCTCCATGGTGCATCAGCAC,A1BG
4-NM_130786.3-antisense,TGGAAGTCCACTCCACTCAG,A1BG
5-NM_000014.4-sense,ACTGCATCTGTGCAAACGGG,A2M
6-NM_000014.4-sense,ATGTCTCATGAACTACCCTG,A2M
7-NM_000014.4-sense,TGAAATGAAACTTCACACTG,A2M
8-NM_000014.4-sense,TTACTCATATAGGATCCCAA,A2M
9-NM_000662.7-antisense,CGGAAGACACAAGGCACCTG,NAT1
10-NM_000662.7-sense,GAACCTTAACATCCATTGTG,NAT1
11-NM_000662.7-sense,GTTGTGAGAAGAAATCGGGG,NAT1
12-NM_000662.7-sense,TTGGACGCTCATACCAGATG,NAT1
13-NM_000015.2-sense,ATTGTAAGAAGAAACCGGGG,NAT2
14-NM_000015.2-sense,TAACAAATACAGCACTGGCA,NAT2
15-NM_000015.2-antisense,TAGAGGCTGCCACATCTGGG,NAT2
16-NM_000015.2-antisense,TGTGGTCTGAAAACCGATTG,NAT2

I generated this data using the content file. This is an example of my FASTQ data:

@LH00380:75:223CMKLT4:8:1101:32641:1028 1:N:0:TNGACAAT
GNAGACCCTTGTGGAAAGGACGAAACACCGCAGGCACCATAAGGATGAAGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTAAGCTTGGCGTAACTAGA
+
I#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@LH00380:75:223CMKLT4:8:1101:33223:1028 1:N:0:TNGACAAT
GNTTGTGGAAAGGACGAAACACCGTTCTGGACAGTGCAGGGGAAAGAATAGTAGAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTTGACAATATCTAGGGGGGGGCGTTTTGTTTTTGAGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9#IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9--I-I9I--9--------9---9---99I999999I-9I99999999999

I'm not sure where the gRNA is supposed to be in my reads.


In CRISPR and shRNA screens you don't sequence the guide but attached barcodes. Unless the design is highly custom, the barcode is in the first bases of every read. Exactly how many bases that is, you need to determine by checking the protocol or by talking to the scientist who produced the library.


The easiest way to find where they are located in the reads is to take a few of those sequences and grep for them in your FASTQ file.
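The same check can be scripted so that it covers the whole library and both strands at once. The sketch below is only one way to do it: it assumes 20 bp guides (as in the log above) and a library CSV with the `sgRNA Target Sequence` column shown earlier in the thread. The function names and the `library.csv` path are hypothetical.

```python
import csv
import gzip
from collections import Counter
from itertools import islice

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def load_guides(csv_path, column="sgRNA Target Sequence"):
    """Read guide sequences from the library CSV (column name from the thread)."""
    with open(csv_path, newline="") as handle:
        return [row[column].upper() for row in csv.DictReader(handle)]


def read_sequences(fastq_path, limit=100000):
    """Yield up to `limit` read sequences from a plain or gzipped FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(islice(handle, 4 * limit)):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield line.strip()


def guide_offsets(reads, guides, guide_len=20):
    """Count (strand, offset) pairs at which a known guide occurs in the reads."""
    forward = set(guides)
    reverse = {reverse_complement(g) for g in guides}
    counts = Counter()
    for read in reads:
        for start in range(len(read) - guide_len + 1):
            window = read[start:start + guide_len]
            if window in forward:
                counts[("forward", start)] += 1
            if window in reverse:
                counts[("reverse", start)] += 1
    return counts


# Example usage (paths are placeholders; "library.csv" is hypothetical):
# guides = load_guides("library.csv")
# reads = read_sequences("rawdata/CRISPR_screen_1/MOI001-A549-cas9_R2.fq.gz")
# for (strand, offset), n in guide_offsets(reads, guides).most_common(10):
#     print(strand, offset, n)
```

If R1 and R2 show different dominant offsets, or the guides only appear on the reverse strand in one of the files, that would explain why a single automatically detected trim length fails for that file.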

5 weeks ago
Pallondyle • 0

Thanks, everyone! I found the cause of the problem: the gRNA is located at different positions in the R1 and R2 files. After I removed my R2 sequencing data, it works!
