CRISPR/Cas9 screen analysis: R1 and R2 reads mixed
4.4 years ago

Hi! I want to analyse data from a CRISPR/Cas9 screen (control vs. treatment) using MAGeCK (https://sourceforge.net/projects/mageck/). The sequencing was performed on an Illumina instrument (paired-end).

The problem is that, in the R1 FASTQ files, half of the reads containing the sgRNA are actually R2 reads (and the R2 files contain R1 reads as well). Should I include these reads in the sgRNA counts?

Tags: crispr, screen, reads, illumina, sequencing

in R1 fastq files the half of the reads containing the sgRNA are R2 (in R2 files there are R1 reads as well)

How did that happen? When in doubt, it's always best to go back and get the original data files.


I don't know, because the sequencing was outsourced, but I think it could be due to the ligation step: https://seekdeep.brown.edu/illumina_paired_info.html
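If the ligation explanation is right, inserts go in in either orientation, so roughly half of the R1 reads would carry the sgRNA as its reverse complement rather than reads truly belonging to R2. One option is to count both orientations. A minimal sketch, assuming a hypothetical tab-separated library file (`guide_id`, sgRNA sequence), guides that all share one length, exact matches only, and single-line 4-line FASTQ records; `count_guides` is a hypothetical helper, not part of MAGeCK:

```shell
# count_guides LIBRARY FASTQ
# LIBRARY: hypothetical tab-separated file, one "guide_id<TAB>sgRNA_sequence" per line.
# Prints "guide_id<TAB>count", counting exact matches in either orientation.
count_guides() {
  awk -F'\t' '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }

    # Reverse complement; any non-ACGT base becomes N.
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c in comp) ? comp[c] : "N")
      }
      return out
    }

    # First file (library): index each guide by its forward and reverse-complement sequence.
    FNR == NR {
      seq = toupper($2)
      fwd[seq] = $1
      rev[revcomp(seq)] = $1
      k = length(seq)        # assumes every guide has the same length
      count[$1] = 0          # report guides with zero reads too
      next
    }

    # Second file (FASTQ): the sequence is line 2 of each 4-line record.
    FNR % 4 == 2 {
      read = toupper($0)
      for (i = 1; i + k - 1 <= length(read); i++) {
        w = substr(read, i, k)
        if (w in fwd) { count[fwd[w]]++; break }   # sgRNA in forward orientation
        if (w in rev) { count[rev[w]]++; break }   # sgRNA reverse-complemented
      }
    }

    END { for (g in count) print g "\t" count[g] }
  ' "$1" "$2" | sort
}
```

For example, `count_guides library.tsv sample_R1.fq > counts.tsv`. Stopping at the first hit means each read is counted at most once, whichever orientation it is in.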


Are you saying that you have short inserts (the sgRNAs), so R1 and R2 are likely to overlap (i.e. there is no mixing per se)? You could just use the R1 reads, or look into a tool like bbmerge.sh from BBTools, which can merge R1/R2 read pairs. You can then trim adapters and count the consensus sequences produced (or align them to a reference and then count).
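Once the pairs are merged (for example with `bbmerge.sh in1=R1.fq in2=R2.fq out=merged.fq`) and adapters trimmed, counting the consensus sequences can be as simple as tallying identical sequence lines. A minimal sketch, assuming a plain FASTQ with 4 lines per record; `tally_sequences` is a hypothetical helper name:

```shell
# tally_sequences FASTQ
# Counts identical sequences (line 2 of every 4-line record) and prints
# "count<TAB>sequence", most frequent first (ties broken by sequence).
tally_sequences() {
  awk 'NR % 4 == 2 { n[$0]++ } END { for (s in n) print n[s] "\t" s }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

The tallied sequences still need to be matched to guide IDs (or aligned to a reference) before they are usable as sgRNA counts.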


No, they really are mixed: the R1 files contain R2 reads.


If they are truly mixed and you want to separate the reads, and they have standard Illumina headers, you can do something like this:

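# GNU grep: print each matching header line plus the 3 lines after it;
# --no-group-separator stops grep from inserting "--" lines that would corrupt the FASTQ
grep -A 3 --no-group-separator '^@.* 1:N:0' original.fq > R1.fq
grep -A 3 --no-group-separator '^@.* 2:N:0' original.fq > R2.fq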
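A slightly stricter alternative reads each record header and routes the whole 4-line record by the read-number field, so it does not depend on grep's context-line behaviour. A minimal sketch, assuming Casava 1.8+ style headers (`@<read name> <read>:<filtered>:<control>:<index>`) and single-line sequences; `split_by_read_number` is a hypothetical helper name. Unlike a pattern on `1:N:0`, it also keeps reads flagged as filtered (`1:Y:0`):

```shell
# split_by_read_number FASTQ OUT_R1 OUT_R2
# Routes each 4-line record by the read number in its Casava 1.8+ header,
# e.g. "@<read name> 1:N:0:<index>" goes to OUT_R1, "... 2:N:0:..." to OUT_R2.
# Records whose header has no recognizable read number are dropped.
split_by_read_number() {
  awk -v r1="$2" -v r2="$3" '
    NR % 4 == 1 {
      split($2, tag, ":")          # "1:N:0:ACGT" -> tag[1] = "1"
      out = (tag[1] == "1") ? r1 : ((tag[1] == "2") ? r2 : "")
    }
    out != "" { print > out }
  ' "$1"
}
```

For example, `split_by_read_number original.fq R1.fq R2.fq`.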