4.4 years ago
Swimming bird
Hi! I want to analyse data from a CRISPR/Cas9 screen (control vs. treatment) and I'm using MAGeCK (https://sourceforge.net/projects/mageck/). The sequencing was performed on an Illumina platform (paired-end).
The problem is that, in the R1 FASTQ files, half of the reads containing the sgRNA are actually R2 reads (and the R2 files contain R1 reads as well). Should I include these reads in the sgRNA count?
How did that happen? It is always best to go back and get the original data files when you are in doubt.
I don't know, because the sequencing was commissioned, but I think it could be due to the ligation: https://seekdeep.brown.edu/illumina_paired_info.html
Are you saying that you have short inserts (these being the sgRNAs), so R1 and R2 are likely to overlap (i.e. there is no mixing per se)? You could just use the R1 reads, or look into a tool like bbmerge.sh from BBTools, which can merge R1/R2 reads. You can then trim adapters and count the consensus sequence produced (or align to a reference and then count).
No, there is mixing, because the R1 files contain R2 reads.
If they are truly mixed and you want to separate the reads, and they have standard Illumina headers, you can do something like:
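A minimal Python sketch of that kind of header-based split, not a tested pipeline. It assumes uncompressed, four-line FASTQ records and that the read number can be read from the header: either the Casava 1.8+ form, where the comment after the space starts with `1:` or `2:`, or the older form ending in `/1` or `/2`. The names `read_number`, `fastq_records` and `split_stream` are made up for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, Optional, TextIO, Tuple

# Casava 1.8+ header: "@instrument:run:flowcell:lane:tile:x:y 1:N:0:INDEX"
_NEW_STYLE = re.compile(r"^@\S+\s+([12]):[YN]:\d+")
# Older header: "@instrument:lane:tile:x:y#index/1"
_OLD_STYLE = re.compile(r"^@\S+/([12])$")


def read_number(header: str) -> Optional[int]:
    """Return 1 or 2 if the header states the read number, else None."""
    header = header.strip()
    match = _NEW_STYLE.match(header) or _OLD_STYLE.match(header)
    return int(match.group(1)) if match else None


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) tuples from four-line FASTQ."""
    it = iter(lines)
    for header in it:
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, plus, qual


def split_stream(
    lines: Iterable[str], out_r1: TextIO, out_r2: TextIO, out_other: TextIO
) -> Dict[Optional[int], int]:
    """Write each record to the output matching its read number; count them."""
    outputs = {1: out_r1, 2: out_r2, None: out_other}
    counts: Dict[Optional[int], int] = {1: 0, 2: 0, None: 0}
    for record in fastq_records(lines):
        number = read_number(record[0])
        outputs[number].writelines(record)
        counts[number] += 1
    return counts
```

To use it, open the mixed file for reading and three files for writing, then pass the handles to `split_stream`. Records whose header matches neither pattern go to the third output, so they can be inspected rather than silently dropped. Mates may no longer be in the same order across the resulting R1 and R2 files, so pairing should be checked before any paired-end step.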