Hi,
I would like to generate a count table from my CRISPR screen fastq files. I have 248 genes, each with its own unique sgRNA sequence. I know that the full sequence should be "known sequence 1 + sgRNA sequence + known sequence 2".
Known sequences 1 and 2 are in the array sequence column, so I know their content and length. To generate the count table, I have to trim the 5' and 3' sequences and count the sgRNA sequences against the library that I have.
MAGeCK takes 3 inputs for the trimming: the 5' trim length, the sgRNA sequence library file (CSV), and an adaptor sequence (which it says is optional). I assumed that known sequence 1 would be removed by the 5' trimming, the sgRNA would be handled by the sgRNA sequence library, and the adaptor would be known sequence 2. (This could be problematic.)
When I use MAGeCK for trimming and mapping, my mapping rate is very low, about 8%. I think this is caused by the lengths that I give to the trimming process. I checked individual reads in the fastq file and saw that the full sequence is present and conserved in the reads, but the reads have additional sequence on their 5' and 3' ends. So when I enter a fixed length for trimming, the trim may fall short and the read is then counted as unmapped.
Could you help me find out where the problem is, or how I can solve this issue?
Thank you very much,
Best,
Tunc.
If you know sequence 1 and sequence 2, then perhaps the better option would be to use BBDuk.sh from the BBMap suite. You can provide the two sequences in a file (or with the literal= option), so that you don't need to depend on length-based trimming. You can also specify which side the sequences should be trimmed on. A comprehensive thread describing BBDuk is available.
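To illustrate why anchoring on the known sequences is more robust than a fixed trim length, here is a minimal Python sketch. It is not BBDuk or MAGeCK, only the underlying idea: find known sequence 1 and known sequence 2 wherever they occur in the read, take what lies between them, and look that up in the library. The function names, and the assumption that the library is a dict mapping sgRNA sequence to sgRNA name, are hypothetical and chosen for illustration. Matching is exact, so reads with sequencing errors in a flank or in the guide will be counted as unmapped.

```python
from collections import Counter


def extract_sgrna(read, left_flank, right_flank):
    """Return the sequence between the two flanks, or None if a flank is missing.

    The position of the flanks in the read does not matter, so extra bases
    on the 5' or 3' end of the read do not break the extraction.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    # Search for the right flank only downstream of the left flank.
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]


def count_sgrnas(reads, library, left_flank, right_flank):
    """Count reads per sgRNA name; `library` maps sgRNA sequence -> name (assumed layout)."""
    counts = Counter({name: 0 for name in library.values()})
    unmapped = 0
    for read in reads:
        guide = extract_sgrna(read, left_flank, right_flank)
        name = library.get(guide)  # None if guide is None or not in the library
        if name is None:
            unmapped += 1
        else:
            counts[name] += 1
    return counts, unmapped
```

With real data, `reads` would be the sequence lines of the fastq file (every fourth line, starting from the second). The fraction of unmapped reads from a sketch like this can be compared with the ~8% mapping rate to check whether the fixed trim length is really the cause.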