Hi there, I recently performed a MiSeq 80 bp paired-end run to measure CRISPR sgRNA coverage from a plasmid. I got two files back: R1 and R2. However, the number of reads in R1 (66331572) is much higher than in R2 (566768). I used
cat R1.fastq | wc -l
What does this mean? Thanks
The first thing to do is make sure you downloaded the entire file (i.e. the file is not partially downloaded or corrupted). Verify this by comparing the MD5 checksum of each file against the MD5 sum the company provides; they should match. You can also check whether the file size the company reports is the same as the size of your downloaded file.
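If you'd rather do the checksum comparison in a script, here is a minimal sketch using Python's standard `hashlib`. The function names and the `expected_md5` argument are just placeholders for illustration; paste in whatever MD5 string your provider actually gave you.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large FASTQ files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_provider_md5(path, expected_md5):
    """Compare the local digest with the provider's MD5 string
    (hypothetical input: copy it from the provider's checksum file)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example usage (file name taken from the question, checksum is yours to fill in):
# print(matches_provider_md5("R2.fastq", "<md5 from provider>"))
```

If the files were delivered gzipped, checksum the file exactly as delivered, since the provider's MD5 almost certainly refers to that and not to the decompressed version.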
Also, run
cat R2.fastq|tail
and that may be able to tell you whether it was a partial download (e.g. the last line might be truncated). If it was, then redownload the files. If you confirm that your files are fully intact, then I'll see what others have to suggest about why the number of lines differs between R1 and R2.
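A slightly more systematic version of the `tail` check, as a rough sketch. It assumes an uncompressed FASTQ with exactly four lines per record (the usual Illumina layout); `fastq_summary` is a made-up helper name. Keep in mind that `wc -l` counts lines, not reads, so the figures in the question correspond to 16,582,893 reads in R1 and 141,692 in R2. Both line counts are divisible by 4, which is consistent with intact files but does not prove it.

```python
def fastq_summary(path):
    """Count lines and reads in an uncompressed 4-line-per-record FASTQ
    and flag obvious signs of truncation in the final record."""
    n_lines = 0
    last_record = []
    with open(path, "r") as handle:
        for line in handle:
            if n_lines % 4 == 0:
                last_record = []  # a new record starts here
            last_record.append(line.rstrip("\n"))
            n_lines += 1

    # A complete file has a multiple of 4 lines, and its last record has
    # an "@" header, a "+" separator, and equally long sequence/quality lines.
    looks_complete = n_lines % 4 == 0 and (
        n_lines == 0
        or (
            last_record[0].startswith("@")
            and last_record[2].startswith("+")
            and len(last_record[1]) == len(last_record[3])
        )
    )
    return {"lines": n_lines, "reads": n_lines // 4, "looks_complete": looks_complete}


# Example usage:
# print(fastq_summary("R1.fastq"))
# print(fastq_summary("R2.fastq"))
```

A `looks_complete` of `False` strongly suggests a cut-off download. `True` only means the end of the file is well-formed, so the MD5 comparison is still the more reliable test.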
This is not right; R1 and R2 from a paired-end run should contain the same number of reads. If you've verified that the files have been correctly downloaded (does the provider supply an MD5?), then you need to go back to your sequencing provider and query this.
Base calling is performed independently for each read, and some sequencers remove low-quality reads without removing the corresponding mate, so you will end up with singletons.
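To see whether singletons are really what is going on, you could compare the read names in the two files. The sketch below assumes uncompressed, 4-line-per-record FASTQ files and that mates share the same name up to the first whitespace (or up to an old-style `/1` or `/2` suffix). `read_ids` and `pairing_summary` are hypothetical helper names, and both ID sets are held in memory, which can take a few GB for a file the size of R1.

```python
def read_ids(path):
    """Collect read names from the header line of every FASTQ record."""
    ids = set()
    with open(path, "r") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue  # only header lines carry the read name
            fields = line[1:].split()  # drop the leading "@" and any comment
            if not fields:
                continue
            name = fields[0]
            if name.endswith(("/1", "/2")):  # old-style mate suffix
                name = name[:-2]
            ids.add(name)
    return ids


def pairing_summary(r1_path, r2_path):
    """Report how many reads have a mate and how many are singletons."""
    r1 = read_ids(r1_path)
    r2 = read_ids(r2_path)
    return {
        "paired": len(r1 & r2),
        "r1_only": len(r1 - r2),
        "r2_only": len(r2 - r1),
    }


# Example usage:
# print(pairing_summary("R1.fastq", "R2.fastq"))
```

If nearly every R2 read has a mate in R1 and `r2_only` is close to zero, that points to R2 reads having been dropped or lost somewhere, which is worth including when you contact the provider.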
Is this some artifact of enabling trimming in bcl2fastq?
Yes, the filtering could be the cause.