Question

Difference in R1 and R2 FASTQ files from RNA seq paired end

5.1 years ago

jaynishshah • 0

Hi there, I recently performed a MiSeq 80 bp paired-end run to measure CRISPR sgRNA coverage from a plasmid. I got two files back: R1 and R2. However, the number of reads in R1 (66331572) is much higher than in R2 (566768). I used

cat R1.fastq | wc -l

What does this mean? Thanks
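One thing worth noting about the count: `wc -l` reports lines, not reads. In the usual four-line FASTQ layout each read occupies four lines, so if the figures above are line counts they would correspond to 16582893 reads in R1 and 141692 in R2. The imbalance remains either way. Below is a minimal sketch, assuming uncompressed FASTQ with unwrapped sequences; `count_reads` is a hypothetical helper name, not part of any tool mentioned here.

```shell
# Hypothetical helper: print the number of reads in an uncompressed FASTQ file.
# Assumes the standard four-lines-per-record layout (no wrapped sequence lines).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Usage (file names as in the question):
#   count_reads R1.fastq
#   count_reads R2.fastq
```

For gzipped files the same idea should work by piping `zcat file.fastq.gz` into `wc -l` instead of reading the file directly.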

rna-seq RNA-Seq • 6.5k views

5.0 years ago by jaynishshah • 0

The first thing to do is make sure you downloaded the entire file (i.e. that it is not partially downloaded or corrupted). Check the md5 checksum of each file: it should match the md5 sum the company provides. You can also check whether the file size the company reports matches the size of your downloaded file.

Also, run tail R2.fastq, which may tell you whether it was a partial download (e.g. the last line might be truncated). If it was, then redownload the files.

If you confirm that your files are fully intact, then I'll see what others have to suggest about why the number of lines differs between R1 and R2.
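The two checks described above could be scripted roughly as follows. This is only a sketch: `md5_matches` and `looks_complete` are hypothetical helper names, and it assumes GNU coreutils `md5sum` is available and that the files are uncompressed four-line FASTQ.

```shell
# Hypothetical helper: succeed only if the file's MD5 equals the expected sum.
# Assumes GNU coreutils `md5sum`; $2 is the checksum supplied by the provider.
md5_matches() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Hypothetical helper: succeed if the file is non-empty, ends with a newline,
# and has a line count that is a multiple of four (whole FASTQ records only).
looks_complete() {
    [ -s "$1" ] || return 1
    lines=$(wc -l < "$1")
    [ $((lines % 4)) -eq 0 ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage:
#   md5_matches R2.fastq "<md5 from the provider>" && echo "checksum OK"
#   looks_complete R2.fastq || echo "R2.fastq looks truncated"
```

The structural check cannot catch a download that happened to stop exactly at a record boundary, so the checksum comparison is the more reliable of the two.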

5.1 years ago by dsull ★ 6.9k

This is not right. If you've verified that the files were downloaded correctly (does the provider supply an MD5?), then you need to go back to your sequencing provider and query this.

5.1 years ago by i.sudbery 20k

Base calling is performed independently for each read, and some sequencers remove low-quality reads without removing the corresponding mate, so you will end up with singletons.
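To see how many reads are actually singletons, one option is to compare the read IDs in the two files. The sketch below assumes uncompressed four-line FASTQ and Illumina-style headers in which the ID is the first whitespace-delimited field (optionally ending in `/1` or `/2`); `read_ids` and `count_unpaired` are hypothetical helper names.

```shell
# Hypothetical helper: print the read IDs of a FASTQ file, one per line, sorted.
# Takes the first field of every header line (lines 1, 5, 9, ...), drops the
# leading "@" and any trailing "/1" or "/2" mate suffix.
read_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1" |
        LC_ALL=C sort
}

# Hypothetical helper: count reads in the first file whose ID is absent from the second.
count_unpaired() {
    ids1=$(mktemp) && ids2=$(mktemp) || return 1
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    # comm -23 keeps only the lines unique to the first (sorted) input.
    LC_ALL=C comm -23 "$ids1" "$ids2" | wc -l | tr -d ' '
    rm -f "$ids1" "$ids2"
}

# Usage:
#   count_unpaired R1.fastq R2.fastq   # R1 reads with no mate in R2
#   count_unpaired R2.fastq R1.fastq   # R2 reads with no mate in R1
```

Sorting tens of millions of IDs takes a while and needs temporary disk space, so this is meant as a one-off diagnostic rather than a routine step.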

5.1 years ago by JC 13k

Is this some artifact of enabling trimming in bcl2fastq?

5.1 years ago by Devon Ryan 104k

Yes, the filtering could be a cause.

5.1 years ago by JC 13k

score 1 · Answer 1 · 2019-11-05

5.1 years ago

colindaven 7.0k

You might be able to repair this using repair.sh from the bbmap package (install via bioconda if you like).

By repair I mean get R1 and R2 files of the same length with the corresponding reads and headers.

For example:

repair.sh -Xmx40g in=A1_03_S2_2_R1.fastq in2=A1_03_S2_2_R2.fastq out1=A1_03_S2_2b_R1.fastq out2=A1_03_S2_2b_R2.fastq outs=singletons1.fq overwrite=true
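After a run like this, it may be worth confirming that the two output files really are in sync, meaning the same number of reads with matching IDs in the same order. The following is a rough sketch under the assumption of uncompressed four-line FASTQ with Illumina-style headers; `mate_ids` and `pairs_in_sync` are hypothetical helpers, not part of BBMap.

```shell
# Hypothetical helper: print read IDs in file order (first header field,
# without the leading "@" or a trailing "/1" or "/2" mate suffix).
mate_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: succeed only if both files list the same IDs in the same order.
pairs_in_sync() {
    a=$(mktemp) && b=$(mktemp) || return 1
    mate_ids "$1" > "$a"
    mate_ids "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage (output names from the command above):
#   pairs_in_sync A1_03_S2_2b_R1.fastq A1_03_S2_2b_R2.fastq && echo "pairs in sync"
```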

5.1 years ago by colindaven 7.0k

I would not try or trust repair when 99% of the data is missing.

5.1 years ago by Kristoffer Vitting-Seerup ★ 4.1k

score 0 · Answer 2 · 2019-11-11

5.0 years ago

jaynishshah • 0

Thanks very much, everyone, for the responses. The sequencing facility confirmed that it was a single-end run, so R2 should be empty/irrelevant.

5.0 years ago by jaynishshah • 0

I think that facility desperately needs to review their procedures, and you should probably think twice before using them again (if that's an option). If the data is single-end, why on Earth would they deliver an R2? Also, if you requested paired-end (inferring from your post), why did you get single-end? Lots of red flags.

[edit] wrote "red lights", meant "red flags" :S

5.0 years ago by cschu181 ★ 2.8k