Question

RNA Seq Paired End High Duplication Rate

5 months ago

Tara • 0

This is from a FastQC analysis of paired-end data. R1 has twice as many reads as R2, and I'm not sure why. My FastQC duplication plot looks like this for R1. Does anyone know what could cause this?

[Image: FastQC duplication plot for R1]

fastqc RNA-seq FASTQ • 617 views

ADD COMMENT • link updated 5 months ago by GenoMax 147k • written 5 months ago by Tara • 0

This blog post by the authors of FastQC may be of interest: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/

Since this is RNA-seq data, some duplication is expected, because highly expressed genes produce many copies of the same RNA.

R1 has twice as many reads as R2.

You could try running the repair.sh tool from the BBMap suite, which re-pairs reads whose mates are missing or out of order. Guide here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/repair-guide/
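
For reference, a repair.sh call might look roughly like the sketch below. It assumes BBTools is installed and repair.sh is on the PATH; `run_repair` is a hypothetical wrapper and every file name is a placeholder. Check the linked guide for the options that apply to your data.

```shell
# Hypothetical wrapper around BBTools' repair.sh; assumes BBTools is
# installed and repair.sh is on PATH. All file names are placeholders.
run_repair() {
    # $1/$2: input R1/R2, $3/$4: re-paired output R1/R2,
    # $5: reads whose mate could not be found (singletons)
    repair.sh in1="$1" in2="$2" \
        out1="$3" out2="$4" \
        outs="$5"
}

# Example call (not run here):
# run_repair sample_R1.fastq.gz sample_R2.fastq.gz \
#     fixed_R1.fastq.gz fixed_R2.fastq.gz singletons.fastq.gz
```

Writing to new output files leaves the original FASTQ files untouched, so the result can be checked before anything is deleted.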

ADD REPLY • link 5 months ago by GenoMax 147k

Thank you! repair.sh worked great!

ADD REPLY • link 5 months ago by Tara • 0

score 0 · Answer 1 · 2024-05-31

5 months ago

swbarnes2 14k

R1 has twice as many reads as R2

That's very wrong. Paired-end R1 and R2 files should contain exactly the same number of reads, so alert the people who made the FASTQ files.
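
To confirm the mismatch independently of FastQC, you can count the reads in each file. This is a minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file,
# assuming every record is exactly four lines long.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Example calls (placeholder file names; substitute your own):
# count_reads sample_R1.fastq.gz
# count_reads sample_R2.fastq.gz
```

For a properly paired run, the two numbers should be identical.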

It might be that something went wrong with the FASTQ generation. For example, someone may have generated only R1, realized the mistake, and then regenerated both files, with the new R1 appended to the existing one instead of overwriting it.

Grab the first read name of the fastq, and see if it turns up twice. If it does, tell whoever made the fastq to start from scratch.

ADD COMMENT • link 5 months ago by swbarnes2 14k

Thank you so much for your quick response. Do you happen to know the Linux command to see the first read and search for it in the file?

ADD REPLY • link 5 months ago by Tara • 0

The command to check the first line would be

zcat my_file.fastq.gz | head -n 1

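Then, once you have the first line, search for that read name (use zgrep rather than grep, because the file is gzipped):

zgrep -c -F 'readname' my_file.fastq.gz

And see what count it returns. Anything above 1 means the read appears more than once.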

You are going to have to poke around to find a command to clean up the fastq if that is the problem; I'm not sure how to do it off the top of my head.

ADD REPLY • link 5 months ago by swbarnes2 14k

Also, this data was generated years ago, and I may not be able to contact the person who originally created the FASTQ file. Is there any way for me to fix this error myself? I am not a bioinformatician; I only dabble in it to help out my lab. Thanks again for your help.

ADD REPLY • link 5 months ago by Tara • 0
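
If the first read name does turn up twice, one possible cleanup is to keep only the first occurrence of each read name. This is a minimal sketch, assuming four-line FASTQ records and that the extra reads are exact repeats caused by the file being written twice. `dedupe_by_name` is a hypothetical helper. awk holds every read name in memory, so this may be slow or memory-hungry on large files.

```shell
# Hypothetical helper: write a copy of a gzipped FASTQ that keeps only
# the first record seen for each read name.
# Assumes strict four-line records; $1 = input file, $2 = output file.
dedupe_by_name() {
    gzip -dc "$1" |
        # On each header line (1st of every 4), decide whether this read
        # name (the first whitespace-delimited field) is new; the decision
        # then applies to all four lines of the record.
        awk 'NR % 4 == 1 { keep = !seen[$1]++ } keep' |
        gzip -c > "$2"
}

# Example call (placeholder file names):
# dedupe_by_name sample_R1.fastq.gz sample_R1.dedup.fastq.gz
```

Keep the original file until you have confirmed that the cleaned R1 has the same number of reads as R2.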