RNA-seq Paired-End High Duplication Rate
5 months ago
Tara • 0

This is from a FastQC analysis of paired-end data. R1 has twice as many reads as R2, and I'm not sure why. FastQC's duplication plot for R1 looks like this. Does anyone know why?

[Image: FastQC sequence duplication levels plot for R1]

fastqc RNA-seq FASTQ • 620 views

This blog post by the authors of FastQC may be of interest: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/

Since this is RNA-seq data, some duplication is expected: highly expressed transcripts are present in many copies, so many reads will come from the same RNA.

R1 has twice as many reads as R2.

You could try running the repair.sh tool from the BBMap suite, which re-pairs reads in paired files that have gone out of sync. Guide here: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/repair-guide/
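A minimal sketch for confirming the mismatch before and after repairing, assuming gzip-compressed files with standard 4-line FASTQ records. The function and file names are hypothetical; the repair.sh line uses the in1/in2/out1/out2/outs parameters described in the BBMap repair guide, with reads whose mate is missing written to the singletons file.

```shell
# Count reads in a gzipped FASTQ, assuming every record is exactly 4 lines
# (header, sequence, "+", quality). A non-integer result means the file is
# malformed (its line count is not a multiple of 4).
count_reads() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical file names; substitute your own:
#   count_reads sample_R1.fastq.gz
#   count_reads sample_R2.fastq.gz
#
# If the counts differ, re-pair with BBMap's repair.sh (must be installed):
#   repair.sh in1=sample_R1.fastq.gz in2=sample_R2.fastq.gz \
#     out1=fixed_R1.fastq.gz out2=fixed_R2.fastq.gz outs=singletons.fastq.gz
```

Running count_reads on the repaired outputs should give identical numbers for R1 and R2.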


Thank you! repair.sh worked great!

5 months ago

R1 has twice as many reads as R2

That's very wrong. Alert the people who made the FASTQ files; that is not right at all.

Something may have gone wrong during FASTQ generation. For example, someone generated only R1, realized their mistake, and regenerated both files, but the new R1 was appended to the existing file instead of overwriting it.
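One way to test this hypothesis directly is to decompress R1 and compare the checksum of its first half with that of its second half. This is a sketch under the assumption that both copies are byte-identical; if the first run was partial or wrote reads in a different order, it will print "no" even though the file was appended to. The function name is hypothetical.

```shell
# Print "yes" if a gzipped FASTQ consists of two identical copies of the same
# records back to back, "no" otherwise.
is_doubled() {
  total=$(gzip -dc "$1" | wc -l | tr -d ' ')
  # Two copies of 4-line records need a non-zero multiple of 8 lines.
  if [ "$total" -eq 0 ] || [ $((total % 8)) -ne 0 ]; then
    echo no
    return
  fi
  half=$((total / 2))
  # cksum is POSIX, so this works on Linux and macOS alike.
  first=$(gzip -dc "$1" | head -n "$half" | cksum)
  second=$(gzip -dc "$1" | tail -n "+$((half + 1))" | cksum)
  if [ "$first" = "$second" ]; then echo yes; else echo no; fi
}
```

Usage: is_doubled sample_R1.fastq.gz (hypothetical file name).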

Grab the first read name in the FASTQ and see whether it turns up twice. If it does, ask whoever made the FASTQ files to regenerate them from scratch.


Thank you so much for your quick response. Do you happen to know the Linux command to see the first read and search for it in the file?


The command to check the first line would be

zcat my_file.fastq.gz | head -n 1

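Once you have the first line, take the read name from it (the text up to the first space) and count how often it appears. The file is gzip-compressed, so decompress it before searching; running plain grep on the .gz file will not reliably find it:

zcat my_file.fastq.gz | grep -c -F readname

Here readname is a placeholder for the name you copied, -F treats it as a literal string, and -c prints the number of matching lines. A count of 2 or more means that read is duplicated.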
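The two steps can be wrapped into one helper. This hypothetical sketch checks only header lines (lines 1, 5, 9, ...), since quality strings can also begin with "@", and it assumes the read ID is the first whitespace-separated field of the header.

```shell
# Print how many times the first read's ID appears as a header line in a
# gzipped FASTQ (assumes 4-line records). A healthy file prints 1.
first_name_count() {
  # Read ID = first whitespace-separated field of line 1, e.g. "@READ_ID".
  name=$(gzip -dc "$1" | head -n 1 | awk '{ print $1 }')
  # Compare only header lines; quality lines can start with "@" too.
  gzip -dc "$1" |
    awk -v id="$name" 'NR % 4 == 1 && $1 == id { n++ } END { print n + 0 }'
}
```

Usage: first_name_count my_file.fastq.gz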

You are going to have to poke around to find a command to clean up the fastq if that is the problem; I'm not sure how to do it off the top of my head.
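If the appended copy turns out to contain the same reads, one possible cleanup, sketched under that assumption, is to keep only the first record for each read ID. It holds every ID in memory, so a large file needs a fair amount of RAM, and afterwards the R1 and R2 read counts should be rechecked (or the pair run through repair.sh) to confirm they match. Function and file names are hypothetical.

```shell
# Keep only the first record for each read ID in a gzipped FASTQ and write
# the result to a new gzipped file. Assumes 4-line records.
# On each header line, "keep" is set to 1 only the first time that ID is
# seen; the following sequence, "+" and quality lines reuse that decision.
dedup_by_name() {
  gzip -dc "$1" |
    awk 'NR % 4 == 1 { keep = !seen[$1]++ } keep' |
    gzip > "$2"
}
```

Usage: dedup_by_name sample_R1.fastq.gz sample_R1.dedup.fastq.gz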


Also, this data was generated years ago, and I may not be able to contact the person who originally created the FASTQ file. Is there any way for me to fix this error myself? I am not a bioinformatician; I only dabble in it to help out my lab. Thanks again for your help.
