Paired-end sequencing with inconsistent quality/number of reads
8 weeks ago
anissa • 0

Hi everyone,

I am currently working with a dataset from paired-end sequencing (Illumina NextSeq 1000/2000, 2 lanes, 101 cycles per read). After running the nf-core/rnaseq pipeline with STAR/Salmon to quantify gene counts, I got a MultiQC report showing that the number of reads differs widely between files. I can't see a pattern across the two read ends (see the histogram of STAR alignment scores per sample). I tried searching for this on Google and here, but I couldn't find any similar post.

I suspect there was a problem during the sequencing, but I don't know what exactly (I am not an expert in NGS; I only have a general knowledge of sequencing). Could you explain to me what the origin of the problem could be? Is it possible to discard the read end that is of poor quality, so that I am left with 'single-end-like' data of better quality?

Thanks in advance for your help.

PS: any resource to help me understand all of the steps of sequencing RNA would be useful!

paired-end NGS STAR flowcell alignment • 485 views

This is a question for the people doing the benchwork. You would have to look at the QC to see how they normalized the libraries. There is nothing you can do at your end other than drop the samples with the fewest reads, and maybe downsample the ones with the most (but that might not even be necessary).
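As a rough illustration of the drop/downsample idea, here is a minimal Python sketch. The function name, the sample names, the read counts, and both cutoffs (`drop_below`, `cap_above`) are made up for the example; sensible thresholds depend on the experiment.

```python
from statistics import median


def triage_by_depth(read_counts, drop_below=0.2, cap_above=3.0):
    """Decide which samples to drop and how much to downsample the rest.

    read_counts: dict of sample name -> number of read pairs
    drop_below:  drop samples with fewer than drop_below * median reads
                 (hypothetical cutoff, not a recommendation)
    cap_above:   downsample samples with more than cap_above * median reads
                 (hypothetical cutoff, not a recommendation)
    """
    med = median(read_counts.values())
    cap = cap_above * med
    dropped = []
    keep_fraction = {}
    for sample, n in read_counts.items():
        if n < drop_below * med:
            dropped.append(sample)
        else:
            # Fraction of reads to keep; 1.0 means no downsampling.
            keep_fraction[sample] = min(1.0, cap / n)
    return dropped, keep_fraction


# Invented read counts, for illustration only.
counts = {"S1": 20_000_000, "S2": 18_000_000, "S3": 1_000_000, "S4": 90_000_000}
dropped, keep_fraction = triage_by_depth(counts)
print("drop:", dropped)
print("keep fraction:", keep_fraction)
```

The resulting fraction could then be handed to a FASTQ subsampling tool. For paired-end data, both files of a pair need to be subsampled with the same random seed so that mates stay in sync.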

Running the exact same library multiple times does not cause batch artifacts, so you could ask the people who load the instrument to run these again, using the read counts of the fastqs to rebalance them.
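The rebalancing arithmetic could look something like the sketch below. It assumes that the number of reads a library yields scales roughly linearly with the volume of it that went into the pool, which is a simplification. The function name, volumes, and read counts are invented for the example.

```python
def rebalance_volumes(read_counts, previous_volumes_ul):
    """Scale each library's pooling volume toward equal read numbers.

    read_counts:         dict of sample -> reads obtained in the first run
    previous_volumes_ul: dict of sample -> volume (in µL) pooled in the first run

    Assumption: reads obtained are proportional to the volume pooled.
    """
    if any(n <= 0 for n in read_counts.values()):
        raise ValueError("every sample needs a positive read count")
    # Aim for every sample to reach the mean read count of the first run.
    target = sum(read_counts.values()) / len(read_counts)
    return {
        sample: previous_volumes_ul[sample] * target / n
        for sample, n in read_counts.items()
    }


# Invented numbers, for illustration only.
first_run = {"S1": 10_000_000, "S2": 20_000_000, "S3": 30_000_000}
volumes = {"S1": 5.0, "S2": 5.0, "S3": 5.0}
new_volumes = rebalance_volumes(first_run, volumes)
for sample, vol in new_volumes.items():
    print(f"{sample}: {vol:.2f} µL")
```

Under-represented libraries get proportionally more volume and over-represented ones get less. In practice the wet lab will have its own constraints (minimum pipetting volume, remaining library material).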

8 weeks ago
GenoMax 148k

I suspect there was a problem during the sequencing

No, there was no obvious "problem". This is a result of how the pool was made from the individual libraries. These libraries were of different concentrations (they contained varying amounts of material), and that is why you ended up with different numbers of reads post-demultiplexing, reflecting what was in the pool.

There are ways to make a balanced pool that gives equivalent read numbers for all samples (by doing qPCR on the libraries and/or by running a MiSeq Nano run to get the actual number of reads per library from the pool, which can then be adjusted by adding more of certain libraries to balance things out). These additional steps require effort, and sequencing centers generally charge extra for them.

Differential expression analysis programs will try to account for such differences in read numbers, but if the magnitude of the difference is large, you may need to take that into account when doing the data analysis.
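To give an idea of what "accounting for differences in read numbers" means, here is a simplified Python sketch of per-sample size factors using the median-of-ratios approach. It is an illustration of the idea only, not the code of any particular package, and the count values are invented.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict of sample -> list of raw gene counts (same gene order in
            every sample). Genes with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    usable_genes = []
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            usable_genes.append(g)
            # Log of the geometric mean of this gene across samples.
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - lgm
            for g, lgm in zip(usable_genes, log_geo_means)
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Invented counts: sample B was sequenced about twice as deeply as sample A.
raw = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
factors = size_factors(raw)
normalized = {s: [c / factors[s] for c in raw[s]] for s in raw}
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale. Scaling cannot restore information that was never sequenced, though, so very shallow samples remain noisy however they are normalized.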

There are plenty of resources to understand how RNA is sequenced. Here is a random video on the topic: YouTube LINK


Hi, thank you for your answers. Indeed, there is high variability in RNA quantity across samples, and unfortunately I cannot do anything to correct it apart from excluding samples with a very low number of reads. I had also confused this with another plot in the MultiQC report, which made me think there had been a technical issue during sequencing that I was not aware of. Some samples are of such low quality that I doubt amplifying their libraries would make any difference. Thank you for the useful comments!
