Hi everyone,
I am currently working with a paired-end sequencing dataset (Illumina NextSeq1k2k, 2 lanes, 101 cycles per read). After running the nf-core/rnaseq pipeline with STAR/salmon to quantify gene counts, I got a MultiQC report showing that the files have very different numbers of reads. I can't see a pattern on either end. I tried searching on Google and here, but I couldn't find a similar post.
I suspect there was a problem during sequencing, but I don't know what exactly (I am not an expert in NGS; I only have general knowledge of sequencing). Could you explain what the origin of the problem could be? Is it possible to discard the poor-quality end and obtain 'single-end-like' data of better quality?
Thanks in advance for your help.
PS: any resource to help me understand all of the steps of sequencing RNA would be useful!
This is a question for the people doing the bench work. You would have to look at their QC to see how they normalized the libraries. There is nothing you can do on your end other than drop the samples with the lowest read counts, and maybe downsample the highest ones (though that might not even be necessary).
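To make the drop/downsample idea concrete, here is a rough Python sketch. It is not part of nf-core/rnaseq or MultiQC: the function name `triage_by_depth` and both thresholds (drop below 25% of the median depth, cap at 2x the median) are arbitrary assumptions for illustration, and the read counts are made up. You would copy the real per-sample counts from your MultiQC report.

```python
from statistics import median


def triage_by_depth(read_counts, min_fraction_of_median=0.25, cap_multiple_of_median=2.0):
    """Decide per sample whether to drop, keep, or downsample it.

    read_counts: dict mapping sample name -> number of read pairs.
    Both thresholds are illustrative assumptions, not established cutoffs.
    Returns a dict mapping sample name -> (action, fraction_to_keep or None).
    """
    med = median(read_counts.values())
    floor = min_fraction_of_median * med   # below this depth: drop the sample
    cap = cap_multiple_of_median * med     # above this depth: downsample to the cap

    plan = {}
    for sample, n_reads in read_counts.items():
        if n_reads < floor:
            plan[sample] = ("drop", None)
        elif n_reads > cap:
            # Fraction of read pairs to keep so the sample ends up at the cap.
            plan[sample] = ("downsample", cap / n_reads)
        else:
            plan[sample] = ("keep", None)
    return plan


# Made-up counts, for illustration only.
example_counts = {"A": 1_000_000, "B": 20_000_000, "C": 25_000_000, "D": 80_000_000}
for sample, (action, fraction) in triage_by_depth(example_counts).items():
    print(sample, action, fraction)
```

The fraction returned for a "downsample" sample is the kind of value you could pass to a subsampling tool (for example `seqtk sample`, using the same seed for both mates so the pairs stay in sync).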
Running the exact same library multiple times does not cause batch artifacts, so you could ask the people who load the instrument to run these libraries again, using the read counts of the FASTQ files to rebalance them.
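As a sketch of what rebalancing from read counts could look like, the snippet below computes new pooling volumes so that each library is expected to yield roughly the same number of reads. It rests on a strong assumption that read yield scales linearly with the volume of each library loaded, which the people at the bench would need to confirm. The function name `rebalance_volumes` and the example numbers are hypothetical.

```python
def rebalance_volumes(read_counts, old_volumes_ul):
    """Suggest new per-library pooling volumes from a previous run.

    read_counts: dict mapping sample -> reads obtained in the first run.
    old_volumes_ul: dict mapping sample -> volume (in microliters) pooled in that run.
    Assumes reads scale linearly with pooled volume (an assumption, not a given).
    The total pooled volume is kept the same as in the first run.
    """
    if any(n <= 0 for n in read_counts.values()):
        raise ValueError("every sample needs a positive read count to be rebalanced")

    # Reads obtained per microliter of each library in the first run.
    yield_per_ul = {s: read_counts[s] / old_volumes_ul[s] for s in read_counts}

    # New volume is proportional to 1 / yield, so that volume * yield is equal
    # across samples, then scaled to preserve the total pooled volume.
    inverse_yield = {s: 1.0 / y for s, y in yield_per_ul.items()}
    scale = sum(old_volumes_ul.values()) / sum(inverse_yield.values())
    return {s: inv * scale for s, inv in inverse_yield.items()}


# Made-up example: two libraries pooled at 5 uL each.
new_volumes = rebalance_volumes({"A": 10_000_000, "B": 30_000_000}, {"A": 5.0, "B": 5.0})
print(new_volumes)
```

In this made-up example the under-represented library gets three times the volume of the over-represented one, which is what you would expect if yield per microliter stays constant between runs.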