Question

Should I use unpaired reads from trimmomatic

0


13 months ago

dxj294 • 0

I need some advice for the new task I have to process.

I am new to bioinformatics and I have to perform QC on bulk RNA-seq data. I have successfully run FastQC and MultiQC, and after seeing the results I had to trim certain lengths off the reads.

The trimming output gave me paired _1 and _2 FASTQ files, as well as unpaired FASTQ files.

The FastQC and MultiQC scores for the paired files look good, and I want to align them to the mouse reference genome. I feel like I should use only the paired reads, but I need a proper reason why that is correct, or why I am wrong here.

Thanks

RNA-seq QC Trimmomatic • 1.4k views

updated 13 months ago by swbarnes2 15k • written 13 months ago by dxj294 • 0

0


Most aligners can't use paired and unpaired reads at the same time, so you may need to align the unpaired reads independently before merging the alignments. BBMap is one aligner that can align unpaired and paired data at the same time.

How many unpaired reads do you have (as a % of the total)? It may be fine to simply skip the unpaired data if the % is small.
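One way to get that percentage is to count the records in each Trimmomatic output file. The sketch below assumes standard 4-line FASTQ records (plain or gzipped), and it defines "total" as all reads that survived trimming, which is one reasonable reading of the question rather than the only one. The function names are made up for illustration.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Pick a gzip-aware opener for .gz files, plain open() otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def unpaired_percent(n_pairs, n_unpaired_1, n_unpaired_2):
    """Percent of surviving reads that lost their mate during trimming.

    n_pairs is the number of read pairs (records in the paired _1 file),
    so the pairs contribute 2 * n_pairs reads to the total.
    """
    n_unpaired = n_unpaired_1 + n_unpaired_2
    total = 2 * n_pairs + n_unpaired
    if total == 0:
        return 0.0
    return 100.0 * n_unpaired / total
```

Trimmomatic also prints a summary of surviving pairs and forward-only/reverse-only reads when it finishes, so if you kept the log you may not need to recount the files at all.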

13 months ago by GenoMax 151k

0


I see. I have 186 paired files and 93 unpaired files. All the unpaired files came from the _2 FASTQ files.

As of now I have started aligning only the 186 paired files, but should I align the unpaired ones separately after this task?

All the unpaired files are less than 1 GB, and all the paired files are greater than 1.5 GB, some going as high as 10 GB each.

13 months ago by dxj294 • 0

1


If the unpaired files are _2 files, the matching _1 files must exist somewhere. But the simple answer is probably to just use the _2 reads only, for all samples. You won't get much better counts by including the pairs where you have them, and you save yourself the headache of dealing with a batch effect from having some samples processed one way and others processed another way. (Though I guess you might still have a batch effect between the sets for some separate reason.)

13 months ago by swbarnes2 15k

score 1 · Answer 1 · 2024-04-23

If you have lots of data, both paired and unpaired, then the best strategy is probably to treat them separately and merge the counts at the end.

So align the paired and unpaired reads separately, count them separately, then sum the counts into a final count matrix for each sample.

The details may depend on the methods you use, but in general it is better to treat the data separately to avoid miscounting later. For example, with paired data we count read pairs, but with single-end reads we count individual reads.
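As a rough illustration of the "sum up the counts" step, here is a minimal sketch. It assumes you already have, for one sample, two per-gene count tables loaded as dictionaries: one where each read pair was counted once, and one where each unpaired read was counted once. The function name and gene IDs are hypothetical, and loading the tables from your counting tool's output is left out.

```python
from collections import Counter


def merge_counts(paired_counts, unpaired_counts):
    """Sum per-gene counts from the paired and unpaired runs of one sample.

    Each read pair and each unpaired read is treated as one fragment,
    so the result is a per-gene fragment count.
    """
    merged = Counter(paired_counts)
    # Counter.update adds counts instead of overwriting existing keys.
    merged.update(unpaired_counts)
    return dict(merged)


# Hypothetical gene IDs and counts, for illustration only.
paired = {"GeneA": 10, "GeneB": 0}
unpaired = {"GeneA": 2, "GeneC": 1}
print(merge_counts(paired, unpaired))
```

Genes that appear in only one of the two tables are kept with the count they have there. Doing the same merge for every sample gives the columns of the final count matrix.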