Question

Trimmomatic generated two (reverse-forward) paired-files with different number of reads

2

Entering edit mode

21 months ago

Pegasus ▴ 120

Hi all,

Through the RNA-seq analysis workflow using Linux,

Trimmomatic generates 4 out-put files; forward-paired.fq.gz, reverse-paired.fq.gz, and the 2 unpaired files.

As I read in several threads, Trimomatic is expected to;

Remove the adapters and the low-quality reads.
generates 2 paired-fq.gz files with the same read number.

FastQC reports for both paired.files (for each sample) indicate good quality, but I received different read counts for the two (paired) files. Downstream analysis might be affected, especially feature counts (This issue is likely behind the error; Paired-end reads are included, and The reads are assigned on the single-end mode, when I run featureCounts).

For example;

sample1-forward.paired-fq.gz (wc -l ) = 4457636

sample2-reverse.paired.fq.gz (wc -l) = 4580400

I used the command line below as explained in Trimmomaic manual;

java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.39.jar PE samplef1.fastq.gz sampler1.fastq.gz samplef1.paired.fq.gz samplef1_unpaired.fq.gz sampler1_paired.fq.gz sampler1_unpaired.fq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36

So, what could be the extra reads? could you please suggest any solution to fix this issue?

Thank you in advance.

Trimmomatic RNA-Seq • 1.4k views

ADD COMMENT • link 21 months ago by Pegasus ▴ 120

score 2 · Answer 1 · 2023-03-08

2

Entering edit mode

21 months ago

ATpoint 86k

Just to be sure, did you decompress the files before wc-ing them?

ADD COMMENT • link 21 months ago by ATpoint 86k

0

Entering edit mode

Thank you so much ATpoint, after decompressing the files, they turned out to be the correct match.

So if the read counts for both files are correct, what do you suggest the reason behind the error; Paired-end reads are included, and The reads are assigned on the single-end mode when I run featureCounts (subread). I used HISAT2 for mapping, samtools -n for sorting the converted bam files by name.

ADD REPLY • link 21 months ago by Pegasus ▴ 120

0

Entering edit mode

what error? and what software produced the error and what is the command line that you used for running is?

ADD REPLY • link 21 months ago by jv ★ 1.8k

0

Entering edit mode

Please find below the details of the question you asked; thanks JV;

RNA-SEQ reads were mapped to a reference genome using HISAT2
Sam files were converted to bam and sorted (by name) using samtools sort -n function
When I run featurCounts using the bam-sorted.bam files as input, featurCounts (subread) using the command line;

featureCounts -p -B -C -t gene -g gene_id -a referenc-genome.gtf -o count.txt treatemnt1.sorted.bam treatemnt2.sorted.bam treatemnt3.sorted.bam treatemnt4.sorted.bam treatment5.sorted.bam control1.sorted.bam control2.sorted.bam control3.sorted.bam control4.sorted.bam control5.sorted.bam

featureCounts (Subread) showed the message below;

Paired-end reads are included, and The reads are assigned on the single-end mode

Is it a common message or this indicates that featureCounts is treating the paired-end reads as if they were single-end reads?
should not sorting bam files by name be more accurate as input for futureCounts?.Sorting by position/ by name resulted > same output (feature.counts.table).

Thank you

ADD REPLY • link 21 months ago by Pegasus ▴ 120

1

Entering edit mode

With featureCounts -p option you now also need to include the following.

-p                  Specify that input data contain paired-end reads. To
                      perform fragment counting (ie. counting read pairs), the
                      '--countReadPairs' parameter should also be specified in
                      addition to this parameter.

  --countReadPairs    Count read pairs (fragments) instead of reads. This option
                      is only applicable for paired-end reads.

ADD REPLY • link 21 months ago by GenoMax 148k

0

Entering edit mode

Thank you so much @GenoMax,

I am dealing with bacterial RNA-seq data, should I include the flag;

(-O) assigns reads to all their overlapping meta-features, as recommended in some discussions?

When I included it in featureCounts, it increased the % of the successfully assigned alignments.

ADD REPLY • link 21 months ago by Pegasus ▴ 120