Unequal paired-end fastq size after quality control
8.2 years ago
Whoknows ▴ 960

Hi friends

I used Trimmomatic to quality-trim my paired-end RNA-Seq files and got an odd result: the resulting FASTQ files have different sizes in bytes, L1 = 9275244535 and L2 = 9238052265.

Why does this happen?

I used this command:

java -jar trimmomatic-0.36.jar PE L1.fq.gz L2.fq.gz paired_L1.fastq unpaired_L1.fastq paired_L2.fastq unpaired_L2.fastq LEADING:20 TRAILING:20 MINLEN:140

I did not clip the first bases, but the first 12 positions of the reads show unbalanced base content, and there is also duplication in that region.

RNA-Seq fastq Trimmomatic • 5.0k views

Could you clarify what f1 and f2 are?

f1 (L1) and f2 (L2) are the file sizes in bytes. I have updated the question.

I don't see the relevance of the size in bytes; the number of lines would be more informative (wc -l yourfile.fastq).

That said, it's very well possible that one read of a pair didn't 'survive' the trimming and the read became 'unpaired'. Edit: which should then end up in different files, thanks to @mastal511 for pointing this out
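As a rough sketch of the line-count check suggested above, the following Python function counts lines and derives the number of reads. It assumes the standard four-lines-per-record FASTQ layout (no wrapped sequences); the function name `count_fastq_reads` is just an illustrative choice, not part of any tool.

```python
import gzip


def count_fastq_reads(path):
    """Return (lines, reads) for a FASTQ file.

    Assumes 4 lines per record (header, sequence, '+', quality).
    Files ending in .gz are read through gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines, lines // 4
```

Running it on both paired output files and comparing the two results is equivalent to comparing the `wc -l` counts, with the extra sanity check that each count is a multiple of four.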

The line counts are the same: both files have 102449352 lines.

Sounds like nothing to worry about then :p

8.2 years ago
reza ▴ 300

Both files are the same size to begin with, but their sequencing quality differs. After trimming, their sizes will no longer be the same, because different numbers of bases are trimmed from the reads in each file.
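To illustrate why the two mates can end up with different lengths, here is a toy Python sketch of quality trimming in the spirit of LEADING:20, TRAILING:20 and MINLEN. It is not Trimmomatic's code: the function name `trim_read`, the Phred+33 offset and the short example reads are all assumptions made for the illustration.

```python
def trim_read(seq, qual, leading=20, trailing=20, minlen=140, offset=33):
    """Toy quality trimming of a single read.

    Cuts bases from the start while their quality is below `leading`,
    then from the end while below `trailing`, and returns None if the
    remaining read is shorter than `minlen`.
    Assumes Phred+33 encoded quality strings (an assumption, not checked).
    """
    scores = [ord(c) - offset for c in qual]
    start = 0
    while start < len(scores) and scores[start] < leading:
        start += 1
    end = len(scores)
    while end > start and scores[end - 1] < trailing:
        end -= 1
    if end - start < minlen:
        return None  # read is dropped
    return seq[start:end], qual[start:end]


# Two made-up mates of the same pair; minlen is lowered for the toy data.
mate1 = trim_read("ACGTACGT", "IIIIIIII", minlen=4)  # high quality throughout
mate2 = trim_read("ACGTACGT", "##IIII##", minlen=4)  # low quality at both ends
print(len(mate1[0]), len(mate2[0]))  # both mates kept, different lengths
```

In this made-up case both mates pass the length filter, so the pair stays together, but the second mate is shorter than the first, which is enough to make the two output files differ in size.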

Thanks Reza, you are right. It removes low-quality read pairs, but the reads that remain can still have a different number of bases on each side.

8.2 years ago
mastal511 ★ 2.1k

If one read of a pair doesn't survive the trimming, trimmomatic will put the surviving mate in one of the unpaired.fastq files. So the two paired.fastq files should have the same number of lines and the same number of reads, but after trimming, not necessarily the same number of bases. The strange per base sequence content you see at the 5' ends is quite common for RNA-Seq data, and is due to the random priming step in the library prep not being quite so random.
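A minimal sketch for checking this on the two paired output files: it walks both files in parallel, confirms the mates line up, and totals the bases on each side. It assumes four-line FASTQ records and that mate IDs match once an optional /1 or /2 suffix is removed; `read_fastq` and `compare_pairs` are hypothetical helper names.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (read_id, sequence) from a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file (or trailing blank line)
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            read_id = header[1:].split()[0]
            # Drop an old-style mate suffix so both mates share one ID.
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id, seq


def compare_pairs(path1, path2):
    """Count read pairs and total bases per file, checking mate order."""
    reads = bases_1 = bases_2 = 0
    for rec1, rec2 in zip_longest(read_fastq(path1), read_fastq(path2)):
        if rec1 is None or rec2 is None:
            raise ValueError("files contain different numbers of reads")
        if rec1[0] != rec2[0]:
            raise ValueError(f"mate IDs differ: {rec1[0]} vs {rec2[0]}")
        reads += 1
        bases_1 += len(rec1[1])
        bases_2 += len(rec2[1])
    return {"reads": reads, "bases_1": bases_1, "bases_2": bases_2}
```

Called as `compare_pairs("paired_L1.fastq", "paired_L2.fastq")`, it should report a single read count and two base totals; a difference between the base totals would account for a difference in file size.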

So you mean that Trimmomatic can keep both reads of a pair after trimming, but with a different number of bases in each mate, right?
