Trimming reads based on quality (Phred) scores
6.3 years ago
c.clarido ▴ 110

Here I have a few reads with their corresponding quality scores. As I understand it, I converted the quality characters from this FASTQ file to numbers with ord() in Python and then mapped them to 0 or 1: if ord(char) <= 53 it's 1, otherwise it's 0. With this method I got all 0's, so does that mean that none of these reads require any trimming? This is just test data, though. I have a much bigger FASTQ file, so what if from that file I get something like 111111001101111000000000000...? Is there a rule or condition I should follow to decide when to trim the ends of a read?
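A minimal sketch of the 0/1 mask described above (`quality_mask` is a hypothetical name, and reading 53 as "Q20 under Phred+33", since 53 - 33 = 20, is an assumption about the intended cutoff). It also shows why the result comes out all 0's: every quality character in the sample reads below is 'B' (ASCII 66) or higher, so none of them can satisfy ord(char) <= 53.

```python
def quality_mask(qual: str, threshold: int = 53) -> str:
    """Map each quality character to '1' (low quality) or '0' (acceptable)."""
    return "".join("1" if ord(c) <= threshold else "0" for c in qual)


if __name__ == "__main__":
    # Quality line of the second read in the question
    qual = "BGGKOIJIKJ[YY[Y__________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
    print(set(quality_mask(qual)))    # {'0'}: no character is <= 53
    print(min(qual), ord(min(qual)))  # B 66: the lowest character present
    # Under Phred+33, '5' is Q20 and 'I' is Q40
    print(quality_mask("5I"))         # 10
```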

(PS: It's a school project; we need to understand how trimming works before using an existing tool.)

@HWI-EAS384_0000:2:1:1444:905#0/1
NTGTAAAGTTCGATGAGTATTTGCTTTATGGGAGAAATATCCAGCGTTTAGAAAATGTAATTTCAAGGTTACAAC
+HWI-EAS384_0000:2:1:1444:905#0/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-EAS384_0000:2:1:1629:903#0/1
NCAACACTTTCTGAATATGCCTTCAAAACGTGTATCATGTTGATAAATGCAATATTCCATTTCCCAACAGTGACT
+HWI-EAS384_0000:2:1:1629:903#0/1
BGGKOIJIKJ[YY[Y__________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-EAS384_0000:2:1:1838:908#0/1
NATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAAG
+HWI-EAS384_0000:2:1:1838:908#0/1
BKKQKNQNNLWWXWWYYYYYYYYYYXXXXX[[[[[VVVNVTTWRRYYYYY_____BBBBBBBBBBBBBBBBBBBB
@HWI-EAS384_0000:2:1:2067:910#0/1
NGAAATTTACAAAGAAGAACACGTAATATATTCATAAACGGGGAATTTTCATCAATGGAGACAAAAAATGTCGAC
+HWI-EAS384_0000:2:1:2067:910#0/1
BIIEENNJJN____YIJLKOQQTTNQWNTN_____YYY[Y____W[[Y[[___W_BBBBBBBBBBBBBBBBBBBB
@HWI-EAS384_0000:2:1:2279:904#0/1
NAATCGTTCTGTTAAATCAATATTCATAAAAGGCACAAATTCATTATCGTTAATTTTTGAACTATGAAGTAATAC
+HWI-EAS384_0000:2:1:2279:904#0/1
BJJNNWWTQT_____WWWWRVTWVWY[YTYOOVVVQQNNQ_____NOROOLIJJQ____Y___W_YWYYYVPVTT
@HWI-EAS384_0000:2:1:2329:907#0/1
NCAGACAGTTCCTTATTTCTGTTCGACTGACTGAAAATTGACTTTTCTACTAGATTTTTCTAATACTTAACTTTG
+HWI-EAS384_0000:2:1:2329:907#0/1
BKHOGJINQLYYYYYYYQQY_____TVVVVXXXRVIJNLK_____YYQQYTPTMT[Y[[[QQ______Y______
@HWI-EAS384_0000:2:1:2464:909#0/1
NTTTAGCCTGGCCCATGGTTCCCAAAAAGCAATACAAAGCTTGGGTCAACTCCAGCCCAGGGTGACCAGAACCCC
+HWI-EAS384_0000:2:1:2464:909#0/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-EAS384_0000:2:1:2603:919#0/1
NTCGTTGCACCATTGCTTTTTGAAAAAGAATGAGTCGACTTTACGAGTTCAATTTAAAGCACAAATTTTTGCACA
+HWI-EAS384_0000:2:1:2603:919#0/1
BRRRRVVWTV_V_____________WVWQQQ________Y_____PVVVWIKQKJXRVXX___V_[[[[[_____
@HWI-EAS384_0000:2:1:2755:912#0/1
NCGAGGGGAAAGGATAAGAAACTTGATCTCACGCCGGAGAAAATAGCAGCCCAGGCTTTTGTCATCTATTTCGGT
+HWI-EAS384_0000:2:1:2755:912#0/1
BQLLNROMJP_____YY[[[QQ___BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
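The quality characters in these reads (ranging from 'B' to '_') look like older Illumina 1.5+ output, where scores are Phred+64 and a run of 'B' (Q2) at the 3' end flags a read segment that should not be used. Assuming that encoding, here is a sketch that decodes the scores and drops the trailing 'B' run (`decode_phred64` and `trim_b_tail` are hypothetical helper names, and the strings in the usage lines are toy input, not reads from the file):

```python
PHRED64_OFFSET = 64  # assumption: Illumina 1.5+ style Phred+64 encoding


def decode_phred64(qual: str) -> list[int]:
    """Convert a Phred+64 quality string into integer Phred scores."""
    return [ord(c) - PHRED64_OFFSET for c in qual]


def trim_b_tail(seq: str, qual: str) -> tuple[str, str]:
    """Drop the trailing run of 'B' (Q2) characters and the matching bases."""
    keep = len(qual.rstrip("B"))
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    print(decode_phred64("BY_"))                        # [2, 25, 31]
    print(trim_b_tail("NCGAGGGGAAAG", "BQLLNRBBBBBB"))  # ('NCGAGG', 'BQLLNR')
```

Under this assumption, the first and seventh reads, whose quality lines consist entirely of 'B', would be trimmed to nothing.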
reads trimming phredscore qualityscore

One approach is a sliding window: if the average quality within a window of N (say 5) nucleotides drops below a cutoff M, you trim the read from that point on. Averaging over a window prevents 'internal' trimming when just one base has a lower quality. Trimmomatic offers this as its SLIDINGWINDOW option.
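A minimal Python sketch of the sliding-window rule. The window size 5 and cutoff 20 are assumed defaults, `sliding_window_cut` is a hypothetical name, and this simplified version cuts at the start of the first failing window; it is not Trimmomatic's exact SLIDINGWINDOW implementation.

```python
def sliding_window_cut(scores: list[int], window: int = 5, cutoff: float = 20.0) -> int:
    """Return how many bases to keep from the 5' end.

    Windows are scanned from the 5' end; the read is cut at the start of the
    first window whose mean Phred score falls below `cutoff`.
    """
    if not scores:
        return 0
    w = min(window, len(scores))  # a read shorter than the window is one window
    for start in range(len(scores) - w + 1):
        if sum(scores[start:start + w]) / w < cutoff:
            return start
    return len(scores)


if __name__ == "__main__":
    # A single bad base (Q5) inside a good read does not trigger a cut...
    print(sliding_window_cut([30] * 5 + [5] + [30] * 5))  # 11
    # ...but a low-quality 3' tail does.
    print(sliding_window_cut([30] * 10 + [10] * 5))       # 8
```

The second cut lands at index 8 rather than 10: a window straddling the boundary can fail while it still contains a few good bases, so this variant trims slightly conservatively.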


Why not use an existing tool on your original FASTQ files?


It's a school project; we need to understand how trimming works before using an existing tool.


I don't see the point of switching from Phred quality scores to your 0 or 1 score. If you want to trim your sequences, you can use a dedicated tool such as fastp.


It's a school project; we need to understand how trimming works before using an existing tool.


These scores can be encoded in a few different ways (for example Phred+33 or Phred+64), depending on how old the data is. Are you using a simple rule where, as soon as you encounter a 0, you trim the rest of the read to the end, or are you going to use something more sophisticated, like a sliding-window average?
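A rough Python sketch of both points: `guess_offset` is a hypothetical heuristic that picks Phred+33 or Phred+64 from the lowest character seen (it does not distinguish the old Solexa+64 scale, and a Phred+33 file with uniformly high qualities could be misjudged), and `trim_at_first_low` is the simple rule of cutting the read at the first base below an assumed cutoff of 20.

```python
def guess_offset(quality_lines: list[str]) -> int:
    """Heuristic guess of the ASCII offset (33 or 64) of FASTQ quality strings.

    Characters below ';' (ASCII 59) cannot occur in Phred+64 or Solexa+64 data,
    so seeing one implies Phred+33. Otherwise Phred+64 is assumed.
    """
    lowest = min(ord(c) for line in quality_lines for c in line)
    return 33 if lowest < 59 else 64


def trim_at_first_low(scores: list[int], cutoff: int = 20) -> int:
    """Simple rule: keep bases up to (not including) the first one below cutoff."""
    for i, q in enumerate(scores):
        if q < cutoff:
            return i
    return len(scores)


if __name__ == "__main__":
    # Quality line of the fifth read in the question
    qual = "BJJNNWWTQT_____WWWWRVTWVWY[YTYOOVVVQQNNQ_____NOROOLIJJQ____Y___W_YWYYYVPVTT"
    offset = guess_offset([qual])             # lowest character is 'B', so 64
    scores = [ord(c) - offset for c in qual]
    print(offset, trim_at_first_low(scores))  # 64 0: the leading 'B' is Q2
```

On these reads the simple rule cuts every read to length 0, because each quality line starts with 'B' (Q2) for the leading N; that harshness is what a sliding-window average avoids.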

