Removing reads without quality scores from a FASTQ file
6.5 years ago

I am trying to remove the reads that have no quality string from a FASTQ file.

For example:

@NB501309:173:HYW77BGX5:1:11101:23920:1057 1:N:0:CATTTTAT+GGGGGGGG

TCTCANGGAGAGTTCGATCCTGGCTCAGGATGAACGCTGGCGGCATGCTTAACACATGCAAGTCGAACGGGAAGT

+

AAAAA#EEEEEEEEAEEEAEEEEEEEEEEEEEEEEEEAEEEA<EEE/AE<EEEEEEEE6EEEEEEEEEEAEEEEE

**@NB501309:173:HYW77BGX5:1:11101:19977:1057 1:N:0:CATTTTAT+GGGGGGGG

CCCGTNGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAA

+**

@NB501309:173:HYW77BGX5:1:11101:16270:1057 1:N:0:CATTTTAT+GGGGGGGG

ATTCTNGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGGGGGG

+

AAAAA#EEEEEEEEEEEEEEEEEEEEEEE/EEEEEE///AE//EE/EA/A/A//E//A<//EEEAEE///A/EEE

**@NB501309:173:HYW77BGX5:1:11101:15947:1058 1:N:0:CATTTTAT+GGGGGGGG

CCCGTNGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCATTTTATCTCGTATGCCGTCTTCTGCTTGAAAAA

+**

I want to remove @NB501309:173:HYW77BGX5:1:11101:19977:1057 1:N:0:CATTTTAT+GGGGGGGG and @NB501309:173:HYW77BGX5:1:11101:15947:1058 1:N:0:CATTTTAT+GGGGGGGG (the two records marked with ** above, which have no quality line) from the FASTQ file.

Please suggest how to do this.

Thanks in advance

fastq quality • 2.1k views
http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_filter_usage

Almost all fastq trimming tools support filtering by score.
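
To make "filtering by score" concrete, here is a minimal Python sketch of the idea, similar in spirit to the `-q` (minimum score) and `-p` (minimum percent of bases) options of `fastq_quality_filter`. The function names, the Phred+33 offset and the threshold values are assumptions chosen for illustration, not the toolkit's implementation. Note that this kind of filter expects every read to have a quality string.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(quality, min_score=20, min_percent=80):
    """Return True if at least min_percent of the bases score >= min_score."""
    scores = phred_scores(quality)
    if not scores:
        return False  # an empty quality string never passes
    good = sum(1 for s in scores if s >= min_score)
    return 100.0 * good / len(scores) >= min_percent


# Under Phred+33, 'E' decodes to 36, '/' to 14 and '#' to 2.
print(passes_filter("EEEEEEEEE#"))  # 9 of 10 bases reach the threshold
print(passes_filter("////EE////"))  # only 2 of 10 bases reach the threshold
```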


Thank you, but I only want to remove the reads that have no quality scores from the FASTQ file.


What did you do to end up with a FASTQ file in which some reads have no quality string?

If your file is not that big, you can use Biopython:

  • Open your file in python script
  • Put the content of your file in a SeqIO object
  • Loop over SeqIO object (records)
  • For each record in records
    • If you have quality
      • Write record in new file
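
The steps above can be sketched roughly as follows. This version uses plain Python rather than Biopython, so that the malformed records can be handled explicitly. It assumes that every record has a header, a single-line sequence and a `+` line, and that only the quality line may be missing; `read_records` and `keep_with_quality` are hypothetical helper names.

```python
def read_records(lines):
    """Yield (header, sequence, plus, quality); quality is None when missing."""
    lines = [ln.rstrip() for ln in lines if ln.strip()]  # drop blank lines
    i = 0
    while i + 2 < len(lines):
        header, seq, plus = lines[i], lines[i + 1], lines[i + 2]
        qual = lines[i + 3] if i + 3 < len(lines) else None
        # Accept the 4th line as quality only if it matches the sequence
        # length; otherwise assume it is the header of the next record.
        if qual is not None and len(qual) == len(seq):
            yield header, seq, plus, qual
            i += 4
        else:
            yield header, seq, plus, None
            i += 3


def keep_with_quality(lines):
    """Return the lines of all records that have a quality string."""
    kept = []
    for header, seq, plus, qual in read_records(lines):
        if qual is not None:
            kept.extend([header, seq, plus, qual])
    return kept


example = [
    "@read1", "ACGT", "+", "EEEE",
    "@read2", "ACGTA", "+",          # no quality line
    "@read3", "ACG", "+", "@EE",     # quality that starts with '@'
]
kept = keep_with_quality(example)
print("\n".join(kept))
```

For a real file, the list of lines could come from `open(path)` and the kept lines could be written back out joined with newlines. Everything is held in memory, which matches the "not that big" caveat.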

While pooling, the quality strings of some reads went missing, and now I have to remove those reads. Kindly help.


I think you should redo your pooling. There is no reason to lose reads just because the pooling went wrong. If the quality was already missing in the raw FASTQ file, you can try to remove these reads from the raw file.


I repooled it and got the same problem again, so I think the quality is missing in the raw file itself. Kindly help me remove those reads.


You should try this tool to validate your FASTQ first. Investigating your raw FASTQ files is the best way to avoid losing information.
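
As a rough idea of what such a validation checks, here is a small Python sketch that reports the first malformed record under the strict four-lines-per-record layout. It is not the tool mentioned above; `first_invalid_record` is a hypothetical helper, and the sketch assumes no blank lines and single-line sequences.

```python
def first_invalid_record(lines):
    """Return (line_number, reason) for the first malformed record, else None."""
    lines = [ln.rstrip("\n") for ln in lines]
    for start in range(0, len(lines), 4):
        rec = lines[start:start + 4]
        if len(rec) < 4:
            return start + 1, "truncated record"
        header, seq, plus, qual = rec
        if not header.startswith("@"):
            return start + 1, "header does not start with '@'"
        if not plus.startswith("+"):
            return start + 3, "separator does not start with '+'"
        if len(seq) != len(qual):
            return start + 4, "sequence and quality lengths differ"
    return None


good = ["@r1", "ACGT", "+", "EEEE"]
bad = ["@r1", "ACGT", "+", "@r2", "ACGTA", "+", "EEEEE"]  # r1 lacks quality
print(first_invalid_record(good))
print(first_invalid_record(bad))
```

Line numbers are 1-based, so the reported position can be inspected directly in the raw file.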

In any case, if you want to remove these reads, my script below should do the trick.

6.5 years ago
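# Keep only complete FASTQ records (header, sequence, '+', quality).
# A quality line may itself start with '@', so a record is accepted only
# when its 4th line has the same length as its sequence line.
def is_complete(rec):
    return (len(rec) == 4 and rec[0].startswith("@")
            and rec[2].startswith("+") and len(rec[3]) == len(rec[1]))

record = []

# 'w' instead of 'a', so that re-running the script does not duplicate reads
with open("no_qual.fastq") as f, open("new_no_qual.fastq", "w") as new_file:
    for line in f:
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        # An '@' line starts a new record, unless it is the quality string
        # of the record currently being collected.
        is_quality = len(record) == 3 and len(line) == len(record[1])
        if line.startswith("@") and not is_quality:
            if is_complete(record):
                new_file.write("\n".join(record) + "\n")
            record = []
        record.append(line)
    # Write the last record, if it is complete
    if is_complete(record):
        new_file.write("\n".join(record) + "\n")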

Tell me if it's too slow, I'll think about it
