How to remove reads in a fastq file that DO NOT have adapter sequence in them?
7.3 years ago
Javad ▴ 150

Dear all,

I have a fastq formatted file and I want to remove reads that DO NOT have a specific sequence in them. The experiment is designed in a way that only reads that have an adapter in them are meaningful for us and the rest of the reads are trash and I have to find a way to get rid of them. Another layer of complication is that I have paired-end reads. I would be grateful if someone can make a suggestion about how to handle this.

Best regards

RNA-Seq next-gen • 3.0k views

You can look into tools that split a fastq file based on barcodes and check whether you can adapt them.

For example, QIIME or Stacks.

If you know the exact sequence, you can do it with fastq-grep, but it does not allow mismatches.

An easier way is given below as an answer.
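
On the point about exact-match tools not allowing mismatches: the following is a minimal Python sketch of a mismatch-tolerant search, assuming only substitutions matter (no insertions or deletions). The function name `find_with_mismatches` and the `max_mismatches` parameter are made up for illustration and do not come from any of the tools named in this thread.

```python
def find_with_mismatches(read: str, adapter: str, max_mismatches: int = 1) -> int:
    """Return the first 0-based offset at which `adapter` aligns to `read`
    with at most `max_mismatches` substitutions, or -1 if none exists."""
    adapter_len = len(adapter)
    for start in range(len(read) - adapter_len + 1):
        mismatches = 0
        for read_base, adapter_base in zip(read[start:start + adapter_len], adapter):
            if read_base != adapter_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this offset, try the next one
        else:
            # Inner loop finished without a break: this offset is acceptable.
            return start
    return -1
```

A read would then be kept when `find_with_mismatches(sequence, adapter, 1) != -1`. This scans every offset, so it is slow on large files; dedicated tools are a better choice for real data.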


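Grep for the forward and the reverse adapter sequence, keeping one line above each match (the header) and two lines below (the "+" line and the qualities), and append both sets of hits to the same output file. With GNU grep, --no-group-separator stops grep from printing "--" between non-adjacent matches, which would otherwise break the fastq format.

grep --no-group-separator -B 1 -A 2 "adapter sequence" original.fastq > wanted_adapters.fastq
grep --no-group-separator -B 1 -A 2 "reverse_adapter sequence" original.fastq >> wanted_adapters.fastq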
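
A line-based grep can also hit the header or quality line, because quality strings may contain the letters A, C, G and T. It also writes a read twice when the read contains both adapters. As a hedged alternative, here is a small Python sketch that parses 4-line records and tests only the sequence line. It assumes a plain, unwrapped 4-line fastq and exact matching; `read_fastq` and `filter_fastq` are illustrative names, not part of any library.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line fastq records; stops at end of input or at a blank header line."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield (lines[0], lines[1], lines[2], lines[3])


def filter_fastq(in_handle: TextIO, out_handle: TextIO, adapters: Iterable[str]) -> int:
    """Write records whose sequence contains any adapter; return how many were kept."""
    adapter_list = list(adapters)
    kept = 0
    for record in read_fastq(in_handle):
        sequence = record[1]
        if any(adapter in sequence for adapter in adapter_list):
            out_handle.write("\n".join(record) + "\n")  # each record is written once
            kept += 1
    return kept


# In-memory demo: read2 has the pattern only in its quality line, so it is dropped.
demo_input = io.StringIO(
    "@read1\nAAGGGTGATT\n+\nIIIIIIIIII\n"
    "@read2\nAACCCCCCTT\n+\nGGGTGAIIII\n"
)
demo_output = io.StringIO()
kept = filter_fastq(demo_input, demo_output, ["GGGTGA"])
```

For real files, pass handles from `open("original.fastq")` and `open("wanted_adapters.fastq", "w")`, with both the forward and reverse adapter in the list. This handles a single file only; paired-end data additionally needs both mates kept in sync.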

$ cat test.fq 
@SRR001666 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

Using seqkit:

$ seqkit grep -vdisp GGGTGA test.fq
@SRR001666 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I

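Flags: -v = invert match, -d = degenerate (IUPAC) pattern, -i = ignore case, -s = match against the sequence, -p = pattern. Note that -v returns the reads that do NOT contain the pattern; drop -v to keep only the reads that contain it, which is what the question asks for.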

7.3 years ago
Asaf 10k

Use cutadapt with the --discard-untrimmed flag; it works with paired-end reads.
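
To illustrate what pair-aware filtering has to do, here is a rough Python sketch that walks R1 and R2 in lockstep and writes or drops both mates together. It is not how cutadapt is implemented. It assumes exact matching, plain 4-line fastq, and mates in the same order in both files. The names `filter_pairs` and `require_both` are hypothetical.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line fastq records; stops at end of input or at a blank header line."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield (lines[0], lines[1], lines[2], lines[3])


def filter_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO,
                 adapter: str, require_both: bool = False) -> int:
    """Keep a pair if either mate (or both, when require_both=True) contains the adapter."""
    kept = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        hits = (adapter in rec1[1], adapter in rec2[1])
        if all(hits) if require_both else any(hits):
            # Write both mates so the two output files stay synchronized.
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept


def demo(require_both: bool) -> Tuple[int, str]:
    """Run the filter on three tiny in-memory pairs; return count and kept R1 text."""
    r1 = io.StringIO("@p1/1\nAAGGGTGATT\n+\nIIIIIIIIII\n"
                     "@p2/1\nAACCCCCCTT\n+\nIIIIIIIIII\n"
                     "@p3/1\nGGGTGAAAAA\n+\nIIIIIIIIII\n")
    r2 = io.StringIO("@p1/2\nTTTTTTTTTT\n+\nIIIIIIIIII\n"
                     "@p2/2\nCCCCCCCCCC\n+\nIIIIIIIIII\n"
                     "@p3/2\nAAGGGTGAAA\n+\nIIIIIIIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    return filter_pairs(r1, r2, out1, out2, "GGGTGA", require_both), out1.getvalue()
```

Whether a pair should survive when only one mate carries the adapter depends on the experiment, hence the `require_both` switch. For gzipped input, handles from `gzip.open(path, "rt")` can be passed in place of plain file handles.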

7.3 years ago
GenoMax 147k

This answer shows you how to keep reads that have the adapter. Untested, but please try it and let us know. Use bbduk.sh from the BBMap suite like this:

bbduk.sh in1=file_R1.fq.gz in2=file_R2.fq.gz outm=matching_R1.fq.gz outm2=matching_R2.fq.gz literal=your_adapter_sequence_goes_here k=a_number_equal_to_length_of_your_adapter mm=f

Note: Replace the two placeholders above with the real adapter sequence (literal=) and a number equal to the length of the adapter (k=).
