I would like to remove all reads that contain an adapter sequence from my FASTQ file.
Which tool, software, or Unix command-line option would be good for removing these reads?
PS: I don't want to trim the adapters; I want to remove the reads that contain the adapter sequence from the FASTQ file.
Assuming all of your reads are the same length, you can use any existing read trimmer that has a minimum read length option (e.g. trim_galore with the --length option). Set that minimum to the original read length and the program will reject every trimmed read, since a trimmed read is necessarily shorter than the initial read length. For example, if you have 100 bp reads, then running
trim_galore -a adapter --length 100 file.fastq
or something similar should do what you want. This approach can also handle paired-end reads (presuming you want to filter out both reads of a pair).
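If you would rather not install a trimmer, the same "drop the whole read" idea can be roughly sketched with plain awk. This is only a minimal sketch under some assumptions: each FASTQ record is exactly four lines (no wrapped sequences), the reads are single-end, and the adapter must appear as an exact, complete substring (no mismatches, and no partial adapter at the 3' end, both of which a real trimmer would catch). The function name drop_adapter_reads is made up for illustration.

```shell
# Hypothetical helper: print only the FASTQ records whose sequence line
# does NOT contain the adapter as an exact substring.
# Usage: drop_adapter_reads ADAPTER in.fastq > out.fastq
drop_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 1 { header = $0 }   # line 1 of a record: @read_id
        NR % 4 == 2 { seq = $0 }      # line 2: the sequence
        NR % 4 == 3 { plus = $0 }     # line 3: the "+" separator
        NR % 4 == 0 {                 # line 4: qualities, record is complete
            # index() returns 0 when the adapter is not found in the read
            if (index(seq, adapter) == 0)
                print header "\n" seq "\n" plus "\n" $0
        }
    ' "$2"
}
```

For example, drop_adapter_reads AGATCGGAAGAGC file.fastq > no_adapter.fastq would keep only the reads without that exact string. Because it matches exactly, it will miss adapters with sequencing errors or adapters cut off by the end of the read, so the trimmer-based approach is the more thorough one.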
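Have you tried the FASTX-Toolkit? It includes fastx_clipper, which can be used to remove adapter sequences. Its -C option discards clipped sequences, i.e. it keeps only the reads that did not contain the adapter, which is what you are asking for. Here is its help output: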
$ fastx_clipper -h
usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
version 0.0.6
[-h] = This helpful help screen.
[-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
[-l N] = discard sequences shorter than N nucleotides. default is 5.
[-d N] = Keep the adapter and N bases after it.
(using '-d 0' is the same as not using '-d' at all. which is the default).
[-c] = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
[-C] = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
[-k] = Report Adapter-Only sequences.
[-n] = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
[-v] = Verbose - report number of sequences.
If [-o] is specified, report will be printed to STDOUT.
If [-o] is not specified (and output goes to STDOUT),
report will be printed to STDERR.
[-z] = Compress output with GZIP.
[-D] = DEBUG output.
[-i INFILE] = FASTA/Q input file. default is STDIN.
[-o OUTFILE] = FASTA/Q output file. default is STDOUT.
Hi, this would also trim reads that don't contain the adapter sequence but have poor quality at the ends. I don't want to trim those reads. How should I go about it?
Varun
Only if you want it to. You can set whatever quality-trimming threshold you want. Try it with -q 0, which should effectively turn quality trimming off.