I have a file, say master.fastq, which looks like this:
@M00990:202:000000000-ADM27:1:1101:21678:1536 2:N:0:291
CCTTTTACCGACCCGCTCTTTCTCTCCTACGCTTATTTCCGTCTACCCTTCTCTTCACTCGCTATTTCTATTCTTAAAACTATCTTAATGTTCTGCCTTTGCTCTTTTCTTTTTTCTATAACCTCTCTACAGCCAACTCACCCATCTCCTTCCCTGCTACGCTATTCCTCTGTTAGTTTTTCTTCATCATACTTTTCTCATCTCACACTACCTTTGCACTTCTTCCTTTCCACGTCCCCTTTCTCCTACC
+
-----,,6;6,@+8++6+,,,<C5@,,,,,,+8,,;,,<,,,,,,,;,,;,,,<C,,,;,,8,,8CC,C,,,5,C,,,,99,+,+4,,,3,,9,,6,@<,,,,,,,9,,,,,,,4),,0**,,5),,5**,)59*0),,*)5,)))*,9,3,0++))5D:)))))5;+;+0)*;+*6++++),******3**50,6,+++**,0*,,31)88*0*1*5*1)0*:*7>C;3,035:0)))8).*2**.*:)
@M00990:202:000000000-ADM27:1:1101:22685:1539 2:N:0:291
CTTATCACCGACTCTCTCCTTCTCTTCCAAGTTTATTTCCGACTCCCCTTATCTTCACTTGCTATTTCTATTCTTAAAACTATCTCGACCTTTCACCTTTCCCTCTTTCCTTCTTTTCTCTCCTTCTACACTCCCACCCACTCTTACTTCTTTCTTGTCACCGTTTCCATATTATACTTTCTTCTCTTACATAATTTTCTTCCTGCAAACTATTTAAGCAATCTCTTTCTTTCACCCCTTTTATCTCGCC
+
-----,,<;67@+B,,6,,,,;C5@,,,,,66<,,6,,<,,,66,,;,66,,;;C,,,;,4<,,<CC6C,;,,,;,,6,:,6?A9=,,+++2,5,,,9<?D,,,,,:C?,@,@,,5,?*3*,9**,,0**,*)93))4+))19*0**,,52,56+**5*03*)3)))42+2***5*+=3+,*4*2****,**2*,3,+++*0,50,,**5*****0****)5)***0**,,***3*)0)))3***0*)))
I want to keep only the reads that contain the sequence "AAGTTGATAACGGACTAGCCTTATTTT". I tried grep, but it returns only the matching sequence lines, so the fastq format is lost. Can you suggest how I can retain the fastq format in the output? Thanks.
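One way to keep all four lines of each matching read is to walk the file record by record instead of line by line. Below is a minimal Python sketch, not a tested pipeline. It assumes each record is exactly four lines (no wrapped sequences) and it searches only the strand as written, not the reverse complement. The names read_fastq, filter_fastq and matched.fastq are made up for illustration.

```python
from typing import Iterator, TextIO, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        yield header, sequence, separator, quality


def filter_fastq(src: TextIO, dst: TextIO, motif: str) -> int:
    """Write records whose sequence contains motif to dst; return how many were kept."""
    kept = 0
    for record in read_fastq(src):
        if motif in record[1]:  # test the sequence line only
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept


# Example call (file names are placeholders):
# with open("master.fastq") as src, open("matched.fastq", "w") as dst:
#     filter_fastq(src, dst, "AAGTTGATAACGGACTAGCCTTATTTT")
```

Checking only the sequence line also avoids false hits where the pattern happens to appear in a header or quality string.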
Hi! I copied one of the overrepresented sequences from a FastQC report (GACTACTAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAA) and tried to find it in the fastq file using:
There is no result. Why?
grep cannot work directly on a gzip-compressed file. You could use zgrep instead of plain grep, if it is available. It will work with compressed files.
Yes, it worked. So simple. Thank you!
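If zgrep is not available, the same idea can be sketched in Python: a gzip file has to be decompressed before searching, because the compressed bytes do not contain the plain sequence text. This is only an illustration. The helper names open_maybe_gzip and count_lines_containing are hypothetical, and the check relies on the two-byte gzip signature at the start of the file.

```python
import gzip
from typing import TextIO


def open_maybe_gzip(path: str) -> TextIO:
    """Open path as text, decompressing on the fly if it starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt")
    return open(path, "r")


def count_lines_containing(path: str, pattern: str) -> int:
    """Count lines that contain pattern, for plain or gzip-compressed files."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if pattern in line)
```

A count of zero on a .gz file opened without decompression, but a positive count through this helper, would point to compression as the cause of the empty grep result.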