dlekrud456 (3.3 years ago)
Hello,
I have a list of sequences, and I would like to find the FASTQ reads that contain any of them.
Is there a tool, or some way to program this, that extracts FASTQ reads matching a specific list of sequences?
My list of sequences looks like the following:
GATAAAAAAAAAAAAAAAC
GATAAAAAAAAAAAAAACC
GATAAAAAAAAAAAAAATC
GATAAAAAAAAAAAAAAGC
GATAAAAAAAAAAAAACAC
GATAAAAAAAAAAAAACCC
GATAAAAAAAAAAAAACTC
GATAAAAAAAAAAAAATAC
GATAAAAAAAAAAAAATCC
GATAAAAAAAAAAAAATGC
GATAAAAAAAAAAAAAGAC
GATAAAAAAAAAAAAAGCC
GATAAAAAAAAAAAAAGGC
GATAAAAAAAAAAAACAAC
GATAAAAAAAAAAAACACC
GATAAAAAAAAAAAACCAC
GATAAAAAAAAAAAACCCC
GATAAAAAAAAAAAACCTC
GATAAAAAAAAAAAATAAC
GATAAAAAAAAAAAATCAC
GATAAAAAAAAAAAATTAC
GATAAAAAAAAAAAAGAAC
GATAAAAAAAAAAAAGACC
GATAAAAAAAAAAACAAAC
GATAAAAAAAAAAACCCCC
GATAAAAAAAAAAATAAAC
GATAAAAAAAAAAAGAAAC
GATAAAAAAAAAACAAAAC
. . . .
I have been using grep to do this one sequence at a time, but it takes too long (I have 40k 19-mers):
grep -A 2 -B 1 "CTCAAAAAAAAACAAAGGA" input.fastq | grep -v "^\-\-$" > output.fastq
Also, there is a problem with overlapping reads.
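One way around both the speed problem and the record-boundary problems of `-A 2 -B 1` is sketched below. This is not from the thread. awk reads the FASTQ four lines at a time, and every 19-base window of the sequence line is looked up in a hash of the patterns. Only the sequence line is tested and whole records are printed, so there are no `--` separators to clean up, and a header or quality line can never trigger a match. It assumes uncompressed 4-line FASTQ (no wrapped sequence lines), that all patterns have the same length, and exact forward-strand matches only. The `demo_*` files are hypothetical test data.

```shell
#!/usr/bin/env bash
# Sketch: keep whole FASTQ records whose sequence line contains any listed k-mer.
# Assumptions: 4-line FASTQ records, all patterns the same length, exact
# forward-strand matches only. demo_* file names are hypothetical.
set -eu

p1=GATAAAAAAAAAAAAAAAC
p2=GATAAAAAAAAAAAAAACC
printf '%s\n' "$p1" "$p2" > demo_kmers.txt   # one pattern per line

# read1 embeds p1; read2 is all T's. ${p1//?/I} turns every base of p1 into 'I',
# so each quality line is exactly as long as its sequence line.
printf '%s\n' '@read1' "TT${p1}TT" '+' "II${p1//?/I}II" \
              '@read2' "TT${p1//?/T}TT" '+' "II${p1//?/I}II" > demo_reads.fastq

awk -v OFS='\n' '
  # First file: store each pattern in a hash set; k is the pattern length.
  NR == FNR {
    sub(/\r$/, "")              # tolerate Windows line endings in the list
    if ($0 != "") { kmer[$0] = 1; k = length($0) }
    next
  }
  # Second file: buffer the 4 lines of the current record (index 0..3).
  { rec[(FNR - 1) % 4] = $0 }
  # After line 4 of a record, slide a k-wide window over the sequence line.
  FNR % 4 == 0 {
    seq = rec[1]
    for (i = 1; i <= length(seq) - k + 1; i++) {
      window = substr(seq, i, k)
      if (window in kmer) { print rec[0], rec[1], rec[2], rec[3]; break }
    }
  }
' demo_kmers.txt demo_reads.fastq > demo_hits.fastq
```

To run it on the real data, replace demo_kmers.txt and demo_reads.fastq with list.txt and input.fastq. Each read costs roughly (read length - 18) hash lookups no matter how many patterns are loaded, so 40k patterns should not slow it down much.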
You can use
grep -f file
If the file contains one pattern per line, grep will pull out the lines matching any of the patterns in a single pass.
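Below is a minimal sketch of the `grep -f` suggestion on a tiny hypothetical FASTQ. Adding `-F` (fixed strings) is my addition and was not part of the suggestion. The 19-mers contain no regex metacharacters, so the matches are the same, and grep is usually considerably faster with a large list of literal patterns. One caveat: `-B 1`/`-A 2` assumes the match is on the sequence line. A quality line consisting only of A/C/G/T characters could in principle match as well.

```shell
#!/usr/bin/env bash
# Sketch: one grep pass over the FASTQ for every pattern in a file.
# demo_* file names are hypothetical stand-ins for list.txt / input.fastq.

p1=GATAAAAAAAAAAAAAAAC
p2=GATAAAAAAAAAAAAAACC
printf '%s\n' "$p1" "$p2" > demo_patterns.txt   # one pattern per line

# Two reads: r1 contains p1, r2 contains neither pattern.
# ${p1//?/I} replaces every base of p1 with 'I' to build a matching-length quality line.
printf '%s\n' '@r1' "TT${p1}TT" '+' "II${p1//?/I}II" \
              '@r2' "TT${p1//?/T}TT" '+' "II${p1//?/I}II" > demo_input.fastq

# -F  treat patterns as fixed strings instead of regular expressions
# -f  read the patterns from a file, one per line
# -B 1 / -A 2  also print the header line before and the '+'/quality lines after
# grep -v '^--$' drops the "--" separators grep prints between context groups
grep -F -B 1 -A 2 -f demo_patterns.txt demo_input.fastq | grep -v '^--$' > demo_output.fastq
```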
Thanks a lot! I tried the command below, but it isn't working. Could you please have a look?
grep -A 2 -B 1 -f list.txt input.fastq |grep -v "^\-\-$" > output.fastq
That command looks right.
The output FASTQ seems to be empty :(
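The thread doesn't confirm the cause, but one common reason for empty output is that list.txt was saved with Windows (CRLF) line endings. In that case each pattern ends in an invisible carriage return, so no line in the FASTQ can match. The sketch below simulates such a file, detects the carriage returns, and strips them with `tr`. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether a pattern file has Windows (CRLF) line endings,
# a common reason why 'grep -f' silently matches nothing.
# demo_list.txt is a hypothetical stand-in for list.txt.

# Simulate a pattern file saved on Windows: each line ends in "\r\n".
printf 'GATAAAAAAAAAAAAAAAC\r\nGATAAAAAAAAAAAAAACC\r\n' > demo_list.txt

# Count lines containing a carriage return; anything above 0 means CRLF endings.
# (grep -c exits non-zero when the count is 0, hence '|| true'.)
cr_lines=$(grep -c $'\r' demo_list.txt || true)
echo "lines with a carriage return: $cr_lines"   # prints: lines with a carriage return: 2

# Strip the carriage returns so each pattern is exactly the 19 bases.
tr -d '\r' < demo_list.txt > demo_list.unix.txt
```

If this turns out to be the cause, clean the real list the same way (for example into a hypothetical list.unix.txt) and rerun the original command with `-f list.unix.txt`.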