8.8 years ago
kanwarjag
I have DNA-seq data and need to count the occurrences of a specific 10 bp sequence. However, it is in vitro data, so there is no reference genome. I understand I could perhaps align two sequences against each other. My question is: since I don't have a genome, how should I handle these FASTQ files to get the fragments into an aligner such as BLAST?
Thanks. Any direction or pointers will be helpful.
If you only need to count occurrences and don't need to allow for mismatches (i.e. you require an exact match), then you can simply count occurrences of the substring in each read using your favorite programming language.
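For the exact-match case, a minimal Python sketch is below. It assumes plain, non-wrapped four-line FASTQ records, and the motif `ACGTACGTAA` and the function names are hypothetical examples. Python's `str.count` only counts non-overlapping matches, so the sketch steps through `str.find` one position at a time to catch overlapping hits as well.

```python
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record, upper-cased.

    Assumes standard, non-wrapped FASTQ (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored here)
        yield seq.upper()


def count_exact(handle, motif):
    """Count exact, possibly overlapping occurrences of motif across all reads."""
    motif = motif.upper()
    total = 0
    for seq in read_fastq_sequences(handle):
        start = seq.find(motif)
        while start != -1:
            total += 1
            # Advance by one position so overlapping matches are also found
            start = seq.find(motif, start + 1)
    return total


if __name__ == "__main__":
    # Hypothetical in-memory FASTQ; for a real file use open("reads.fastq")
    example = io.StringIO(
        "@read1\nACGTACGTAAACGTACGTAA\n+\nIIIIIIIIIIIIIIIIIIII\n"
        "@read2\nTTTTACGTACGTAATTTT\n+\nIIIIIIIIIIIIIIIIII\n"
    )
    print(count_exact(example, "ACGTACGTAA"))
```

The example prints 3 (two hits in the first read, one in the second). Only the strand as written in the reads is searched; if the motif can occur on either strand, you may also want to count its reverse complement.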
The sequence may have one or two bases of variation, so I may need to use some kind of mismatch threshold.
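One simple way to tolerate one or two substituted bases without any aligner is to slide a window the length of the motif along every read and accept any window whose Hamming distance to the motif is at or below a threshold. This is a brute-force sketch, not equivalent to BLAST: it handles substitutions only, not insertions or deletions, and the read list, motif and `max_mismatches` parameter are hypothetical examples. `sequences` can be any iterable of read strings, such as the sequence lines of a FASTQ file.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def count_approx(sequences, motif, max_mismatches=2):
    """Count windows within max_mismatches substitutions of motif.

    Every window of len(motif) in every read is tested; indels are not handled.
    """
    motif = motif.upper()
    k = len(motif)
    total = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            if hamming(seq[i:i + k], motif) <= max_mismatches:
                total += 1
    return total


# Hypothetical reads: exact match, 1 mismatch, 2 mismatches, unrelated
reads = ["ACGTACGTAA", "ACGTACCTAA", "ACGAACCTAA", "TTTTTTTTTT"]
print(count_approx(reads, "ACGTACGTAA", max_mismatches=1))
```

With `max_mismatches=1` this prints 2; raising the threshold to 2 also accepts the third read. With looser thresholds, neighbouring windows in repetitive sequence can both pass, so one site may be counted more than once, and unrelated 10-mers become more likely to match by chance.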
I guess you could try converting the FASTQ into FASTA and using that as the reference for BLAST, then aligning the FASTQ reads to that reference with BLAST, allowing for multiple hits and some degree of mismatch. However, I am not entirely sure about that approach.
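The FASTQ-to-FASTA conversion suggested above takes only a few lines of Python. This is a minimal sketch assuming four-line, non-wrapped FASTQ; the function name is hypothetical. It only converts the file: building a BLAST database from the output and running the search are separate steps not shown here.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert 4-line FASTQ records to FASTA, dropping the quality scores.

    Assumes non-wrapped FASTQ where every record is exactly four lines.
    Returns the number of records written.
    """
    n = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        seq = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality line (discarded)
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        # FASTA headers start with '>' instead of the FASTQ '@'
        fasta_handle.write(">" + header[1:].strip() + "\n")
        fasta_handle.write(seq + "\n")
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical in-memory input; for real files pass open(...) handles
    src = io.StringIO("@read1\nACGTACGTAA\n+\nIIIIIIIIII\n")
    dst = io.StringIO()
    fastq_to_fasta(src, dst)
    print(dst.getvalue(), end="")
```

Running it prints `>read1` followed by `ACGTACGTAA`.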