I'm doing several analyses using dotplots in a project.
1) First, I'm comparing a sequence read (in FASTQ format) to a reference genome to find the genomic regions where the read aligns, and from that to identify unique and repetitive regions.
2) Then I'm supposed to calculate the quality of the sequence reads used in step 1 (to remove the low-quality ones) using a weight matrix. My idea is to construct a PSSM from the base-calling quality values on line 4 of each FASTQ record, which gives a single PSSM per read.
But I'm not sure how I'm supposed to assess the quality of the alignment, because it doesn't make sense to compare the single read sequence to the PSSM that was constructed from it.
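To make the PSSM idea concrete, here is a rough sketch of how I imagine building one matrix per read. It rests on several assumptions of mine: the qualities are Phred+33 encoded, the error probability is split evenly over the three bases that were not called, and the error probability is capped at 0.75 so the called base is never less likely than the others. The function names and the quality string in the usage line are made up for illustration.

```python
BASES = "ACGT"

def phred_to_error_prob(qual_char, offset=33):
    """Convert one FASTQ quality character to an error probability.

    Assumes Phred scaling: P(error) = 10 ** (-Q / 10).
    """
    q = ord(qual_char) - offset
    return 10 ** (-q / 10)

def read_to_pssm(seq, qual, offset=33):
    """Build one probability matrix (a list of per-position dicts) for a read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    pssm = []
    for base, qchar in zip(seq, qual):
        # Assumption: cap at 0.75 so the called base stays at least as likely
        # as any alternative.
        p_err = min(phred_to_error_prob(qchar, offset), 0.75)
        if base not in BASES:
            # Ambiguous call such as N: treat all four bases as equally likely.
            column = {b: 0.25 for b in BASES}
        else:
            # Assumption: the error mass is shared equally by the other bases.
            column = {b: p_err / 3 for b in BASES}
            column[base] = 1 - p_err
        pssm.append(column)
    return pssm

# Hypothetical quality string: 'I' is Q40 under Phred+33.
pssm = read_to_pssm("TAGGTCATT", "IIIIIIIII")
```

Each column sums to 1, so a column with a low quality value is close to uniform and a column with a high quality value is close to a one-hot vector for the called base.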
So I'm thinking: suppose you have the two sequences below, where the first is the read and the second is the reference genome:
T A G G T C A T T
T A G G T A C T G
There would be mismatches at the 6th, 7th and 9th positions. Couldn't you then compare the second sequence (the reference) to the PSSM created from the read, in order to get a score/probability of how well it matches?
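As a sketch of what I mean by scoring the reference against the read's matrix, the snippet below sums a log2-odds score over the aligned positions. The assumptions are mine: Phred+33 qualities, error probability split evenly over the three uncalled bases and capped at 0.75, a uniform background of 0.25, an ungapped alignment, and only A/C/G/T in both sequences. The function names and quality strings are hypothetical.

```python
import math

BASES = "ACGT"

def column_probs(base, qual_char, offset=33):
    """Per-position base probabilities derived from one Phred quality value."""
    q = ord(qual_char) - offset
    p_err = min(10 ** (-q / 10), 0.75)  # assumption: cap keeps called base >= others
    probs = {b: p_err / 3 for b in BASES}
    probs[base] = 1 - p_err
    return probs

def score_reference(read, qual, ref, background=0.25, offset=33):
    """Sum of log2-odds of each reference base under the read's matrix."""
    if not (len(read) == len(qual) == len(ref)):
        raise ValueError("read, quality and reference must have equal length")
    score = 0.0
    for base, qchar, ref_base in zip(read, qual, ref):
        probs = column_probs(base, qchar, offset)
        score += math.log2(probs[ref_base] / background)
    return score

read = "TAGGTCATT"
ref = "TAGGTACTG"
# Hypothetical qualities: 'I' is Q40, '#' is Q2 under Phred+33.
all_high = "IIIIIIIII"          # confident calls everywhere
low_at_mismatch = "IIIII##I#"   # low confidence at positions 6, 7 and 9

print(score_reference(read, all_high, ref))
print(score_reference(read, low_at_mismatch, ref))
```

With this scoring, a mismatch at a confidently called base costs far more than a mismatch at a low-quality base, so the second score comes out higher than the first. Note that this measures how well the read agrees with the reference given its base qualities, which is not quite the same thing as the quality of the read itself.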
Or do you have any other suggestions for assessing the quality of the reads using a weight matrix and dotplots?
Many thanks for your help.