I have generated a consensus sequence using 'pileup' and then 'pileup2fq' from samtools. Can anyone tell me exactly what determines whether a base in the resulting sequence is in UPPER or lower case?
An example of the fastq is:
@header
GTTAAGATGAAACATTTACAGGATTTGATTGACGAACCTGATGAtttttcacaacccaat
ccatCTtagactagaaaggtaTTTACGGTTGCTaaacattgcgttatgtttaaGACCTCA
TGCCAATAGACTGTTTGAATTTTATGAactgtctcctttgggaaacttgttaagtcgtga
aastnnnnnnnnnnnnnnncaagggtacttggtcatcagatctaccgcaaaagctCAAGG
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~r
oZMH!!!!!!!!!!!!!!!KZo~~Z~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Thanks.
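As far as I understand the pileup2fq code in samtools.pl, a consensus base is written in upper case only when it passes all of the script's filters: read depth between the minimum (-d) and maximum (-D), root-mean-square mapping quality of at least -Q, and not within the -l window around an indel that passed the indel score filter. Otherwise it is written in lower case. The sketch below models that rule in Python. The `PileupSite` record, its field names and the threshold values in the usage line are hypothetical, chosen for illustration; check the usage message of your copy of samtools.pl for the real options and defaults.

```python
from dataclasses import dataclass


@dataclass
class PileupSite:
    """One consensus-pileup position (hypothetical fields, for illustration only)."""
    base: str         # consensus base call
    depth: int        # number of reads covering this position
    rms_mapq: int     # root-mean-square mapping quality of the covering reads
    near_indel: bool  # True if the position lies within the indel filter window


def consensus_case(site: PileupSite, min_depth: int, max_depth: int, min_mapq: int) -> str:
    """Return the base in upper case if it passes every filter, else in lower case."""
    passes = (
        min_depth <= site.depth <= max_depth
        and site.rms_mapq >= min_mapq
        and not site.near_indel
    )
    return site.base.upper() if passes else site.base.lower()


# Depth 2 is below the (illustrative) minimum of 3, so the base is lower-cased.
print(consensus_case(PileupSite("A", 2, 60, False), 3, 255, 25))  # prints "a"
```

Under this reading, a lower-case base is still a called base. It has only been flagged as failing at least one filter.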
In this case, I guess you can mark your own answer as the best :-)
I wish I hadn't asked it now...
So, just to make sure: the lower-case letters were either covered but filtered out for the reasons mentioned above, or they were not covered by any reads at all?
In my experience, if positions are not covered at all, you do not get any bases for them in the FASTQ file.
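One way to check what the lower-case and 'n' positions look like in practice is to tally them. The record in the question is wrapped at 60 characters per line, so a parser that expects exactly four lines per FASTQ record would mis-read it. The sketch below reads one wrapped record and counts upper-case, lower-case and 'n' bases, also collecting the quality characters found at 'n' positions. It stops reading quality lines once their combined length equals the sequence length, because a quality line can itself begin with '@'. It assumes a single well-formed record, and the function names are my own, for illustration.

```python
def read_wrapped_fastq(lines):
    """Parse one FASTQ record whose sequence and quality may span several lines."""
    it = iter(line.strip() for line in lines)
    header = next(it, "")
    if not header.startswith("@"):
        raise ValueError("expected a FASTQ header starting with '@'")
    # Sequence lines run until the '+' separator line.
    seq_parts = []
    for line in it:
        if line.startswith("+"):
            break
        seq_parts.append(line)
    seq = "".join(seq_parts)
    # Quality lines are read by length, since they may start with '@' or '+'.
    qual = ""
    while len(qual) < len(seq):
        line = next(it, None)
        if line is None:
            raise ValueError("record ended before the quality string was complete")
        qual += line
    if len(qual) != len(seq):
        raise ValueError("quality and sequence lengths differ")
    return header[1:], seq, qual


def tally_case(seq, qual):
    """Count upper-case, lower-case (other than 'n') and 'n' bases.

    Also returns the set of quality characters seen at 'n' positions.
    """
    counts = {"upper": 0, "lower": 0, "n": 0}
    n_quals = set()
    for base, q in zip(seq, qual):
        if base == "n":
            counts["n"] += 1
            n_quals.add(q)
        elif base.isupper():
            counts["upper"] += 1
        else:
            counts["lower"] += 1
    return counts, n_quals
```

In the record above, the run of 'n' bases appears to line up with the run of '!' (Phred 0) qualities, which is worth checking on your own output before deciding how to treat those positions.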
I ran the command, but the lower-case letters are not filtered out at all. Why?
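As far as I can tell, pileup2fq does not remove positions that fail its filters; it only writes them in lower case. If you want them excluded downstream, one option is to mask them yourself after conversion. The sketch below replaces every lower-case base with 'N' and sets its quality to '!'. Choosing 'N' and '!' as the mask is my own assumption, not something samtools does, and both are parameters.

```python
def mask_lowercase(seq: str, qual: str, mask_base: str = "N", mask_qual: str = "!") -> tuple[str, str]:
    """Replace each lower-case (filtered) base with mask_base and its quality with mask_qual."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    out_seq = []
    out_qual = []
    for base, q in zip(seq, qual):
        if base.islower():
            # Filtered position: mask both the base and its quality.
            out_seq.append(mask_base)
            out_qual.append(mask_qual)
        else:
            # Upper-case base passed the filters: keep it unchanged.
            out_seq.append(base)
            out_qual.append(q)
    return "".join(out_seq), "".join(out_qual)


print(mask_lowercase("ACgtNa", "~~~~!r"))  # prints ('ACNNNN', '~~!!!!')
```

Masking keeps the sequence length unchanged, so coordinates still match the reference. Dropping the lower-case bases instead would shift every later position.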
Hi Ian, just curious: after converting to FASTQ format, how did you go on to estimate the mutation rate (which software/tools did you use)?