I would like to write a simple script or single line perl to convert the following two Solexa sequence reads (assume these reads in a text file "Solexa.export.txt") into fastq format. (Note: the quality scores are coded as ASCII offset of 64):
ILLUMINA-A93428 66 1 1 7848 1267 CGATGT 1 ATGTTGGCTGGCGGTGAAATAAATCTCAAACGTACCTGTTACAAAATTGTGTTAATCCTTTCAGATTCGCAG ggggggggggggggcfdffdgggfefffffcfcffcccacBeddddded_cffefgggggBBBBBBBBBBBB chrX.fa 122287709 R 40C21GAC7 269 Y
ILLUMINA-A93428 66 1 1 8782 1281 CGATGT 1 GACTCCGGGAAAGCAAGCTGAAGTGTTTTGTGGGACAGGCACATCATTTGGTCAACTCCTTTTCCCTGGTTG hhhhhhhhhhhhhhhhhhhhghfehhhhhhghfhf_ffffghhhhhhhhdaaghhghhhhgfeghfghhe`] chrX.fa 144709339 F 72 335 Y
Also need help identifying mismatched bases in "Solexa.export.txt"
I mean the chromosomal position of locations where the reference base does not match the alternative base. Also, thanks for your reply!
Not sure about the exact math, but you can parse the 14th field and use the start position of the read to get the exact position of the mismatch in the reference genome. Not sure which field represents position in your output since it doesn't look like it's in SAM format, but you could take the start site, add 41, and that should be the position of the C mismatch. Should be a little perl script.
Happy to help!