I am deriving consensus reads from FASTQ files, and I would like to keep track of some information during the derivation process. Is there a good way to do this? To simplify the problem: if I have a FASTQ file with the following read in it, I want to associate some information with each base prior to alignment and variant calling, in this instance just a single number.
@SEQ_ID
GATC
+
!''*
@SEQ_ID
GATC
7452 <---- associated information
+
!''*
Then I would like this information to be accessible after alignment and variant calling. If possible, it would be convenient to have it populated into the resulting VCF file, but that is not necessary if there is a better way. Here is my idea of how a G>A change at the second nucleotide of the read in the FASTQ file would look.
#CHROM POS ID REF ALT MYINFO
2 4370 rs6057 G A 4
Can you explain what the goal of this is?
The goal is a bit complicated, but essentially I have barcoded FASTQ reads that I am binning together and using to derive consensus sequences. I would like to retain some of the information from the binned reads, such as the percent sequence agreement at each position. I would then use this information to inform statistics and confidence calculations for each identified variant.
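To make "percent sequence agreement at each position" concrete, here is a minimal sketch of how such a value could be computed for one barcode bin. It assumes the reads in a bin are already the same length and position-matched, and it uses a simple majority vote; the names `consensus_with_agreement` and `bin_reads` are illustrative only and do not come from any existing tool or from the actual pipeline.

```python
from collections import Counter


def consensus_with_agreement(reads):
    """Majority-vote consensus for one bin of equal-length reads.

    Returns the consensus sequence and, for each position, the percentage
    of reads in the bin that agree with the consensus base.
    """
    if not reads:
        raise ValueError("need at least one read in the bin")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        # Assumption of this sketch: reads are position-matched already.
        raise ValueError("all reads in a bin must have the same length")

    consensus = []
    agreement = []
    # zip(*reads) walks the bin column by column (one column per position).
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        agreement.append(round(100 * count / len(reads)))
    return "".join(consensus), agreement


# Hypothetical bin: four reads sharing one barcode, one disagreeing at base 2.
bin_reads = ["GATC", "GATC", "GTTC", "GATC"]
consensus, agreement = consensus_with_agreement(bin_reads)
print(consensus, agreement)
```

The per-position list in `agreement` is the kind of value that would need to travel with each base of the consensus read through alignment and variant calling.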