Hi folks,
I'm interested in using Oxford Nanopore's taiyaki tool to train a new basecaller for modified bases at a known position. To train the new model I need to modify the FASTQ (or the SAM, then convert back) for each fast5 file to mark this modified base. However, I have around 10k reads, and given the MinION's inherent error rate, this isn't something I can edit with a regex as far as I know.
Does anyone know of a method or script that takes a SAM file aligned to a consensus and modifies the base at a specific reference position, which would get around these issues?
that looks like just the thing, thanks!
please flag the question as answered if it fulfills your needs (green tick on the left)
I was wondering if I could get a bit more help... When I run the command
java -jar /bioinformatics_tools/jvarkit/dist/biostar404363.jar -o modified.bam -p basecalled.vcf original.bam
The output is only partially converted: T's are changed to N's for the first 30 or so entries, while the remainder (~6k) are unchanged, even with no AF ratio in the VCF (below), which I'd assumed would convert all T's to N's?
Using the samtools tview command from the link, only a small proportion of reads are converted to N's, and these are the reads at the end of the terminal output; all of those at the beginning are unchanged. Is there anything I can do to change this?
Also, I realise this may be a bit much to ask, but would it be possible to allow non-canonical bases, say Y, in this workflow? This would be very useful for creating a training set for nanopore basecalling of novel modifications.
Hard to answer without seeing the BAM and the VCF. Please open an issue at https://github.com/lindenb/jvarkit/issues and narrow the BAM around the position.
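For anyone who wants to sanity-check the output independently, here is a rough Python sketch of the idea in the original question: walk each read's CIGAR to find which base in SEQ is aligned to a given 1-based reference position, and replace it. This is not how jvarkit does it; the function names (`read_index_at`, `edit_sam_line`) and the example record are made up for illustration. It works on plain-text SAM lines (not BAM), leaves QUAL untouched, and skips reads where the position falls in a deletion or outside the alignment.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def read_index_at(pos, cigar, target):
    """Return the 0-based index in SEQ aligned to 1-based reference
    position `target`, or None if the read has no base there."""
    ref = pos  # reference coordinate of the next reference-consuming base
    idx = 0    # offset into SEQ of the next query-consuming base
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes both reference and query
            if ref <= target < ref + n:
                return idx + (target - ref)
            ref += n
            idx += n
        elif op in "IS":           # insertion / soft clip: query only
            idx += n
        elif op in "DN":           # deletion / skip: reference only
            if ref <= target < ref + n:
                return None
            ref += n
        # H and P consume neither reference nor query
    return None

def edit_sam_line(line, target, new_base, expected=None):
    """Replace the base aligned to `target` with `new_base`.
    If `expected` is given, only edit reads that carry that base there."""
    fields = line.rstrip("\n").split("\t")
    pos, cigar, seq = int(fields[3]), fields[5], fields[9]
    if cigar == "*" or seq == "*":   # unmapped or sequence not stored
        return "\t".join(fields)
    i = read_index_at(pos, cigar, target)
    if i is not None and (expected is None or seq[i] == expected):
        fields[9] = seq[:i] + new_base + seq[i + 1:]
    return "\t".join(fields)

# Hypothetical record: 2 soft-clipped bases, 3 matches, 1 insertion, 4 matches.
example = "\t".join(
    ["read1", "0", "consensus", "100", "60", "2S3M1I4M", "*", "0", "0",
     "GGACGTATCG", "IIIIIIIIII"]
)
edited = edit_sam_line(example, target=104, new_base="N", expected="T")
print(edited.split("\t")[9])
```

Header lines (starting with `@`) would need to be passed through unchanged, and `new_base` can be any single character, so something like `Y` works at this level; whether downstream tools accept it is a separate question.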