I am new to this and am getting unexpected results. The data come from the nanopore-wgs-consortium, and I am using remora to print the mapping results from the alignment. I basecalled with bonito using the --reference option; the .mmi file comes from indexing GRCh38_full_analysis_set_plus_decoy_hla.fa with minimap2. I tried fast5 files from both the Native RNA data and the Rel6 data. In both cases there is a large difference between the basecalls length and the reference mapping length, and the plots look terrible, nothing like the example in "Plot Reference Alignment with Signal Overlay" from remora, where the basecalls length and the reference mapping length are very close. Am I doing something wrong, or is it normal to get results like this? Some of my results:
print(f"Basecalls length: {io_read.seq_len}")
print(f"Reference mapping length: {io_read.ref_seq_len}")
print(f"Reference location: {io_read.ref_reg}")
Basecalls length: 14727
Reference mapping length: 1044
Reference location: RefRegion(ctg='chr1_KI270712v1_random', strand='+', start=104801, end=105845)
Basecalls length: 3573
Reference mapping length: 386
Reference location: RefRegion(ctg='chr2', strand='+', start=32916232, end=32916618)
Basecalls length: 3130
Reference mapping length: 366
Reference location: RefRegion(ctg='chr2', strand='+', start=32916259, end=32916625)
Basecalls length: 286
Reference mapping length: 159
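To put rough numbers on the mismatch, here is a minimal sketch in plain Python. The values are copied by hand from the output above, and `mapped_fraction` is a helper name made up for illustration, not part of remora. It simply divides the reference mapping length by the basecalls length, which only approximates how much of each read aligned, since insertions and deletions are ignored.

```python
# (basecalls length, reference mapping length) pairs copied from the output above
reads = [
    (14727, 1044),
    (3573, 386),
    (3130, 366),
    (286, 159),
]


def mapped_fraction(seq_len, ref_len):
    """Return reference mapping length divided by basecalls length."""
    return ref_len / seq_len


for seq_len, ref_len in reads:
    fraction = mapped_fraction(seq_len, ref_len)
    print(f"{seq_len:>6} bases, {ref_len:>5} mapped, fraction {fraction:.3f}")
```

By this measure only about 7% to 12% of the first three reads is covered by the reference mapping, and about 56% of the last one.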
My graphs (these are already the better ones):
Sample graph:
If the problem is with the data, are there any recommended datasets, preferably already basecalled and aligned, with the corresponding pod5 files provided? My end goal is to see whether smoothing the signal in a pod5 file can improve basecalling results, so I have no restrictions on the dataset. Thanks in advance.
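For context on what "smoothing" could mean here, one simple possibility is a centered moving average over the raw signal samples. The sketch below is only an illustration under that assumption: the function name `moving_average`, the window size, and the sample values are all made up, and it does not read from or write to a pod5 file.

```python
def moving_average(signal, window=5):
    """Smooth a 1-D sequence with a centered moving average.

    Near the edges the window shrinks to the samples that exist,
    so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Made-up samples with one spike, purely for illustration
raw = [100, 102, 98, 140, 101, 99, 103]
print(moving_average(raw, window=3))
```

Other filters (median, Gaussian, and so on) would slot into the same place; the moving average is used here only because it is the simplest to show.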