8 months ago
Hello, I have been trying to extract metrics with remora from a very large nanopore dataset, and the extraction is too slow. I want per-read metrics for 100 bp windows that I have already defined. The code below works, but it takes an extremely long time to run. Is there a more efficient way to achieve this?
# Iterate through windows and process each read
for _, window_row in windows.head(1000).iterrows():
    # Extract window information
    chromosome = window_row['chr']
    mapped_start = int(window_row['mapped_start'])
    mapped_end = int(window_row['mapped_end'] + 1)
    read_id = window_row['read_id']
    strand = window_row['strand']
    feature = window_row['feature']
    # Format mapped_start and mapped_end with underscores
    formatted_mapped_start = f"{mapped_start:_}"
    formatted_mapped_end = f"{mapped_end:_}"
    # Check if the Pod5 file associated with the read is loaded
    if read_pod5_mapping.get(read_id) in pod5_readers:
        # Retrieve the Pod5 reader associated with the BAM read
        reader = pod5_readers[read_pod5_mapping[read_id]]
        # Read the selected read from the Pod5 file
        pod5_read = next(reader.reads(selection=[read_id]))
        bam_read = bam_fh.get_first_alignment(read_id)
        io_read = io.Read.from_pod5_and_alignment(pod5_read, bam_read)
        sample_rate = pod5_read.run_info.sample_rate
        # Define the reference region for the read
        ref_reg = io.RefRegion(
            ctg=str(chromosome),
            strand=strand,
            start=int(formatted_mapped_start),
            end=int(formatted_mapped_end),
        )
        # Perform signal mapping refinement on the reference mapping
        io_read.set_refine_signal_mapping(sig_map_refiner, ref_mapping=True)
        read_metrics = io_read.compute_per_base_metric("dwell", ref_anchored=True, region=ref_reg)
        translocation_times = read_metrics["dwell"] / sample_rate
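One possible restructuring, sketched under the assumption that many windows share the same read_id: group the windows by read so that the expensive per-read steps (the Pod5 lookup, the alignment fetch, and the signal-mapping refinement) run once per read instead of once per window. This is only an illustration of the grouping pattern and has not been benchmarked on real data. `load_read` and `metric_for_window` are hypothetical stand-ins for the remora calls in the loop above, not remora functions.

```python
from collections import defaultdict


def group_windows_by_read(windows):
    """Map each read_id to the list of window rows that reference it."""
    grouped = defaultdict(list)
    for row in windows:
        grouped[row["read_id"]].append(row)
    return grouped


def process_windows(windows, load_read, metric_for_window):
    """Do the per-read setup once, then loop over that read's windows.

    load_read and metric_for_window are hypothetical callables standing in
    for the Pod5/BAM loading and the per-window metric computation.
    """
    results = []
    for read_id, rows in group_windows_by_read(windows).items():
        read = load_read(read_id)  # expensive step: once per read
        if read is None:  # e.g. the Pod5 file for this read is not loaded
            continue
        for row in rows:
            value = metric_for_window(read, row)
            results.append((read_id, row["mapped_start"], value))
    return results


# Toy demonstration with stand-in callables (no remora or Pod5 involved)
load_calls = []


def fake_load(read_id):
    load_calls.append(read_id)
    return {"id": read_id}


def fake_metric(read, row):
    return row["mapped_end"] - row["mapped_start"]


toy_windows = [
    {"read_id": "r1", "mapped_start": 0, "mapped_end": 99},
    {"read_id": "r2", "mapped_start": 0, "mapped_end": 99},
    {"read_id": "r1", "mapped_start": 100, "mapped_end": 199},
]
toy_results = process_windows(toy_windows, fake_load, fake_metric)
```

With a pandas DataFrame, `windows.to_dict("records")` produces rows in the shape this sketch expects. Whether grouping helps depends on how many windows fall on each read.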