Slow Python Script Pulling Metrics from Remora Oxford Nanopore
8 months ago
turcoa1 • 0

Hello, I have been trying to extract metrics with Remora from a very large Nanopore dataset, and the extraction is too slow. I am computing metrics over 100 bp windows that I have created. The code below works, but it takes an extremely long time to run. Is there a more efficient way to achieve this?

    # Iterate through windows and process the read for each one
    for _, window_row in windows.head(1000).iterrows():
        # Extract window information
        chromosome = window_row['chr']
        mapped_start = int(window_row['mapped_start'])
        mapped_end = int(window_row['mapped_end'] + 1)
        read_id = window_row['read_id']
        strand = window_row['strand']
        feature = window_row['feature']

        # Check if the Pod5 file associated with the read is loaded
        if read_pod5_mapping.get(read_id) in pod5_readers:
            # Retrieve the Pod5 reader associated with the BAM read
            reader = pod5_readers[read_pod5_mapping[read_id]]

            # Read the selected read from the Pod5 file
            pod5_read = next(reader.reads(selection=[read_id]))
            bam_read = bam_fh.get_first_alignment(read_id)
            io_read = io.Read.from_pod5_and_alignment(pod5_read, bam_read)
            sample_rate = pod5_read.run_info.sample_rate

            # Define the reference region for the read
            # (mapped_start and mapped_end are already ints, so they are passed directly)
            ref_reg = io.RefRegion(
                ctg=str(chromosome),
                strand=strand,
                start=mapped_start,
                end=mapped_end,
            )

            # Perform signal mapping refinement on the reference mapping
            io_read.set_refine_signal_mapping(sig_map_refiner, ref_mapping=True)

            try:
                read_metrics = io_read.compute_per_base_metric("dwell", ref_anchored=True, region=ref_reg)
                translocation_times = read_metrics["dwell"] / sample_rate
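
One restructuring that may be worth trying: the loop above fetches the Pod5 read, fetches the alignment, builds the `io.Read`, and refines the signal mapping once per window. If several windows share a `read_id`, that work could be done once per read instead. The sketch below only shows the grouping pattern in plain Python. `load_read` and `compute_window` are hypothetical callbacks (not part of Remora) standing in for the per-read setup and the per-window metric call, and whether this helps depends on how many windows share a read, which has not been measured here.

```python
from collections import defaultdict


def group_windows_by_read(window_rows):
    """Group window records by read_id, keeping first-seen read order."""
    grouped = defaultdict(list)
    for row in window_rows:
        grouped[row["read_id"]].append(row)
    return grouped


def process_windows(window_rows, load_read, compute_window):
    """Run the expensive per-read setup once, then handle each of its windows.

    load_read(read_id) and compute_window(read, row) are hypothetical
    callbacks supplied by the caller; load_read may return None to skip
    a read (for example, when its Pod5 file is not loaded).
    """
    results = []
    for read_id, rows in group_windows_by_read(window_rows).items():
        read = load_read(read_id)  # expensive step, done once per read
        if read is None:
            continue
        for row in rows:
            metric = compute_window(read, row)  # cheap step, once per window
            results.append((read_id, row["mapped_start"], metric))
    return results
```

In the context of the original loop, `load_read` would wrap the Pod5 lookup, `bam_fh.get_first_alignment`, `io.Read.from_pod5_and_alignment`, and `set_refine_signal_mapping`, while `compute_window` would wrap the `compute_per_base_metric` call for one `io.RefRegion`. The window records could come from `windows.to_dict("records")`.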
Nanopore Remora Long-Read ONT • 260 views