Genome sequenced [%] over time graph (Nanopore)
0
0
Entering edit mode
20 months ago
j.zeidler • 0

Hi there,

I have sequencing data from a Nanopore run which I used to create a de novo assembly using Flye. Now I want to plot the genome coverage (in %) over the run duration (hh:mm:ss). Can anyone suggest a tool for creating such a graph?

Thanks!

flye Nanopore graph genome assembly • 1.4k views
ADD COMMENT
0
Entering edit mode

I am not sure how you would do that unless you are simply going to plot the total amount of cumulative data produced after each hour and calculate fold coverage.

Take a look at PycoQC (LINK) which has a graph that shows amount of sequence produced over the length of the run.

ADD REPLY
0
Entering edit mode

the genome coverage (in %)

With this, you mean how many bases in your assembly are covered by at least 1 read? i.e. if the first 20kb read aligns with your 2Mb assembly you are at 1%? Do you care about read clipping?

This is a bit complicated. Lots easier would be to get a plot of gigabases sequenced vs time (cumulative yield plot).

ADD REPLY
0
Entering edit mode

That´s exactly what I´m thinking of. Since the clipped regions are excluded from the assembly I think they also should be excluded from the genome coverage graph.

I know plotting the cumulative yield should be much easier (btw: is there a tool that outputs this kind of data as a table?), but e.g. for data from a metagenomic sequencing run, it would be interesting to illustrate the coverage of each genome over time.

ADD REPLY
3
Entering edit mode

Can you write some python to get this done? I could be able to help, but don't have too much free time.

Basically, I would parse and load both your fastq and bam files in a pandas dataframe and join them on the read id. This would give you, for every read ID, its sequencing start_time and its alignment start and stop. I would initialize a list or numpy array with zeroes, for the entire length of your chromosome. After sorting your dataframe on start_time you can iterate over the reads, and for every read that is aligned you change the corresponding positions in your coverage array to ones. After every read you sum the array and divide that sum by your contig length to get the coverage at that timepoint.

Hope that helps.

ADD REPLY
0
Entering edit mode

The code in this repository might be a good starting point: https://github.com/wdecoster/nanoretrotect

Basically, you need to combine mapping information with the time stamps you have in either the fastq or summary file(s).

ADD REPLY
0
Entering edit mode

Thanks a lot for your help! I think this is a good starting point. I will try to implement this in Python and keep you posted.

ADD REPLY
0
Entering edit mode

I have next week off, so don't despair if you don't hear from me for a while :)

ADD REPLY

Login before adding your answer.

Traffic: 2658 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6