I want to plot the GC content along a genome contig. Since it's not possible to estimate a percentage or fraction for a single position, I need to use some sort of window along the contig. I found this page, which uses bedtools makewindows and bedtools nuc to estimate the GC content in 1000 bp, non-overlapping windows.
In order to get a GC-content value for every nucleotide on the contig, I added the option -s 1 to bedtools makewindows to shift the windows by one nucleotide each time, and then calculated the GC content of each window using bedtools nuc. I was thinking that the GC content of the first window could represent the GC content of the first nucleotide, and so on. But this means that the nucleotide which is first in each window gets the GC content of the entire window?
Any thoughts on this? Or suggestions on how to better visualize the gc content along a contig?
Thanks, Jon
I don't quite understand this. If you need GC content for every base why use a sliding window?
Is there an alternative to a sliding window? I need some collection of nucleotides to calculate frequencies, don't I? If you know of a better method to calculate GC content for every base, I would be very happy to hear it.
GC content would be an average across the window size you are choosing. I assume the -s option is the step size for bedtools makewindows. If you were selecting a 100 bp window, then you get the GC% across the initial 100 bp window. You then slide the window over by 1 bp and get the GC% for 2-101 bp, and so on.
You can use cpgplot from EMBOSS for this. Download EMBOSS for more flexibility.
Thanks, I'll check it out.
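For reference, the sliding-window calculation described above (a fixed window moved 1 bp at a time) can be sketched in plain Python. This is only an illustration, not what bedtools does internally: the function name window_gc and the toy sequence are made up, and the sketch assumes that anything other than G or C (including N) simply counts as non-GC.

```python
def window_gc(seq, window=100, step=1):
    """Return (start, end, gc_fraction) for every full window.

    Coordinates are 1-based and inclusive, so the first 100 bp window
    is reported as (1, 100, ...), the next as (2, 101, ...), and so on.
    """
    seq = seq.upper()
    # prefix[i] holds the number of G/C bases in seq[:i], so each
    # window count is a single subtraction instead of a rescan.
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))

    results = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        gc_count = prefix[end] - prefix[start]
        results.append((start + 1, end, gc_count / window))
    return results


if __name__ == "__main__":
    demo = "GGCCATATAT"  # hypothetical toy contig
    for start, end, frac in window_gc(demo, window=4, step=1):
        print(start, end, frac)
```

Each value belongs to the whole window; which base you attach it to (first, middle, last) is a plotting decision made afterwards.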
Yes, this is how I see it too. But then the calculated GC content for the first nucleotide on the contig would be the average across the first 100 nucleotides. I think nucleotide no. 50 (the middle of the first window) should rather get the GC content of the first window. With this procedure, each nucleotide gets the GC content of mostly the 99 succeeding nucleotides, and I felt that this was not accurate enough. But perhaps I am misunderstanding something.
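One way to act on the point about the middle nucleotide is to center the window on each base instead of starting the window at it. Below is a minimal Python sketch of that idea, under some assumptions of my own: the function name centered_gc is hypothetical, the window is odd so it has a single middle base, and windows are simply truncated at the contig ends (leaving the ends undefined would be an equally reasonable choice).

```python
def centered_gc(seq, window=101):
    """GC fraction for every base, from a window centered on that base.

    Near the contig ends the window is truncated, so those values are
    averaged over fewer bases (an assumption of this sketch).
    """
    if window % 2 == 0:
        raise ValueError("use an odd window so it has a single middle base")
    seq = seq.upper()
    half = window // 2

    # prefix[i] holds the number of G/C bases in seq[:i].
    prefix = [0]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))

    values = []
    for i in range(len(seq)):
        lo = max(0, i - half)              # window start, 0-based inclusive
        hi = min(len(seq), i + half + 1)   # window end, 0-based exclusive
        values.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return values


if __name__ == "__main__":
    demo = "GGCCATATAT"  # hypothetical toy contig
    for pos, frac in enumerate(centered_gc(demo, window=3), start=1):
        print(pos, round(frac, 3))
```

With full windows this is the same set of numbers as the start-anchored sliding window, just shifted by half a window, so the shape of the plot does not change, only its position along the contig.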
Hi GenoMax, hope everything is OK with you, bro. Do you know if I can use the bedtools makewindows approach on multi-FASTA files? I took a look at the documentation but didn't find any info. Thanks