I have a set of k-mers, and for each k-mer I have a range of positions in the genome. I want to visualize these to see which ranges are messier, how the ranges are scattered, and so on.
So I need to plot k-mer against range of positions.
My Thinking:
I would convert each k-mer to its corresponding integer, using 2 bits per nucleotide.
The data would then be in the following format (CSV):
integer for the k-mer, start of the range, end of the range
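To make the encoding concrete, here is a minimal sketch of the 2-bit mapping I have in mind. The A/C/G/T to 0/1/2/3 assignment and the function names `encode_kmer` and `decode_kmer` are my own assumptions for illustration; any fixed 2-bit assignment would work the same way.

```python
# Assumed 2-bit code per nucleotide; the A/C/G/T ordering is an arbitrary choice.
NT_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_NT = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per nucleotide, first base in the highest bits."""
    value = 0
    for nt in kmer.upper():
        value = (value << 2) | NT_CODE[nt]  # raises KeyError on N or other symbols
    return value


def decode_kmer(value: int, k: int) -> str:
    """Inverse of encode_kmer for a known k."""
    chars = []
    for shift in range(2 * (k - 1), -1, -2):
        chars.append(CODE_NT[(value >> shift) & 3])
    return "".join(chars)
```

With this scheme, a 15-mer always fits in 30 bits, so the largest possible value is 4^15 - 1 = 1073741823.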
What I have tried:
I tried to plot them using Python. But because the positions and the mapped integers are all large numbers, it could not plot even a single point.
Data Range:
k = 15, so each k-mer takes 30 bits when encoded in binary.
The positions are on the order of 10^6.
My file contains 392938 records.
Could you please suggest any tool or code to visualize or plot this?
Note that:
More specifically, I want to see which minimizer covers which range. A minimizer may cover only a small portion, or it may cover a large range.
Splitting the data is a good idea, but then I would have to plot a great many figures.
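To make the target concrete, the single figure I am after could be sketched roughly as below: one horizontal segment per record, with genome position on the x-axis. This is only a sketch under my own assumptions. It uses matplotlib's `LineCollection`, puts the rank of each distinct encoded k-mer on the y-axis instead of the raw 30-bit integer (so the axis stays dense), and the names `build_segments`, `plot_segments` and the output file name are hypothetical.

```python
def build_segments(rows):
    """Turn (kmer_code, start, end) records into horizontal line segments.

    The y value is the rank of the k-mer code among the distinct codes,
    which is an assumption made here to avoid a sparse 30-bit axis.
    """
    rows = list(rows)
    codes = sorted({code for code, _, _ in rows})
    rank = {code: i for i, code in enumerate(codes)}
    segments = [((start, rank[code]), (end, rank[code])) for code, start, end in rows]
    return segments, codes


def plot_segments(segments, out_path="kmer_ranges.png"):
    """Draw all segments in one figure and save it to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.collections import LineCollection

    fig, ax = plt.subplots(figsize=(10, 6))
    # A single collection holds every segment as one artist.
    ax.add_collection(LineCollection(segments, linewidths=0.5))
    ax.autoscale()
    ax.set_xlabel("genome position")
    ax.set_ylabel("k-mer (rank of encoded integer)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

Given records read from the CSV as integer triples, `segments, codes = build_segments(rows)` followed by `plot_segments(segments)` would write one image. I have not verified how well this scales to all of my records.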