Correct number of SNPs for gene length
3.8 years ago
vujex ▴ 10

I am trying to visualise the number of SNPs found in each of 40 bacterial genes. Some genes have fewer SNPs than others, but the genes also differ in length (ranging from 150 to 2000 bp). I am wondering if there is a way to correct for gene length (put everything on the same scale).

I have normalised the gene lengths already but I don't know what to do next... I know how to do calculations in R/Python, but simply can't figure out what to do...

Thanks in advance.

SNP • 891 views

This seems like an XY problem (Google that term). What are you really trying to do - why do you need n(SNPs) scaled by gene length?


The longer the gene, the higher the chance it will get a SNP? If gene A of 150 bp has 2 SNPs and gene B of 1500 bp has 20 SNPs, then on a visual representation it will look like gene B is more prone to acquire SNPs. Yes, it has more SNPs, but it is also 10x longer. That is what I am trying to correct for. Sorry if this is complete nonsense....


That is not necessarily true: there could be a correlation, but gene length is definitely not a cause. Repeat regions would have a lot more variants, and structurally conserved regions/domains would have fewer variants owing to negative selection. In any case, you state that you've normalized gene length already, so what is left to do? Count the number of SNVs in the gene, multiply it by 1000, and divide that by the gene length in bp to get the SNV frequency in the gene per kb.

EDIT: Corrected a mistake
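
A minimal sketch of that calculation in Python, in case it helps. The function name `snvs_per_kb` and the gene names are made up for illustration; the counts and lengths are the two hypothetical genes from the example earlier in this thread (2 SNPs in 150 bp, 20 SNPs in 1500 bp), not real data.

```python
def snvs_per_kb(snv_count, gene_length_bp):
    """Return SNV density as variants per 1000 bp of gene."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return snv_count * 1000 / gene_length_bp


# Hypothetical genes, using the numbers from the example in this thread
genes = {
    "geneA": {"snvs": 2, "length_bp": 150},
    "geneB": {"snvs": 20, "length_bp": 1500},
}

# SNV density per gene, keyed by gene name
density = {
    name: snvs_per_kb(g["snvs"], g["length_bp"]) for name, g in genes.items()
}

for name, value in density.items():
    print(f"{name}: {value:.2f} SNVs/kb")
```

Both genes come out at roughly 13.33 SNVs/kb, so the tenfold difference in raw counts disappears once length is taken into account.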


Thanks, this helped. Looking at the plots I made for both, on the one with SNV frequency/kb all of the genes are somewhat flat compared to the original counts, where a few genes stood out as having more SNPs. What you say about repeats makes sense too, will look into that.
