Calculate tetranucleotide frequency deviation in Python
3.8 years ago
Chvatil ▴ 130

Hello everyone, I'm looking for a function in Python or Biopython that can calculate the tetranucleotide frequencies (TNFs) of given regions of a scaffold.

The idea is that I have several regions and I want to identify possible changes in nucleotide composition that correspond to endogenized regions within my genome. For that, I need to calculate the TNFs across these regions of the contigs. I then need to calculate the Pearson correlation of these frequencies against the TNFs of a set of the largest contigs in these genome assemblies (these contigs most likely belong to the genome itself and were not endogenized).

Does anyone know of such a package in Python?

Thank you

biopython python TNFs • 1.4k views
3.8 years ago
Mensur Dlakic ★ 28k

CheckM can do what you need - see here:

checkm tetra seqs.fna tetra.tsv
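
For the region-versus-reference comparison described in the question, the same idea can also be sketched in plain Python. This is only a minimal illustration, not what CheckM does internally: the function names `tetra_freqs` and `pearson` are made up for this example, all 256 tetranucleotides are counted on one strand only (reverse complements are not merged), and windows containing anything other than A, C, G or T are skipped. The toy sequences stand in for real data, which you would normally read from FASTA (for instance with Biopython's `SeqIO`).

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Return the 256 tetranucleotide frequencies of seq (hypothetical helper).

    Overlapping windows are counted on the given strand only; windows that
    contain N or other ambiguity codes are ignored.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


# Toy data: the reference would be the concatenated (or pooled) largest contigs,
# the region a candidate endogenized stretch of a scaffold.
reference = "ACGTTGCAAGGCTTAACCGGATATCGCGTACGATCGATTACG" * 20
region = "AAAATTTTAAAATTTTGGGGAAAATTTTCCCCAAAATTTT" * 20

r = pearson(tetra_freqs(region), tetra_freqs(reference))
print(f"Pearson r between region and reference TNFs: {r:.3f}")
```

A low correlation for a region, relative to what other regions of the same assembly give against the same reference, is the kind of deviation the question is after. Very short regions give noisy frequencies, so the window size matters.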

You can use the frequencies to separate the sequences in a 2D plot using dimensionality reduction methods such as PCA, t-SNE (shown below), or UMAP.

[Image: t-SNE plot of sequences separated by tetranucleotide frequency]
