Given Sequences, How To Compute A Low Complexity Score For Each Sequence?
11.1 years ago
lwc628 ▴ 230

Is there a program or module I can use to compute a complexity score for each sequence in a given set? I want to rank the sequences by their complexity.

sequence blast perl • 4.6k views

Compressibility is a useful proxy for the complexity of a sequence. A sequence that compresses to a much smaller size is repetitive and therefore not very complex. You can run a standard compression algorithm (gzip or bzip2, for example) on each sequence and compare the compressed size with the original size.
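
As a rough illustration of the compression idea, here is a minimal Python sketch using the standard-library zlib module. The function name `compression_ratio` and the example sequences are made up for illustration, and the score is only a heuristic: the compressor adds a fixed overhead, so very short sequences score artificially high and sequences are best compared at similar lengths.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Return compressed size divided by original size for one sequence.

    Lower values mean the sequence compresses well, i.e. low complexity.
    """
    data = seq.upper().encode("ascii")
    if not data:
        raise ValueError("empty sequence")
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical example sequences, all 200 bases long.
rng = random.Random(0)  # fixed seed so the example is reproducible
sequences = {
    "poly_a": "A" * 200,
    "at_repeat": "AT" * 100,
    "mixed": "".join(rng.choices("ACGT", k=200)),
}

# Rank from least complex (smallest ratio) to most complex.
ranked = sorted(sequences, key=lambda name: compression_ratio(sequences[name]))
for name in ranked:
    print(f"{name}\t{compression_ratio(sequences[name]):.3f}")
```

The two repeats should land at the top of the ranking and the mixed sequence at the bottom. Swapping in bz2 or lzma from the standard library would give a different scale but should produce a similar ordering.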

11.1 years ago
SES 8.6k

The software preseq was designed for just this purpose. From its website:

The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, optimize the sequencing depth, or to screen multiple libraries to avoid low complexity samples.

The publication, Predicting the molecular complexity of sequencing libraries, is in Nature Methods.

11.1 years ago
Kenosis ★ 1.3k

Perhaps this Perl resource will be helpful to you: Algorithms to compute DNA complexity.

11.1 years ago
Biojl ★ 1.7k

There is a programme called SEG that masks low-complexity regions in protein sequences by replacing them with X characters. You can then count the X characters and divide by the sequence length to get the low-complexity fraction of each sequence. SEG has several parameters you can adjust.
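
The counting step could look something like the Python sketch below. It assumes you already have the masked sequences as strings. The helper name `masked_fraction` and the example sequence are hypothetical, and the code does not run SEG itself. It also assumes every X in the masked output comes from masking, so any X already present in the input (an unknown residue) would be counted as well.

```python
def masked_fraction(masked_seq: str, mask_char: str = "X") -> float:
    """Return the fraction of residues equal to the mask character.

    The comparison is case-insensitive and whitespace is ignored, so
    line-wrapped sequence text can be passed in directly.
    """
    residues = [c for c in masked_seq if not c.isspace()]
    if not residues:
        raise ValueError("empty sequence")
    masked = sum(1 for c in residues if c.upper() == mask_char.upper())
    return masked / len(residues)


# Hypothetical masked protein: 10 ordinary residues followed by 10 masked ones.
example = "MKTAYIAKQR" + "X" * 10
print(masked_fraction(example))
```

Sorting your sequences by this fraction gives a ranking from least to most low-complexity content.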
