How to count k-mers for color-space reads?
12.6 years ago
GAO Yang ▴ 250

Hi guys, my earlier question is here:
http://www.biostars.org/post/show/45025/how-to-estimate-genome-size-using-k-mer-coverage/#45148

My problem is that Jellyfish only handles base-space (nucleotide) input such as FASTA, but my reads are color-space reads generated by the SOLiD platform (colors 0 to 3, each color determined by a pair of adjacent bases). Does anybody know another k-mer counting tool that supports color-space reads? Alternatively: although color space differs from base space, I think the k-mer multiplicity of the same sequence should be the same. So if I directly run "tr/0123/acgt/" (double encoding) and use the output as input, will Jellyfish produce the same result as with base-space FASTA input? Am I right?

Thanks for your attention; any advice will be appreciated!
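The double encoding from the question can be sketched in Python as follows. This is a minimal sketch, not Jellyfish itself, and `double_encode` and `count_kmers` are hypothetical helper names. It assumes csfasta-style reads that begin with the primer base followed by the first color (e.g. `T32021013`). By default it also drops that first color, because it encodes the primer-to-read junction rather than the genome; missing color calls ('.') become N, and any k-mer containing N is skipped.

```python
from collections import Counter

# Double encoding: colors 0-3 become the letters A, C, G, T and a missing
# color call '.' becomes N. The letters are placeholders for colors, not bases.
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def double_encode(cs_read, drop_first_color=True):
    """Convert a csfasta-style read such as 'T32021013' to a letter string.

    Assumption: character 0 is the primer base and character 1 is the first
    color, which pairs that primer base with the first read base.
    """
    start = 2 if drop_first_color else 1
    return cs_read[start:].translate(DOUBLE_ENCODE)


def count_kmers(seqs, k):
    """Count every length-k window that contains no N (naive, in memory)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


reads = ["T3202.01", "T32021013"]
encoded = [double_encode(r) for r in reads]
print(encoded)                          # ['GAGNAC', 'GAGCACT']
print(count_kmers(encoded, 3)["GAG"])   # 2
```

In practice the encoded reads would be written out as FASTA and handed to a dedicated counter such as Jellyfish; the in-memory `Counter` is only for illustration.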


Yeah, double encoding should work.

12.6 years ago

(corrected)

I think it all comes down to what the purpose of the k-mer counting is.

Color space has a fourfold redundancy: for example, the colors 000 could be AAAA, TTTT, CCCC or GGGG. This affects your k-mer representation, so I believe one cannot easily extrapolate from color-space k-mers to the actual number of sequence k-mers.
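The fourfold redundancy can be checked directly. A minimal sketch, assuming the standard SOLiD two-base code in which, with A=0, C=1, G=2, T=3, each color is the XOR of the two bases' 2-bit codes; `decode` and `base_kmers_for` are hypothetical helper names. A color k-mer of length k-1 decodes to exactly one base k-mer per possible starting base, so it stands for four base k-mers at once.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode(first_base, colors):
    """Decode a color string given its first base: next base = base XOR color."""
    seq = [first_base]
    for c in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ int(c)])
    return "".join(seq)


def base_kmers_for(color_kmer):
    """All base-space k-mers (k = len(color_kmer) + 1) sharing this color k-mer."""
    return [decode(b, color_kmer) for b in BASES]


print(base_kmers_for("000"))  # ['AAAA', 'CCCC', 'GGGG', 'TTTT']
print(base_kmers_for("012"))  # ['AACT', 'CCAG', 'GGTC', 'TTGA']
```

Under this code the multiplicity of a color k-mer is the sum of the multiplicities of its four base-space counterparts, which is why color-space counts need not match base-space counts.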

You could translate the reads into actual base sequences, but some information will be lost.

http://www.biostars.org/post/show/43855/transforming-and-manipulating-color-space-reads/
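One piece of information that decoding throws away is the color-space error signature. A toy sketch with made-up sequences, assuming the same XOR color code (A=0, C=1, G=2, T=3): a true single-base substitution changes two adjacent colors, whereas a single miscalled color changes only one. After naive left-to-right decoding that signature is gone, and the one bad color simply turns into a run of wrong bases.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def encode(seq):
    """Base space -> colors: each color is the XOR of two adjacent bases' codes."""
    return "".join(str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:]))


def decode(first_base, colors):
    """Naive left-to-right decoding: next base = current base XOR color."""
    seq = [first_base]
    for c in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ int(c)])
    return "".join(seq)


reference = "ACGTACGT"
snp_read = "ACGAACGT"  # toy read with a true T->A substitution at base index 3

ref_cs, snp_cs = encode(reference), encode(snp_read)
changed = [i for i, (x, y) in enumerate(zip(ref_cs, snp_cs)) if x != y]
print(ref_cs, snp_cs, changed)  # 1313131 1320131 [2, 3]

# Simulate one miscalled color at color index 3 instead of a real variant.
bad_cs = ref_cs[:3] + str((int(ref_cs[3]) + 1) % 4) + ref_cs[4:]
print(decode("A", bad_cs))  # ACGTTGCA: every base after the error is wrong
```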


Yeah, but here I just want to estimate the genome size via k-mer counting. Do you think that will work?
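The usual k-mer approach to genome-size estimation divides the number of k-mers counted by the depth of the main peak of the k-mer frequency histogram: genome size ≈ (total k-mers) / (peak depth). The sketch below is hypothetical: it assumes a two-column "multiplicity count" histogram like the one written by `jellyfish histo`, uses an invented toy histogram rather than real data, and leaves the choice of `min_multiplicity` (the cut-off below which k-mers are treated as errors) to the user.

```python
def parse_histogram(lines):
    """Parse 'multiplicity count' lines (assumed two-column histogram format)."""
    hist = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            hist[int(fields[0])] = int(fields[1])
    return hist


def estimate_genome_size(hist, min_multiplicity):
    """Genome size ~ total k-mers / peak depth, ignoring k-mers seen fewer than
    min_multiplicity times (assumed to be mostly sequencing errors)."""
    usable = {m: n for m, n in hist.items() if m >= min_multiplicity}
    peak = max(usable, key=usable.get)             # depth of the main peak
    total = sum(m * n for m, n in usable.items())  # k-mer instances kept
    return total / peak, peak


# Invented toy histogram: an error spike at 1, a valley at 3, a peak at 6.
toy = ["1 5000", "2 300", "3 200", "4 400", "5 900",
       "6 1000", "7 850", "8 350", "9 100"]
size, peak = estimate_genome_size(parse_histogram(toy), min_multiplicity=3)
print(peak, size)  # 6 3725.0
```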


I don't think anyone can simply say yes to this; it all depends on the genome under study. Count k-mers both ways, double-encoded and after decoding to letter space; comparing the two may give you further insight.
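If you do count both ways, one detail worth checking is strand handling. In color space the opposite strand of a sequence is encoded by the same colors in reverse order, with no complementing, so a double-encoded k-mer's partner on the other strand is its plain reverse, not its reverse complement. A counter that merges each k-mer with its reverse complement (the usual base-space convention) would therefore pair the wrong k-mers on double-encoded input. A minimal sketch with hypothetical helper names, assuming the XOR color code:

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def double_encoded_colors(seq):
    """Base space -> double-encoded colors (color value 0-3 written as A/C/G/T)."""
    return "".join(BASES[BASES.index(a) ^ BASES.index(b)] for a, b in zip(seq, seq[1:]))


def canonical_base_kmer(kmer):
    """Base space: merge a k-mer with its reverse complement."""
    return min(kmer, revcomp(kmer))


def canonical_color_kmer(kmer):
    """Double-encoded color space: merge a k-mer with its plain reverse."""
    return min(kmer, kmer[::-1])


fwd = double_encoded_colors("AACT")           # 'ACG'
rev = double_encoded_colors(revcomp("AACT"))  # 'GCA', i.e. fwd reversed
print(fwd, rev)
print(canonical_color_kmer(fwd) == canonical_color_kmer(rev))  # True
print(canonical_base_kmer(fwd) == canonical_base_kmer(rev))    # False
```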


Convincing! I'll try both ways. But I'm afraid that translating directly to bases may cause a higher error rate, which would then affect the k-mer counting.
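That concern holds for naive decoding, and a quick simulation shows how large the effect can be on k-mers. A toy sketch with an invented read and one miscalled color, assuming the XOR color code and comparing windows position by position (helper names are hypothetical): in color space the error only touches the color windows that overlap it, while left-to-right decoding propagates it to every downstream base, so most base k-mers from the error onward are wrong.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def encode(seq):
    return "".join(str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:]))


def decode(first_base, colors):
    seq = [first_base]
    for c in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ int(c)])
    return "".join(seq)


def wrong_windows(observed, truth, w):
    """Number of length-w windows differing from the truth at the same offset."""
    return sum(observed[i:i + w] != truth[i:i + w] for i in range(len(truth) - w + 1))


true_read = "ACGTTAGCATGACCTAGGTA"  # invented 20-base read
true_cs = encode(true_read)         # 19 colors
pos, k = 5, 5                       # miscalled color index; base k-mer length
bad_cs = true_cs[:pos] + str(int(true_cs[pos]) ^ 1) + true_cs[pos + 1:]

# A base k-mer spans k - 1 colors, so compare color windows of length k - 1.
print(wrong_windows(bad_cs, true_cs, k - 1))                      # 4 of 16
print(wrong_windows(decode(true_read[0], bad_cs), true_read, k))  # 14 of 16
```

Tools that decode reads against a reference can often correct isolated color errors, so this illustrates the worst case of translating reads on their own.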

12.6 years ago
Lee Katz ★ 3.2k

You should probably work entirely in color space. Transform your reference genome into color space and then map against that.
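Converting the reference is straightforward. A minimal sketch, assuming the XOR color code and an output layout of a leading base followed by the colors (similar in layout to csfasta reads). As an assumption of this sketch, any pair involving a non-ACGT base such as N is written as '.'. `to_color_space` and `fasta_to_color_space` are hypothetical names.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def to_color_space(seq):
    """Return the first base plus the color string, e.g. 'ACGT' -> 'A131'.
    Assumption: pairs involving a non-ACGT base become '.'."""
    seq = seq.upper()
    colors = []
    for a, b in zip(seq, seq[1:]):
        if a in BASES and b in BASES:
            colors.append(str(BASES.index(a) ^ BASES.index(b)))
        else:
            colors.append(".")
    return seq[:1] + "".join(colors)


def fasta_to_color_space(lines):
    """Yield (header, color-space sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, to_color_space("".join(chunks))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, to_color_space("".join(chunks))


fasta = [">chr1", "ACGT", "ACNT", ">chr2", "aaaa"]
print(list(fasta_to_color_space(fasta)))  # [('chr1', 'A13131..'), ('chr2', 'A000')]
```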
