How To Count K-Mers On SOLiD Data?
11.2 years ago
Alice ▴ 320

Hello, biostars!

I downloaded some public SOLiD data files in SRA and FASTQ format. All of the files, I guess, were produced by merging csfasta + qual. I want to count k-mers with Jellyfish, and I don't know whether I first need to convert the SOLiD SRA/FASTQ files into FASTQ/FASTA with nucleotides. For mapping, the usual advice is not to convert, because conversion can introduce numerous errors into the reads. I thought there is one simple way: convert the data to csfasta + qual and then convert the csfasta to ordinary FASTA. Am I right? Fortunately, there are many scripts for conversion.

solid fastq • 2.5k views
11.2 years ago

The problem with converting color-space to base-space is that if just one color call is incorrect, every base decoded after it will be wrong. Keeping the sequences in color-space actually retains more information. For k-mer counting, this means you might see many more unique k-mers than expected. Or it might not matter at all, depending on the error rate and position bias.
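To illustrate the error propagation, here is a minimal Python sketch. It assumes the standard SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3), and that the read starts with a known primer base. The function name and the example reads are made up for illustration.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(read):
    """Decode a color-space read such as 'T0120' into bases.

    The first character is the known primer base; each following digit
    is the color of the transition to the next base.
    """
    code = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        code ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[code])
    return "".join(decoded)


good = "T01203312"
bad = "T01303312"  # identical except for one miscalled color (2 -> 3)

print(decode_colorspace(good))
print(decode_colorspace(bad))
```

The two decoded sequences agree up to the miscalled color and disagree at every position after it, so every k-mer overlapping that region changes.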

Do you know if the dataset has been error corrected with SAET? I would recommend just converting the reads to pseudo-base-space and running the k-mer count on that, i.e. replace the colors 0, 1, 2, 3 with A, T, G, C.
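A rough sketch of that substitution in Python, using the 0, 1, 2, 3 to A, T, G, C mapping from above. Dropping the leading primer base and turning missing calls (`.`) into `N` are my assumptions, as are the function names; adjust them to your data.

```python
# Pseudo-base-space: relabel colors as letters without decoding them.
# '.' (a missing color call) becoming 'N' is an assumption.
PSEUDO = str.maketrans("0123.", "ATGCN")


def to_pseudo_bases(cs_read):
    """Convert one color-space read (e.g. 'T01203312') to pseudo-bases."""
    # Drop the leading primer base if present; it is not a color.
    colors = cs_read[1:] if cs_read[:1].isalpha() else cs_read
    return colors.translate(PSEUDO)


def csfasta_to_pseudo_fasta(lines):
    """Yield FASTA lines for an iterable of csfasta lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        yield line if line.startswith(">") else to_pseudo_bases(line)


example = ["# comment", ">read_1", "T01203312", ">read_2", "T0.1"]
for out in csfasta_to_pseudo_fasta(example):
    print(out)
```

The output is an ordinary-looking FASTA that a k-mer counter can read, but keep in mind the letters are still colors, not real nucleotides.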
