7.0 years ago
faraz.k89
Hi everyone, I am trying to build a De Bruijn graph from short reads. A small fraction of my reads (only about 0.01%) are shorter than 10 bp. Even though they are such a small percentage, I am worried: will these reads cause problems when building the graph?
These are the stats I get from the graph-building step:
bank
    bank_uri : SRR2847385_interleaved.fasta,SRR2847386_interleaved.fasta
    bank_size : 117637467610
    bank_total_nt : 89977612926
sequences
    seq_number : 455322542
    seq_size_min : 1
    seq_size_max : 250
    seq_size_mean : 197.6
    seq_size_deviation : 55.4
kmers
    kmers_nb_valid : 80415655424
    kmers_nb_invalid : 3772037
stats
    histogram
        cutoff : 23
        nb_ge_cutoff : 332423627
        first_peak : 91
    kmers
        solidity_kind : sum
        thresholds : 3 3
        kmers_nb_distinct : 931511370
        kmers_nb_solid : 452752530
        kmers_nb_weak : 478758840
        kmers_percent_weak : 51.4
As you can see, the vast majority of k-mers are valid (80,415,655,424 valid vs. 3,772,037 invalid). Does the graph builder simply ignore reads shorter than a certain length?
Thanks in advance. Faraz.
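Whether those short reads contribute anything depends on the k-mer size specified for the graph. A read of length L yields L − k + 1 k-mers, so any read shorter than k yields no k-mers at all and adds nothing to the graph.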
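To illustrate the point about k, here is a minimal Python sketch. It extracts forward-strand k-mers only (real De Bruijn graph builders typically count canonical k-mers, which this sketch does not do), and it assumes that any k-mer containing a non-ACGT base such as N is "invalid". The reads and k = 5 are hypothetical toy values, not taken from the data above.

```python
from collections import Counter


def kmers(read: str, k: int):
    """Yield every k-mer of `read`.

    A read of length L yields max(0, L - k + 1) k-mers, because
    range() of a zero or negative length is empty.
    """
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k: int) -> Counter:
    """Count forward-strand k-mers across all reads.

    Assumption for illustration: k-mers containing any base other
    than A, C, G or T (e.g. N) are treated as invalid and skipped.
    """
    valid_bases = set("ACGT")
    counts = Counter()
    for read in reads:
        for kmer in kmers(read.upper(), k):
            if set(kmer) <= valid_bases:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    k = 5  # hypothetical k-mer size for the toy example
    reads = ["ACGTACGTAC", "ACG", "ACGTN"]  # hypothetical reads
    for r in reads:
        # The 3 bp read is shorter than k, so it contributes no k-mers.
        print(r, len(r), "bp ->", len(list(kmers(r, k))), "k-mers")
    print("distinct valid k-mers:", len(count_kmers(reads, k)))
```

With a realistic k (for example a few dozen), every read shorter than k behaves like the 3 bp read here: it produces zero k-mers, so it never reaches the graph rather than corrupting it.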