- How to extract the sequences of organelle genomes from whole-genome sequencing data?
- How to extract high-abundance microbial sequences from metagenomic sequencing data?
- How to extract the sequences of highly expressed genes from transcriptomic sequencing data?
We developed HFKReads, a tool for rapidly extracting high-frequency k-mer reads from sequencing data. To evaluate its performance and efficiency, we extracted organelle genome reads from plant whole-genome sequencing data. Approximately 95-99% of the nuclear genome sequences were removed, and 1 Gb of reads could be extracted in less than 1 minute using a single thread.
With the extracted reads, organelle genome assembly ran 10-20 times faster. Assembly quality also improved markedly, because most organelle-derived sequences that had been inserted into the nuclear genome were excluded. We hope this software proves useful in your research.
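To make the idea of high-frequency k-mer read extraction concrete, here is a minimal Python sketch. It is not the HFKReads implementation. The function names, the parameters `k`, `min_count`, and `min_fraction`, and the rule "keep a read if enough of its k-mers are frequent" are all assumptions chosen for illustration. It also ignores reverse complements and sequencing quality, which a real tool would likely handle.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def extract_high_frequency_reads(reads, k=5, min_count=3, min_fraction=0.5):
    """Keep reads in which at least min_fraction of the k-mers
    occur min_count times or more in the whole data set.

    Hypothetical selection rule, used here only to illustrate the concept.
    """
    counts = count_kmers(reads, k)
    kept = []
    for read in reads:
        n = len(read) - k + 1
        if n <= 0:
            continue  # read is shorter than k, so it has no k-mers
        high = sum(1 for km in kmers(read, k) if counts[km] >= min_count)
        if high / n >= min_fraction:
            kept.append(read)
    return kept


if __name__ == "__main__":
    # Four copies of a high-copy (organelle-like) read and one
    # single-copy (nuclear-like) read; toy data, not real sequences.
    reads = ["ACGTACGTAC"] * 4 + ["TTGCAAGGCT"]
    kept = extract_high_frequency_reads(reads, k=5, min_count=3)
    print(len(kept))  # 4: only the high-copy reads are kept
```

In whole-genome data, organelle genomes are present in many more copies per cell than the nuclear genome, so their k-mers are much more frequent. That is why a frequency threshold like the one above can separate the two.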
You can get the HFKReads code and manual on GitHub here