Hello everyone,
I'm currently working with NGS data in Geneious v9.1. Because each sample contains a large number of sequences, working on them is both very time-consuming and demanding for my PC. So I'm trying to select a limited number of sequences, starting from the first one (e.g. if there are 100 million sequences, I want to work only on the first 5 million). From the Geneious manual I've read about normalization, but its result is different from what I'm trying to achieve. I've also thought about running "De Novo Assemble" and checking "Use X % of data", but I don't know if that's the most efficient way to do it.
Thanks in advance to everyone that'll help.
Your best bet is to contact Geneious support for this since it is commercial software and not many here may have access.
I've already asked, and I've only been told about normalization. So I suppose this kind of operation could / should be done using another program. I'll try searching, but in the meantime, if someone knows how to do this on FASTQ files I'd be very thankful.
Subsampling can be done using reformat.sh from the BBMap suite (command-line Java), e.g. to get 5 mil reads. Other program options: seqtk sample and seqkit sample.
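Since the question asks for the first 5 million reads rather than a random subsample, here is a minimal sketch that needs only standard Unix tools. It assumes a plain four-lines-per-read FASTQ (no line-wrapped sequences); the function name `first_reads` and the file names are made up for illustration.

```shell
# Print the first N reads of a FASTQ file.
# Assumption: every read occupies exactly 4 lines (header, sequence, '+', qualities).
first_reads() {
    n_reads="$1"     # number of reads to keep
    in_fastq="$2"    # path to an uncompressed FASTQ file
    head -n "$((n_reads * 4))" "$in_fastq"
}

# Example usage (hypothetical file names):
#   first_reads 5000000 sample.fastq > sample.first5M.fastq
# For gzipped input, decompress on the fly; 5,000,000 reads = 20,000,000 lines:
#   gzip -dc sample.fastq.gz | head -n 20000000 > sample.first5M.fastq
```

For paired-end data, run this on both files with the same read count so the pairs stay in sync. If a random subsample is preferred over the first N reads, the tools mentioned above are the better route; as far as I know reformat.sh has `reads=` (process only the first N input reads) and `samplereadstarget=` (random subsample to N reads), but check each tool's help output for the exact options.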