Question

Trying to select a certain range of sequences in Geneious

23 months ago

Enrico • 0

Hello everyone,

I'm currently working with NGS data in Geneious v9.1. Because each sample contains a very large number of sequences, working on them is both time-consuming and hard on my PC. I'm therefore trying to select a limited number of sequences, starting from the first one (e.g. if there are 100 million sequences, I want to work on only the first 5 million). I've read about normalization in the Geneious manual, but its result is different from what I'm trying to achieve. I've also thought about running "De Novo Assemble" with "Use X % of data" checked, but I don't know if that's the most efficient way to do it.

Thanks in advance to everyone that'll help.

Geneious • 874 views

updated 23 months ago by GenoMax 147k • written 23 months ago by Enrico • 0

Your best bet is to contact Geneious support for this since it is commercial software and not many here may have access.

23 months ago by GenoMax 147k

I've already asked, and I was told only about normalization. So I suppose this kind of operation could or should be done with another program. I'll keep searching, but in the meantime, if someone knows how to do this on FASTQ files, I'd be very thankful.

23 months ago by Enrico • 0

Subsampling can be done with reformat.sh from the BBMap suite (a command-line Java tool). To get 5 million reads, run:

reformat.sh -Xmx4g in=fastq.gz out=sampled.fastq.gz samplereadstarget=5000000

samplereadstarget=0     (srt) Exact number of OUTPUT reads (or pairs) desired.

Other programs that can do this: seqtk sample and seqkit sample.
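
Note that the tools above draw a random subsample, whereas the original question asked for the first 5 million reads in file order. If that is really what is wanted, a minimal sketch with standard Unix tools might look like the following. It assumes a gzipped FASTQ file with the usual 4 lines per record (no wrapped sequence lines); first_reads is a hypothetical helper name and the file names are placeholders.

```shell
# Print the first N reads of a gzipped FASTQ file.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
# first_reads is a hypothetical helper, not part of any existing tool.
first_reads() {
    n_reads=$1
    infile=$2
    # gzip -cd decompresses to stdout; head stops after 4 * N lines,
    # so the rest of the file is never fully read.
    gzip -cd "$infile" | head -n "$((n_reads * 4))"
}

# Example usage (placeholder file names):
# first_reads 5000000 sample.fastq.gz | gzip > first5M.fastq.gz
```

For paired-end data the same call would have to be applied to both files with the same N so the mates stay in sync. Keep in mind that the first reads in a file are not necessarily representative of the whole run, which is one reason random subsampling is usually preferred.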

ADD REPLY • link 23 months ago by GenoMax 147k