Hi there,
Can anyone suggest a tool or method to extract a random 10 GB subset of reads, with a minimum read length of 1,000 bp, from a very large 100 GB file?
I have 50 different fa.gz files of varying size (20-100 GB), and I would like to subsample each of them down to a 10 GB FASTA.
Thanks
Best
sam
Sampling a 100 GB file takes a long time and needs sufficient computational resources. You can use seqkit to randomly sample FASTA files: filter by length first with the -m and -M options of seqkit seq, depending on your requirements, then draw the sample with seqkit sample. Use the two-pass mode and multiple threads to keep memory and runtime manageable.
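As a rough sketch (the sampling proportion below is an assumption: seqkit sample takes a proportion or read count rather than a target output size, so you would compute the proportion per file as roughly 10 / input size in GB; the seed and thread count are arbitrary):

    # For each compressed FASTA: drop reads shorter than 1,000 bp,
    # then randomly sample a proportion of the remaining reads.
    # -p 0.1 is a placeholder aiming at ~10 GB from a ~100 GB input.
    for f in *.fa.gz; do
        seqkit seq -m 1000 -j 4 "$f" \
          | seqkit sample -p 0.1 -s 42 -o "subsampled_${f}"
    done

If you need an exact number of reads instead of a proportion, seqkit sample -n N with -2 runs in two-pass mode, which reads the file twice to keep memory low; since it cannot read from stdin, write the length-filtered file to disk first.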