Question

Split paired-end hWGS FASTQ files to simulate different sequencing coverage

4 months ago

DJBill ▴ 20

I have human whole-genome sequencing (hWGS) data generated from paired-end 150 bp reads, at about 30X coverage. Each hWGS dataset has two files: lane1_read1.FASTq.gz and lane1_read2.FASTq.gz.

I want to split or subsample each pair of FASTQ files to simulate lower coverage, such as 7X, 15X, or 20X, so that I can determine whether lower coverage is still sufficient to detect what I am looking for. I will use the simulated lower-coverage reads to generate BAM files for downstream applications.

Please advise on the best approach and bioinformatics tools for this.

Thank you for your assistance.

whole-genome-sequencing wgs FASTQ coverage • 557 views


score 0 · Answer 1 · 2025-03-19

4 months ago

shelkmike ★ 1.7k

If I understand correctly, you want to downsample the reads, that is, keep a random subset of the read pairs. You can do this, for example, with "seqkit sample" (https://bioinf.shenwei.me/seqkit/usage/) or "seqtk sample" (https://github.com/lh3/seqtk).
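As a rough illustration of the downsampling idea, here is a minimal Python sketch that works out the fraction of read pairs to keep for each target coverage (target divided by current, so 15X from 30X is 0.5) and prints matching "seqtk sample" commands. It assumes the 30X figure from the question is accurate and that all reads have the same length. The helper names, the output file naming, and the seed value 100 are made up for illustration. The one thing that matters is passing the same seed (-s) for read 1 and read 2, which the seqtk README describes as the way to keep the pairs in sync.

```python
import shlex


def sampling_fraction(target_cov: float, current_cov: float) -> float:
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if not 0 < target_cov < current_cov:
        raise ValueError("target coverage must be positive and below current coverage")
    return target_cov / current_cov


def seqtk_commands(read1, read2, target_cov, current_cov=30.0, seed=100):
    """Build one 'seqtk sample' shell command per mate file.

    Hypothetical helper: the output naming scheme is an assumption.
    The same seed is used for both files so the same pairs are kept.
    """
    frac = sampling_fraction(target_cov, current_cov)
    label = f"{target_cov:g}X"
    cmds = []
    for fq in (read1, read2):
        out = fq.replace(".FASTq.gz", f".{label}.FASTq.gz")
        # seqtk writes uncompressed FASTQ to stdout, so pipe it through gzip
        cmds.append(
            f"seqtk sample -s{seed} {shlex.quote(fq)} {frac:.4f} | gzip > {shlex.quote(out)}"
        )
    return cmds


if __name__ == "__main__":
    for cov in (7, 15, 20):
        for cmd in seqtk_commands("lane1_read1.FASTq.gz", "lane1_read2.FASTq.gz", cov):
            print(cmd)
```

The printed commands can be reviewed and then run in a shell. The resulting coverage is only approximate, because the sampling is random and the starting 30X is itself an estimate, so it is worth checking the depth of the final BAM files.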
