Split paired-end hWGS FASTQ files to simulate different sequence coverage
1 day ago
DJBill ▴ 20

I have human whole-genome sequencing (hWGS) data generated from paired-end 150 bp reads, at about 30X coverage. Each hWGS dataset consists of two files: lane1_read1.FASTq.gz and lane1_read2.FASTq.gz.

I want to subsample each pair of FASTQ files to simulate lower coverage (for example 7X, 15X, and 20X) so that I can determine whether lower coverage is still sufficient to detect what I am looking for. I will use the subsampled reads to generate BAM files for downstream analysis.

Please advise on the best approach and the bioinformatics tools to use for this.

Thank you for your assistance.

whole-genome-sequencing wgs FASTQ coverage • 136 views
1 day ago
shelkmike ★ 1.5k

If I understand correctly, you want to downsample your reads. You can do this with, for example, "seqkit sample" (https://bioinf.shenwei.me/seqkit/usage/) or "seqtk sample" (https://github.com/lh3/seqtk).
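
To make the idea concrete, here is a minimal Python sketch of what downsampling paired FASTQ files involves. It is not how seqkit or seqtk are implemented, only an illustration under a few assumptions: coverage scales roughly linearly with the number of read pairs (so 30X to 15X means keeping about half of the pairs), both files list mates in the same order, and every record is exactly four lines. The function names and the seed value are made up for this example. The key point is that one random decision is made per pair, so read 1 and read 2 stay in sync; with seqtk, the README gives the equivalent advice of using the same random seed for both files.

```python
import gzip
import random


def sampling_fraction(current_cov, target_cov):
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if not 0 < target_cov <= current_cov:
        raise ValueError("target coverage must be in (0, current coverage]")
    return target_cov / current_cov


def read_fastq(handle):
    """Yield one FASTQ record at a time as a list of 4 lines (assumes unwrapped records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield record


def downsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=100):
    """Keep each read pair with probability `fraction`; mates are kept or dropped together."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = total = 0
    with gzip.open(r1_in, "rt") as f1, gzip.open(r2_in, "rt") as f2, \
            gzip.open(r1_out, "wt") as o1, gzip.open(r2_out, "wt") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            total += 1
            if rng.random() < fraction:  # one decision per pair keeps files in sync
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept, total


if __name__ == "__main__":
    # Fractions needed to go from ~30X to the coverages mentioned in the question.
    for target in (7, 15, 20):
        print(f"{target}X: keep fraction {sampling_fraction(30, target):.3f}")
    # Hypothetical call on the files from the question (output names are made up):
    # downsample_pairs("lane1_read1.FASTq.gz", "lane1_read2.FASTq.gz",
    #                  "sub15X_read1.fastq.gz", "sub15X_read2.fastq.gz",
    #                  sampling_fraction(30, 15))
```

The number of pairs kept is random around the requested fraction, so the resulting coverage is approximate; it is worth checking the actual coverage on the BAM files after alignment. For 30X human data a pure-Python loop like this will be slow, so in practice the dedicated tools above are the better choice.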
