I am developing a pipeline (with GATK v3.7) for whole-exome sequencing data from an Illumina NextSeq 500. I have FASTQ files, and I want to create a very small dummy dataset from them, so that I can easily test tweaks to my pipeline without having to wait for hours every time I change something.
Can I simply use, say, the first 1000 reads from a file, and expect that to work as a test dataset?
You can do that, or you can randomly sample your FASTQ.
For the first 1000 sequences:
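A minimal sketch of one way to do this, assuming an uncompressed FASTQ with exactly four lines per record (no line-wrapped sequences). The function name `first_reads` and the file names in the usage note are placeholders, not part of any tool:

```shell
# Sketch: keep the first N reads of an uncompressed FASTQ file.
# Assumes 4 lines per record: header, sequence, '+', qualities.
first_reads() {
    n_reads="$1"   # number of reads to keep
    in_fq="$2"     # input FASTQ (hypothetical path)
    out_fq="$3"    # output FASTQ (hypothetical path)
    # N reads correspond to N * 4 lines
    head -n "$((n_reads * 4))" "$in_fq" > "$out_fq"
}
```

For example, `first_reads 1000 sample_R1.fastq test_R1.fastq`. For paired-end data, run it with the same read count on both the R1 and R2 files so the mates stay in the same order. For gzipped input, something like `zcat sample_R1.fastq.gz | head -n 4000 > test_R1.fastq` should do the same job.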
To randomly sample 1000 reads from your FASTQ:
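A sketch of how this might look with `seqkit sample`, assuming seqkit is installed and on your `PATH`. The wrapper `sample_reads` and the file names are placeholders. As far as I know, `-n` gives approximately (not always exactly) the requested number of reads, and `-s` sets the random seed:

```shell
# Sketch: randomly sample roughly N reads from a FASTQ file with seqkit.
# Assumes seqkit is installed; the output count may not match N exactly.
sample_reads() {
    n_reads="$1"   # target number of reads
    seed="$2"      # random seed, for reproducibility
    in_fq="$3"     # input FASTQ (hypothetical path)
    out_fq="$4"    # output FASTQ (hypothetical path)
    seqkit sample -n "$n_reads" -s "$seed" "$in_fq" -o "$out_fq"
}
```

For example, `sample_reads 1000 11 sample_R1.fastq test_R1.fastq`. For paired-end data, use the same seed on the R1 and R2 files so that matching reads are selected from both. Sampling by number can use a lot of memory on large files; sampling by proportion and trimming the result, e.g. `seqkit sample -p 0.1 in.fastq | seqkit head -n 1000`, is a lighter alternative.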
Download seqkit from here: http://bioinf.shenwei.me/seqkit/download/
Small typo in the second command.