Small wheat fastq or fastq.gz samples for testing a pipeline
5.4 years ago

I am testing a variant-calling pipeline on wheat fastq files. I have a couple of raw and trimmed (cutadapt) files, but they are too big for quick testing. Where can I get smaller files, or how can I shorten the ones I have so that I can test faster?

5.4 years ago

Why can't you just take the first 4000 lines of your fastq?

Would that be OK? Should I take the headers and everything? How should I do this?

Fastq files have no file-level header; they are plain text files (each read's @ line is part of its own record, not a file header). Look up how to take subsets of text files with Unix tools such as head. If the file is compressed, decompress it in a pipe first, e.g. zcat your.fastq.gz | head -n 400000 > subset.fq. This gets you the first 100,000 reads (the factor of 4 is because each read occupies 4 lines; see the fastq format specification for why).
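
If you would rather do this from a script than with zcat | head, the following is a minimal Python sketch of the same idea: copy the first N four-line records, decompressing gzip input on the fly. The function names open_maybe_gzip and head_fastq are hypothetical, gzip input is detected by its magic bytes rather than the file extension, and the sanity check on @ and + lines assumes standard 4-line fastq records (no wrapped sequence lines).

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open a fastq file as text, transparently decompressing gzip input.

    Gzip files are recognised by their two magic bytes (0x1f 0x8b),
    not by the file extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def head_fastq(in_path, out_path, n_reads):
    """Copy the first n_reads records (4 lines each) to an uncompressed file.

    Equivalent in spirit to: zcat in.fastq.gz | head -n $((4 * n_reads)).
    Returns the number of complete records written.
    """
    written = 0
    with open_maybe_gzip(in_path) as src, open(out_path, "w") as dst:
        while written < n_reads:
            record = list(itertools.islice(src, 4))
            if len(record) < 4:
                break  # end of file (or a truncated final record)
            # Assumes plain 4-line fastq: header starts with @, line 3 with +.
            if not record[0].startswith("@") or not record[2].startswith("+"):
                raise ValueError(f"record {written + 1} is not 4-line fastq")
            dst.writelines(record)
            written += 1
    return written
```

Usage: head_fastq("your.fastq.gz", "subset.fq", 100_000) produces the same 100,000-read subset as the zcat | head command, and stops cleanly if the file has fewer reads.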

5.4 years ago
MatthewP

Hello, use seqtk. The seqtk sample command can randomly sample a specific number of reads from a fastq file. If your sequencing is paired-end, remember to use the same seed for both fastq files.
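
Why the same seed matters can be shown with a small Python sketch. This is not seqtk's code; it is an illustration using reservoir sampling (Algorithm R), where the random choices depend only on the seed and on how many records have been seen, never on the read contents. So two mate files with the same number of reads in the same order, sampled with the same seed, keep the same read positions. The function names and the default seed are hypothetical, and the reader assumes uncompressed 4-line fastq.

```python
import itertools
import random


def read_fastq(path):
    """Yield fastq records as 4-line tuples (uncompressed input assumed)."""
    with open(path) as fh:
        while True:
            rec = tuple(itertools.islice(fh, 4))
            if len(rec) < 4:
                return
            yield rec


def reservoir_sample(records, k, seed):
    """Uniformly sample k records in a single pass (Algorithm R).

    The sequence of random draws depends only on the seed and the record
    count, so identical seeds give identical positions across mate files.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append((i, rec))
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = (i, rec)
    reservoir.sort()  # indices are unique, so this restores file order
    return [rec for _, rec in reservoir]


def sample_pair(r1_path, r2_path, out1, out2, k, seed=11):
    """Sample the same k read pairs from R1 and R2 by reusing one seed."""
    for src, dst in ((r1_path, out1), (r2_path, out2)):
        with open(dst, "w") as out:
            for rec in reservoir_sample(read_fastq(src), k, seed):
                out.writelines(rec)
```

With seqtk itself the equivalent is one command per mate file with the same -s value, e.g. seqtk sample -s100 read1.fq 10000 > sub1.fq and seqtk sample -s100 read2.fq 10000 > sub2.fq.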

You do not need random sampling, as the reads in a fastq file are already in effectively random order because DNA is loaded onto the flow cell without any ordering. head does just fine.
