Hi All,
Is there any way I can split large FASTQ files into smaller FASTQ files with a defined number of reads in a Windows environment? I know there are multiple options for Unix, but I did not find anything for Windows.
Best, Deep
Or use some freeware like GSplit. Remember to set the line count to a multiple of 4 (4 lines per read).
Install Cygwin (http://www.cygwin.org) and use split.
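For example, a split into 1-million-read chunks might look like this (the file name `reads.fastq` and the `chunk_` prefix are placeholders; `-d` for numeric suffixes assumes GNU split, which Cygwin ships):

```shell
# 1,000,000 reads x 4 lines per read = 4,000,000 lines per output file.
# -d gives numeric suffixes: chunk_00, chunk_01, ...
split -l 4000000 -d reads.fastq chunk_
```

Each resulting `chunk_NN` file is itself a valid FASTQ, because the line count is a multiple of 4.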
If you're familiar with R, you can use the ShortRead library to break the file into smaller files; it's only a few lines of code. The example below takes a FASTQ file, breaks it into sets of 1 million reads, and writes the results to incrementally named smaller files:
library(ShortRead)

# Set the input file (.gz files also work)
yourFile <- "foo.fastq"
fileBaseName <- sub(".fastq$", "", yourFile)

# Stream over the FASTQ file, 1 million reads at a time
f <- FastqStreamer(yourFile, 1000000)
file_index <- 0
while (length(fq <- yield(f))) {
  newName <- paste(fileBaseName, "_", file_index, ".fastq", sep = "")
  writeFastq(fq, file = newName)
  file_index <- file_index + 1
}
close(f)
You should be able to do this with PowerShell if you don't want to install Cygwin: keep a running line count and use a modulus operation on multiples of four lines, as Sukhdeep suggests.
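The same counter-and-modulus idea can be sketched in plain Python, which also runs on Windows without Cygwin (the file name `big.fastq`, the output naming scheme, and the chunk size are assumptions for illustration):

```python
def split_fastq(path, reads_per_file=1_000_000):
    """Stream a FASTQ file and write it out in chunks of reads_per_file reads."""
    out = None
    file_index = 0
    with open(path) as fq:
        for line_number, line in enumerate(fq):
            # Each read occupies 4 lines; start a new chunk on the boundary.
            if line_number % (4 * reads_per_file) == 0:
                if out:
                    out.close()
                out = open(f"{path}.{file_index}", "w")
                file_index += 1
            out.write(line)
    if out:
        out.close()

# split_fastq("big.fastq")  # writes big.fastq.0, big.fastq.1, ...
```

Because it streams line by line, memory use stays constant no matter how large the input file is.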
See also duplicate thread on SEQanswers: http://seqanswers.com/forums/showthread.php?t=28989