I have a txt file each line of which is a number corresponding to a specific read in a fastq file. I would like to make a subsetted fastq file from my larger fastq file with just the reads corresponding to the numbers in the txt file. Is there a simple way to do this? Thank you!
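Below is a minimal awk sketch that reads the file in a single pass. It assumes the numbers in the txt file are FASTQ record numbers (not line numbers) and that every record spans exactly four lines, with no wrapped sequence or quality lines. The function name `subset_fastq_by_index` is hypothetical, and the optional third argument is an assumed convenience for switching between 1-based (default) and 0-based numbering.

```shell
#!/usr/bin/env bash
# subset_fastq_by_index IDS FASTQ [BASE]   (hypothetical helper name)
# Assumptions:
#   - IDS holds one record number per line; BASE=1 (default) or 0 for 0-based numbers
#   - FASTQ is uncompressed and every record is exactly 4 lines
subset_fastq_by_index() {
  local ids="$1" fq="$2" base="${3:-1}"
  awk -v base="$base" '
    FILENAME == ARGV[1] {            # first file: collect wanted record numbers
      if ($1 != "") want[$1 - base + 1]
      next
    }
    {
      rec = int((FNR - 1) / 4) + 1   # 1-based record number of the current FASTQ line
      if (rec in want) print         # keep all 4 lines of each wanted record
    }' "$ids" "$fq"
}
```

Usage might look like `subset_fastq_by_index test.txt test.fq > subset.fq`. For a gzipped input, bash process substitution such as `<(zcat test.fq.gz)` in place of the FASTQ file name should work. Reads come out in FASTQ order rather than txt order, and a number listed twice yields one copy. Unlike launching one `seqkit range` per number, this reads the FASTQ only once.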
Doing this by (record?) number alone is going to be tricky. If you have read headers, it would be much simpler to use filterbyname.sh from the BBMap suite.
filterbyname.sh in=<file> in2=<file2> out=<outfile> out2=<outfile2> names=<string,string,string> include=<t/f>
names= A list of strings or files. The files can have one name per line, or be a standard read file (fasta, fastq, or sam).
Run filterbyname.sh without any options to see in-line help.
That's fine. I tried gzipping the fastq file and running it with input.fq.gz, but the script still wouldn't run, so there must be another problem. I will post when I have a functioning script.
Not clear: what is that number? The line number, starting from 0 or from 1? The index of the fastq record in the file, starting from 0 or from 1?
Please post a few records from the input files (fastq and text). In their absence, I would suggest using the seqkit grep/range functions. @johnsonn573
The problem is that the OP only has record numbers (odd) and not fastq headers, as far as I can see.
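If a header-based tool such as filterbyname.sh is preferred, the record numbers can first be turned into read names. This is a sketch under the same assumptions as above (1-based record numbers, exactly four lines per record); `ids_for_records` is a hypothetical name. It prints each selected header without the leading "@", cut at the first whitespace.

```shell
#!/usr/bin/env bash
# ids_for_records IDS FASTQ   (hypothetical helper name)
# Assumes 1-based record numbers in IDS and exactly 4 lines per FASTQ record.
ids_for_records() {
  awk '
    FILENAME == ARGV[1] {              # first file: remember wanted record numbers
      if ($1 != "") want[$1 + 0]
      next
    }
    {
      rec = int((FNR - 1) / 4) + 1     # 1-based record number of this line
      # header lines are lines 1, 5, 9, ...; print the name minus the leading @
      if (FNR % 4 == 1 && (rec in want)) print substr($1, 2)
    }' "$1" "$2"
}
```

The resulting one-name-per-line file (e.g. `ids_for_records test.txt test.fq > names.txt`) should then be usable as `names=names.txt` with `include=t` in the filterbyname.sh call quoted above. How filterbyname.sh matches names against headers that contain spaces is worth checking in its in-line help.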
Something like this? @genomax @johnsonn573. The example is with a .fasta file, but the same code works for a fastq file; just replace the input fasta with the input fastq. For fastq, the command would be (-k keeps the output in the same order as the numbers in test.txt):
parallel -k seqkit range -r {}:{} test.fq :::: test.txt > subset.fq