I have two FASTQ files (R1 and R2). The problem is that R1 has 5 sequences (just as an example, not the real count) and R2 has 6. I want only the reads in R1 and their mates from the R2 file, so I used seqtk:
seqtk sample -s100 test_R1.fastq 5 >seq1
seqtk sample -s100 test_R2.fastq 5 >seq2
But I am not getting exact paired-end reads from both files. Is there a ready-made tool that does this?
It's somewhat difficult to tell exactly what your situation is. If "test_R1.fastq" is out of sync with "test_R2.fastq", then sync them before proceeding (BBTools has a convenient function for that as I recall).
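To check whether the two files are actually out of sync, one option is to compare the read IDs record by record. The following is only a minimal sketch, not a BBTools function: it assumes uncompressed, 4-line FASTQ records (no wrapped sequences) and that mates share the same ID apart from an optional `/1` or `/2` suffix. The names `read_ids` and `first_mismatch` are made up for this example.

```python
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each 4-line FASTQ record, without a /1 or /2 suffix."""
    with open(path) as handle:
        for line_index, line in enumerate(handle):
            # Assumption: every record is exactly 4 lines, so headers sit at 0, 4, 8, ...
            if line_index % 4 == 0 and line.strip():
                name = line[1:].split()[0]  # drop '@' and any comment after whitespace
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_mismatch(r1_path, r2_path):
    """Return the 1-based record number where the IDs first differ, or None if in sync."""
    ids = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for record_number, (id1, id2) in enumerate(ids, start=1):
        if id1 != id2:  # also catches one file being longer than the other
            return record_number
    return None
```

If `first_mismatch("test_R1.fastq", "test_R2.fastq")` returns `None`, the files are in sync under these assumptions; otherwise the returned number is the first record where they diverge.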
You cannot sample two files and expect the reads to be paired if they have a different number of sequences. As Devon said, you need to pair the reads first. Here is a lightweight solution using Pairfq:
There is inline documentation for the above command, so running ./pairfq_lite will show you the usage, and there is more information about that specific command on the wiki online. If you have a lot of sequences and little memory, I would recommend installing the program from the link above and using the indexing method (if you use this approach).
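To illustrate what "pairing the reads first" means, here is a rough Python sketch. It is not how Pairfq is implemented, just the general idea: keep only reads whose ID appears in both files and write them out in the same order. It assumes uncompressed, 4-line FASTQ records, mate IDs that match apart from an optional `/1` or `/2` suffix, and enough memory to hold all of R2. The function names and output file names are hypothetical.

```python
def parse_fastq(path):
    """Yield (read_id, record_lines) for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():  # end of file (or trailing blank line)
                break
            name = record[0][1:].split()[0]  # drop '@' and any trailing comment
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, record


def pair_reads(r1_path, r2_path, out1_path, out2_path):
    """Write reads present in both inputs, in R1 order; return the number of pairs."""
    r2_records = dict(parse_fastq(r2_path))  # assumption: R2 fits in memory
    kept = 0
    with open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        for name, record in parse_fastq(r1_path):
            mate = r2_records.get(name)
            if mate is not None:  # skip R1 reads that have no mate in R2
                out1.writelines(record)
                out2.writelines(mate)
                kept += 1
    return kept
```

For example, `pair_reads("test_R1.fastq", "test_R2.fastq", "paired_R1.fastq", "paired_R2.fastq")` would leave the unpaired sixth read out of the R2 output. Once both files contain the same reads in the same order, running `seqtk sample` on each with the same seed should keep the pairing.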
written 9.8 years ago by SES, updated 5.2 years ago by Ram
What do you mean by 'exact pairend reads'? Paired-end reads from Illumina should ideally not overlap at all.