Hello, what command could I use to extract every 3rd or every 4th pair of reads from two FASTQ files containing paired-end reads (file read1 and file read2)? I want to use it to get smaller raw data files that are still representative of the larger files, since they would be reduced randomly.
Also, how can I randomly reduce BAM and BED files in the same way, such as the accepted_hits.bam and junctions.bed generated by TopHat?
Thank you,
Ephraim Trakhtenberg
The approach you mention is not technically random (taking every 3rd or 4th pair is systematic sampling), but this task is a pretty common one. There has been plenty of good discussion on this site about sampling large files, for example "Selecting Random Pairs From Fastq?" (and see the 'similar posts' section to the right of that page for more posts on the topic). Hope that helps.
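
To make the idea concrete, here is a minimal Python sketch of random subsampling for paired FASTQ files. It is not taken from the linked thread. It assumes both files are uncompressed, use strict 4-line records, and list the mates in the same order. The function names (read_fastq, subsample_pairs), the fraction and seed parameters, and the file names in the usage comment are all made up for illustration. One random draw is made per pair, so both mates are always kept or dropped together.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield one FASTQ record at a time as a list of 4 lines (assumes 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction=0.25, seed=42):
    """Keep each read pair with probability `fraction`; return the number of pairs kept.

    Hypothetical helper: assumes the two inputs hold mates in the same order.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = 0
    with open(r1_in) as f1, open(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        for rec1, rec2 in zip(read_fastq(f1), read_fastq(f2)):
            # A single draw decides the fate of both mates, so pairing is preserved.
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept


# Example call (placeholder file names):
# subsample_pairs("read1.fastq", "read2.fastq", "read1.sub.fastq", "read2.sub.fastq", fraction=0.25)
```

With fraction=0.25 this keeps roughly one pair in four, but chosen at random rather than at fixed positions. For BAM files, samtools view has a -s option that subsamples alignments by fraction, which may be simpler than writing a script.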