How can I extract the first and last N bases from a read in a FASTQ file?
I have used the following command to extract the last 1000 bases of a read from a FASTQ file, but I'd also like to incorporate the first 1000 bases into the command:
$ grep -A 4 "read_name_identifier" filename.fq | sed -n '2~4p' | grep -oE '.{1000}$'
Also, how can I use the new command for the first and last N bases in a Perl script, as I have >450 reads in the FASTQ file?
Many thanks,
Any help will be appreciated.
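If you want to use Perl (or Python), I would suggest parsing the file 'properly' with the Bio module (Bio::SeqIO in BioPerl, Bio.SeqIO in Biopython). Once each read is parsed into a record, extracting the first and last N bases is fairly trivial, and much more robust than chaining grep and sed.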
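To give an idea of how little code this takes, here is a minimal sketch in plain Python. It does not use the Bio module: it assumes every record is exactly four lines (no line-wrapped sequences), and the names `read_fastq` and `first_and_last` are made up for this example. With Biopython installed, `Bio.SeqIO.parse(handle, "fastq")` would be the more robust way to get the records.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ handle.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, quality), i.e. sequences are not line-wrapped.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        # Read name = text after '@' up to the first whitespace
        yield header[1:].split()[0], seq, qual


def first_and_last(seq, n):
    """Return (first n bases, last n bases) of seq.

    Reads shorter than n are returned whole in both positions, and the
    two pieces overlap when the read is shorter than 2 * n.
    """
    if n <= 0:
        return "", ""
    return seq[:n], seq[-n:]


if __name__ == "__main__":
    # Tiny in-memory example instead of a real file
    demo = io.StringIO("@read1 extra\nACGTACGTAC\n+\nIIIIIIIIII\n")
    for name, seq, _qual in read_fastq(demo):
        head, tail = first_and_last(seq, 3)
        print(name, head, tail)  # prints: read1 ACG TAC
```

For a real file, replace the `io.StringIO(...)` handle with `open("filename.fq")` and call `first_and_last(seq, 1000)`. Because the loop visits every record, all >450 reads are handled in one pass; to restrict the output to particular reads, test `name` against the identifiers you want inside the loop.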