5.6 years ago
ezraamustafa3
How can I extract only the sequence identifiers from a FASTQ file, without the sequences or the quality scores, using Linux?
Another option to print only read names is to print every 4th line, starting from the first line:
zcat file.fastq.gz | awk 'NR%4==1'
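For anyone who wants to try the every-4th-line approach on a toy file first, here is a minimal sketch. The file name example.fastq.gz, the read names, and the helper extract_read_names are made up for illustration. It uses gzip -dc instead of zcat (the two are equivalent on most Linux systems, while zcat on macOS can behave differently) and also strips the leading "@" from each name, which the one-liner above does not do.

```shell
# Build a tiny two-record FASTQ file for illustration (read names are invented).
# Note that the quality line of the second record starts with "@".
printf '@read1 1:N:0:1\nACGT\n+\nIIII\n@read2 1:N:0:1\nTTGA\n+\n@III\n' > example.fastq
gzip -f example.fastq

# Hypothetical helper: keep line 1 of every 4-line record and drop the leading "@".
extract_read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { print substr($0, 2) }'
}

extract_read_names example.fastq.gz > read_names.txt
```

Counting lines instead of matching on "@" matters here because "@" is also a valid quality character, so a quality line can start with it. This assumes the usual 4-lines-per-record layout with no wrapped sequence lines.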
Use zcat myfile.fastq.gz | head to see the first 10 lines. You should be able to see a couple of read names. The first few fields of each name should be the instrument ID and run ID, and if the FASTQ files are straight from the instrument, those should be the same in every read. Something like zgrep M012933 myfile.fastq.gz should then get you all the read names.
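A small sketch of the instrument-ID approach, with two assumptions of mine: the toy file run.fastq.gz and its read names are invented (only the M012933 prefix comes from the example above), and the pattern is anchored with ^@ so that it only matches at the start of a header line rather than anywhere in a line.

```shell
# Tiny illustrative FASTQ; read names are invented and reuse the M012933 prefix.
printf '@M012933:7:000-AAAAA:1:1101:100:200 1:N:0:1\nACGT\n+\nIIII\n@M012933:7:000-AAAAA:1:1101:101:201 1:N:0:1\nGGCA\n+\n@III\n' > run.fastq
gzip -f run.fastq

# Anchor the pattern at the start of the line so only header lines match.
gzip -dc run.fastq.gz | grep '^@M012933' > header_lines.txt
```

This relies on every read sharing the same instrument ID, so it would miss reads in a file merged from several instruments.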
It worked for me, thanks a lot!