fastQ files exploration
5.6 years ago

How can I extract only the sequence identifiers from a FASTQ file, without the sequences or the quality scores, using Linux?

fastq NGS • 1.5k views
5.6 years ago
h.mon 35k

Another option to print only read names is to print every 4th line, starting from the first line:

zcat file.fastq.gz | awk 'NR%4==1'
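
As a rough sketch of the same idea, the one-liner can be wrapped in a small function that also strips the leading @ from each name. The function name fastq_read_names is made up for illustration, and the sketch assumes strict 4-line records (no wrapped sequence or quality lines) and GNU gzip, where -cdf passes uncompressed input through unchanged:

```shell
# Hypothetical helper (name invented for this example): print the read names
# from a plain or gzip-compressed FASTQ file.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
fastq_read_names() {
    # gzip -cdf decompresses gzip input; with GNU gzip, plain text is copied as-is
    gzip -cdf "$1" | awk 'NR % 4 == 1 { print substr($0, 2) }'
}
```

Usage would be something like fastq_read_names file.fastq.gz > read_names.txt. Selecting by line position rather than with grep '^@' matters because @ is also a valid quality character, so a quality line can start with it.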
5.6 years ago

Run zcat myfile.fastq.gz | head to see the first 10 lines; they should include a couple of read names. The first few fields of each name should be the instrument ID and run ID, and if the FASTQ files come straight from the instrument, those should be constant across all reads. Something like zgrep M012933 myfile.fastq.gz (substituting your own instrument ID) should then return all the read names.
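
Before relying on a single pattern for zgrep, it may be worth checking that the instrument ID really is constant. The sketch below counts reads per instrument ID; the function name fastq_instrument_counts is hypothetical, and it assumes Illumina-style headers whose first colon-separated field is the instrument ID, strict 4-line records, and GNU gzip:

```shell
# Hypothetical helper (name invented for this example): count reads per
# instrument ID in a plain or gzip-compressed FASTQ file.
# Assumptions: 4-line records; header looks like @<instrument>:<run>:...
fastq_instrument_counts() {
    gzip -cdf "$1" | awk -F: '
        NR % 4 == 1 { n[substr($1, 2)]++ }            # drop the leading "@"
        END { for (id in n) print n[id], id }         # "<count> <instrument>"
    '
}
```

If fastq_instrument_counts myfile.fastq.gz prints a single line, one instrument ID covers every read; several lines would mean a single zgrep pattern misses some reads.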

It worked for me, thanks a lot!
