Question

Extracting only the sequence from fastq files

0

8.6 years ago

padma18krishna • 0

How can I extract only the nucleotide sequences from a FASTQ file containing thousands of reads, using Linux?

Fastq Linux • 12k views

ADD COMMENT • link updated 8.6 years ago by Devon Ryan 104k • written 8.6 years ago by padma18krishna • 0

0

Adding to the answer below: this Linux tutorial would help you with future tasks.

ADD REPLY • link 8.6 years ago by venu 7.1k

0

Since a couple of answers below have asked "why anyone would want to do this", I will offer one explanation. It may not be the one applicable in this case.
I remember doing this for someone who wanted to get counts and sequences of all unique combinations present in the dataset. Just counts.
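
A minimal sketch of that kind of tally, assuming each read occupies exactly four lines and treating each distinct read sequence as one "combination" (that interpretation is an assumption). The file names reads.fq and seq_counts.txt and the three reads are made up for illustration.

```shell
# Hypothetical three-read FASTQ file; two reads share the same sequence.
printf '%s\n' \
  '@r1' 'ACGT' '+' 'IIII' \
  '@r2' 'TTGA' '+' 'IIII' \
  '@r3' 'ACGT' '+' 'IIII' > reads.fq

# awk keeps line 2 of every 4-line record (the sequence).
# sort groups identical sequences so that uniq -c can count them;
# the final sort lists the most frequent sequences first.
awk 'NR % 4 == 2' reads.fq | sort | uniq -c | sort -k1,1nr > seq_counts.txt

cat seq_counts.txt
```

Each output line holds a count followed by the sequence it belongs to; uniq -c pads the count with leading spaces.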

ADD REPLY • link 8.6 years ago by GenoMax 148k

0

The thing is, this looks like someone offloading an assignment.

ADD REPLY • link 8.6 years ago by Ram 44k

score 1 · Answer 1 · 2016-05-26

1

8.6 years ago

5heikki 11k

For example:

paste - - - - <file.fq | cut -f2 > only_seq

I don't know why anyone would want to do that, but it's what you asked for.
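
To see why this works, here is a small sketch on a made-up two-read file (sample.fq and its contents are hypothetical). It assumes every record is exactly four lines long, so a FASTQ file with sequences wrapped over several lines would not be handled correctly.

```shell
# Hypothetical two-read FASTQ file, for illustration only.
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGACCA' '+' 'IIIIIII' > sample.fq

# Each "-" makes paste read one line from standard input, so four of them
# join every four input lines into a single tab-separated row.
# cut -f2 then keeps the second field of each row, which is the sequence.
paste - - - - < sample.fq | cut -f2 > only_seq

cat only_seq
```

Changing the field number picks a different part of the record, for example -f1 for the header line or -f4 for the quality string.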

ADD COMMENT • link 8.6 years ago by 5heikki 11k

0

Agreed. To tack onto this: think about the problem, and remember "just because you can, doesn't mean you should..."

ADD REPLY • link 8.6 years ago by andrew.j.skelton73 6.6k

score 1 · Answer 2 · 2016-05-26

1

8.6 years ago

Devon Ryan 104k

awk '{if(NR%4==2) print $0}' file.fq > seq.txt

No clue why you'd want to do that, but there you go.

ADD COMMENT • link 8.6 years ago by Devon Ryan 104k

1

or just awk '(NR%4==2)' file.fq > seq.txt :-P
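
The short form works because an awk pattern with no action prints every matching line by default. A quick sketch comparing the two variants on a made-up file (demo.fq, seq_long.txt and seq_short.txt are hypothetical names), again assuming four lines per record:

```shell
# Hypothetical two-read FASTQ file, for illustration only.
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGACCA' '+' 'IIIIIII' > demo.fq

# Explicit form: test the line number inside the action, then print the line.
awk '{if(NR%4==2) print $0}' demo.fq > seq_long.txt

# Short form: the condition is the pattern, and the default action prints.
awk 'NR%4==2' demo.fq > seq_short.txt

# cmp exits with status 0 when the two files are byte-identical.
cmp seq_long.txt seq_short.txt && echo "identical output"
```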

ADD REPLY • link 8.6 years ago by Pierre Lindenbaum 164k