Question

How to remove the first two nucleotides from a FASTQ file (single-end)

1

4.1 years ago

17318598206 ▴ 30

"CAGE (cap analysis of gene expression; Table S1) was as described (Yang et al., 2011) and sequenced using a HiSeq 2000 (100 nt reads). After removing adaptor sequences and checking read quality using Flexbar 2.2 with the parameters of “-at 3 -ao 10 --min-readlength 20 --max-uncalled 70 --phred-pre-trim 10”, we retained only reads beginning with NG or GG (the last two nucleotides on the 5′ adaptor). We then removed the first two nucleotides and mapped the sequences to the mouse genome using TopHat 2.0.4."

This is how the paper describes the procedure. How do I write code to remove the first two nucleotides?

CAGE-seq fastq • 2.4k views

updated 4.1 years ago by cpad0112 21k • written 4.1 years ago by 17318598206 ▴ 30

2

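There are multiple ways. Each one trims the first two characters from both the sequence line and the quality line; the sed (GNU sed) and awk versions assume 4-line FASTQ records: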

$ cutadapt -u 2 -o new.fastq input.fastq
$ seqkit subseq -r 3:-1 input.fastq -o new.fastq
$ sed -r '0~2 s/^.{2}//' input.fastq > new.fastq
$ awk '{print (NR%2 == 0 ? substr($0,3): $0)}' input.fastq > new.fastq
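
The commands above trim every read, whereas the quoted method first keeps only reads that begin with NG or GG. A minimal sketch of doing both steps in one pass is shown here, assuming strict 4-line FASTQ records (no wrapped sequence lines); `filter_and_trim` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# filter_and_trim: hypothetical helper, reads FASTQ on stdin, writes FASTQ on stdout.
# Keeps only reads whose sequence starts with NG or GG, then drops the first
# two bases and the matching first two quality characters.
# Assumes strict 4-line records (header, sequence, '+', quality).
filter_and_trim() {
  awk '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { seq = $0 }      # sequence line
    NR % 4 == 3 { plus = $0 }     # separator line
    NR % 4 == 0 {                 # quality line: record is complete
      if (seq ~ /^[NG]G/) {
        print header
        print substr(seq, 3)
        print plus
        print substr($0, 3)
      }
    }
  '
}
```

It could be run as `filter_and_trim < input.fastq > new.fastq`. Reads that do not start with NG or GG are dropped entirely rather than trimmed.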

4.1 years ago by cpad0112 21k

0

You can use the HEADCROP option in Trimmomatic; HEADCROP:2 removes the first two bases from each read.

4.1 years ago by reza ▴ 300

score 2 · Answer 1 · 2021-06-03

2

4.1 years ago

5heikki 11k

awk '{if(NR%2){print $0}else{print substr($0,3)}}' in.fq > out.fq
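
This works because in a 4-line FASTQ record the even-numbered lines are the sequence and the quality string, so both get trimmed together. As a hedged sanity check on the output, the sketch below counts records and reports any whose sequence and quality lengths differ; `check_fastq_lengths` is a hypothetical helper name and it makes the same 4-line-record assumption.

```shell
# check_fastq_lengths: hypothetical helper, reads FASTQ on stdin.
# Prints a one-line summary and exits non-zero if any record has a
# sequence line and a quality line of different lengths.
# Assumes strict 4-line records.
check_fastq_lengths() {
  awk '
    NR % 4 == 2 { seqlen = length($0) }               # remember sequence length
    NR % 4 == 0 && length($0) != seqlen { bad++ }     # compare with quality length
    END {
      printf "%d records, %d with mismatched lengths\n", NR / 4, bad
      exit (bad > 0)
    }
  '
}
```

For example, `check_fastq_lengths < out.fq` prints the summary line for the trimmed file.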

score 1 · Answer 2 · 2021-06-02

1

4.1 years ago

GenoMax 152k

"code to remove the first two nucleotides"

You can use bbduk.sh from the BBMap suite like this:

bbduk.sh -Xmx2g in=your.fq out=trimmed.fq forcetrimleft=2

A guide to using BBDuk is available.