Question

Cut The Reads Of A Paired End Fastq File

3

12.9 years ago

Empyrean ▴ 170

Hi, I have the following 150 bp reads. I would like to trim them to 100 bp, removing everything beyond that length. I would also like to trim bases from the beginning of each read. Please point me to a script or program that can do this.

@HWI-DO4456:7:000000000-Z07CL:1:1:15052:1479 1:N:0:GGCTAC
TCCTCAGATTTTTTAGAAAGAGGAGTCTGCTTATAAGATAATGGCATCATTTTGATAGAATCTCCTCGCATTGTTGTAAAACTAATAACAAAGAAGGTTGGTTTTTGTGGTTTTGGTCTCCCGGCCTGAATCCAAGCTTGATGAATACGAA
+
@CCFFFFFHHHHHJJIJJJJJJIJJJJJJJJJIJJJIJJJIJJJJJJJJJJJJIJJJJJJIJGJJJJJJFHHFFFFFEEEEDEDDDDDDDCDDCBDD>@CB?BDDDDDDDCBDDDBBCDDDDDDDBDDDDDDDDDCBCDCBCDDDDEDDDD
@HWI-D04456:7:000000000-Z07CL:1:1:17590:1511 1:N:0:GGCTAC
TTAATTATACTTGTTGGTTTTGGTGGCGGATTAACATGGGGAGCAGTCGCTCTTCGTTGGGGTAAATAAGGACTGAGAGAAAAAAAGGAGTGTATTTTGTGAAGGTAGGGGCACAGTACCGTTGAAGCGTCTAATGAACGTGGAGGGATGG
+

illumina paired • 13k views

updated 11.4 years ago by Eric Normandeau 11k • written 12.9 years ago by Empyrean ▴ 170

3

See Rule 4 in the document linked from the very top of the page. Or put the terms 'trim fastq' into any search engine. Answer is on the first page.

12.9 years ago by biobot 0.0.77.a.1099 6.2k

0

Out of curiosity, why would you want to cut from the beginning of the reads? Aren't those bases supposed to be of high quality?

12.9 years ago by Arun 2.4k

0

Several library construction methods add linker/adapter sequences that sit inside the sequencing adapters (e.g. RNA-seq libraries made using a Nugen Ovation cDNA synthesis kit). These bases end up at the beginning of the read and, if they are not trimmed, may make the read unalignable (at least with common next-gen aligners).

12.9 years ago by Malachi Griffith 20k

0

Duplicate post: http://seqanswers.com/forums/showthread.php?t=16281

12.9 years ago by Peter 6.0k

score 5 · Answer 1 · 2011-12-19

Many tools can trim reads in FASTQ format; FASTX works nicely. For example, to drop the first 5 bases and keep the next 100 (positions 6 through 105), you could do something like this (-z gzips the output, hence the .gz extension):

fastx_trimmer -f 6 -l 105 -z -i OriginalReads.fastq -o TrimmedReads.fastq.gz

FASTA/Q Trimmer

    $ fastx_trimmer -h
    usage: fastx_trimmer [-h] [-f N] [-l N] [-z] [-v] [-i INFILE] [-o OUTFILE]

    version 0.0.6
       [-h]         = This helpful help screen.
       [-f N]       = First base to keep. Default is 1 (=first base).
       [-l N]       = Last base to keep. Default is entire read.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
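
The -f/-l arithmetic above is easy to get off by one, so here is a minimal sketch of a helper that converts "skip S bases, keep the next K" into the two flag values. The function name trim_bounds and the file names in the commented usage are made up for illustration; they are not part of FASTX.

```shell
# Hypothetical helper (not part of FASTX): convert "skip S leading bases,
# keep the next K bases" into the -f/-l arguments described in the help text.
trim_bounds() {
    skip=$1
    keep=$2
    first=$((skip + 1))      # -f: 1-based position of the first base to keep
    last=$((skip + keep))    # -l: 1-based position of the last base to keep
    printf '%s %d %s %d\n' -f "$first" -l "$last"
}

# Skip 5 bases, keep the next 100.
trim_bounds 5 100

# Possible use on both mates of a pair (file names are placeholders):
# fastx_trimmer $(trim_bounds 5 100) -i reads_1.fastq -o trimmed_1.fastq
# fastx_trimmer $(trim_bounds 5 100) -i reads_2.fastq -o trimmed_2.fastq
```

Applying identical bounds to both files should keep the mates in the same order, assuming every read is long enough that none is dropped.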

score 2 · Answer 2 · 2011-12-19

2

12.9 years ago

lexnederbragt ★ 1.3k

With the FASTQ format, it is even possible to use the Unix cut command, but only if you want to keep the first X bases and X is at least the length of the longest header line, because cut truncates every line, headers included:

cut -c 1-100 in.fq >out.fq

Might be faster than the other methods suggested, but I haven't tried...
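
One way around both restrictions is to touch only the sequence and quality lines of each record. The sketch below uses awk for that; the function name trim_fastq is made up, and it assumes strict 4-line FASTQ records (no wrapped sequence or quality lines). It has not been benchmarked against cut or the other tools mentioned here.

```shell
# trim_fastq START LENGTH: keep LENGTH bases beginning at 1-based position START.
# Reads FASTQ on stdin and writes FASTQ to stdout.
# Assumes each record is exactly 4 lines: header, sequence, "+", quality.
trim_fastq() {
    awk -v start="$1" -v len="$2" '
        NR % 2 == 0 { print substr($0, start, len); next }  # lines 2 and 4: sequence, quality
        { print }                                            # lines 1 and 3: left untouched
    '
}

# Example: drop the first 5 bases and keep the next 100 (file names are placeholders):
# trim_fastq 6 100 < in.fq > out.fq
```

Because headers pass through unchanged, their length no longer matters, and START lets you trim from the beginning of the read as well.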

0

Cool idea. Maybe one can make it work without those restrictions.

12.9 years ago by Fabian Bull ★ 1.3k

score 1 · Answer 3 · 2011-12-18

1

12.9 years ago

Martin A Hansen 3.0k

With Biopieces you can do:

read_fastq -i in.fq | extract_seq -b 10 -e 100 | write_fastq -o out.fq -x

Trimming sequence is covered here.

score 0 · Answer 4 · 2011-12-19

0

12.9 years ago

ALchEmiXt ★ 1.9k

Visit usegalaxy.org: all the tools are available there in one comprehensive framework, either on the public instance or, after some configuration, on a local install. It also lets you trim sequences based on quality values and the like.
