Cut The Reads Of A Paired End Fastq File
13.0 years ago
Empyrean ▴ 170

Hi, I have the following 150 bp reads. I would like to trim them to 100 bp, and I would also like to trim bases from the beginning of each read. Please let me know of any script or program that can do this.

@HWI-DO4456:7:000000000-Z07CL:1:1:15052:1479 1:N:0:GGCTAC
TCCTCAGATTTTTTAGAAAGAGGAGTCTGCTTATAAGATAATGGCATCATTTTGATAGAATCTCCTCGCATTGTTGTAAAACTAATAACAAAGAAGGTTGGTTTTTGTGGTTTTGGTCTCCCGGCCTGAATCCAAGCTTGATGAATACGAA
+
@CCFFFFFHHHHHJJIJJJJJJIJJJJJJJJJIJJJIJJJIJJJJJJJJJJJJIJJJJJJIJGJJJJJJFHHFFFFFEEEEDEDDDDDDDCDDCBDD>@CB?BDDDDDDDCBDDDBBCDDDDDDDBDDDDDDDDDCBCDCBCDDDDEDDDD
@HWI-D04456:7:000000000-Z07CL:1:1:17590:1511 1:N:0:GGCTAC
TTAATTATACTTGTTGGTTTTGGTGGCGGATTAACATGGGGAGCAGTCGCTCTTCGTTGGGGTAAATAAGGACTGAGAGAAAAAAAGGAGTGTATTTTGTGAAGGTAGGGGCACAGTACCGTTGAAGCGTCTAATGAACGTGGAGGGATGG
+
illumina paired • 13k views

See Rule 4 in the document linked from the very top of the page. Or put the terms 'trim fastq' into any search engine. Answer is on the first page.


Out of curiosity, why would you want to cut from the beginning of the reads? Aren't they supposed to be of high quality?


Several library construction methods add linker/adapter sequences that sit inside the sequencing adapters (e.g. RNA-seq libraries made using a Nugen Ovation cDNA synthesis kit). These bases appear at the beginning of the read and, if not trimmed, may leave the read unalignable (at least with common next-gen aligners).

13.0 years ago

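Many tools can trim reads in FASTQ format; fastx_trimmer from the FASTX-Toolkit works nicely. For example, to drop the first 5 bases and keep the next 100 (bases 6 to 105), you could do something like the command below. Note that -z gzips the output, so the output file is named accordingly; with Phred+33 qualities like those in the question, some FASTX-Toolkit versions also need -Q33.

fastx_trimmer -f 6 -l 105 -z -i OriginalReads.fastq -o TrimmedReads.fastq.gz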

FASTA/Q Trimmer

    $ fastx_trimmer -h
    usage: fastx_trimmer [-h] [-f N] [-l N] [-z] [-v] [-i INFILE] [-o OUTFILE]

    version 0.0.6
       [-h]         = This helpful help screen.
       [-f N]       = First base to keep. Default is 1 (=first base).
       [-l N]       = Last base to keep. Default is entire read.
       [-z]         = Compress output with GZIP.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
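
To confirm that a trim like the one above did what was intended (-f 6 -l 105 should leave 105 - 6 + 1 = 100 bases per read), it can help to tabulate read lengths afterwards. The following is only a sketch: `read_lengths` is a hypothetical helper name, and it assumes strict 4-line FASTQ records with unwrapped sequence lines.

```shell
# Sketch: print each distinct read length and how many reads have it.
# Assumes 4-line FASTQ records; read_lengths is a hypothetical helper name.
read_lengths() {
    awk 'NR % 4 == 2 { n[length($0)]++ }              # count sequence lines by length
         END { for (len in n) print len, n[len] }' | sort -n
}

# Demo on a tiny made-up input: two 8-base reads and one 5-base read.
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTGGGG\n+\nIIIIIIII\n@r3\nACGTA\n+\nIIIII\n' | read_lengths
```

On real data you would pipe the trimmed file into the function instead, decompressing it first (for example with `gzip -dc`) if -z was used. For 150 bp input trimmed as above, a single line starting with 100 would be the expected result.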
13.0 years ago
lexnederbragt ★ 1.3k

With the FASTQ format it is even possible to use the Unix cut command, but only if you want to keep the first X bases and X is at least the length of the header line (cut truncates every line, headers included):

cut -c 1-100 in.fq >out.fq

Might be faster than the other methods suggested, but I haven't tried...


Cool idea. Maybe one can make it work with no restrictions.
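
One possible way to lift both restrictions is to let awk touch only the sequence and quality lines, so headers of any length pass through and the kept window can start anywhere. This is a sketch, not a tested replacement for the dedicated tools: `trim_fastq` is a hypothetical helper name, and it assumes strict 4-line FASTQ records with unwrapped sequence and quality lines.

```shell
# Sketch: keep bases FIRST..LAST (1-based, inclusive) of every read.
# Assumes 4-line FASTQ records; trim_fastq is a hypothetical helper name.
trim_fastq() {
    awk -v first="$1" -v last="$2" '
        NR % 2 == 0 { print substr($0, first, last - first + 1); next }  # sequence and quality lines
        { print }                                                         # header and "+" lines unchanged
    '
}

# Demo on a made-up 12-base record: keep bases 3 to 8.
printf '@read1 with a long header\nACGTACGTACGT\n+\nIIIIHHHHGGGG\n' | trim_fastq 3 8
```

For the question as asked, something like `trim_fastq 6 105 < reads_1.fastq > trimmed_1.fastq` (file names are placeholders) would drop the first 5 bases and keep the next 100. Because no reads are removed, running it with the same coordinates on both mate files should keep a paired-end set in the same order.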

13.0 years ago

With Biopieces you can do:

read_fastq -i in.fq | extract_seq -b 10 -e 100 | write_fastq -o out.fq -x

Trimming sequence is covered here.

13.0 years ago
ALchEmiXt ★ 1.9k

Visit usegalaxy.org: all these tools are available there in one comprehensive framework (either on the public instance or, after some configuration, locally as well). It also lets you trim sequences based on quality values and such...
