Question

How can I remove adapter sequences from Illumina 2000 paired-end data?

0

8.2 years ago

tcf.hcdg ▴ 70

I have Illumina 2000 paired-end sequencing data. I checked read quality with FastQC and then removed the adapter sequences (Illumina paired-end adapters) with cutadapt, which also did the quality trimming. From the results, I found that only a few reads have adapters. I then checked with Trim Galore, which reports only 0.1% of the reads as containing adapter sequences.

I am wondering why only 0.1% of the sequences contain adapter sequences.

cutadapt
  === Summary ===

  Total read pairs processed: 30,981,418
  Read 1 with adapter: 3,821 (0.0%)
  Read 2 with adapter: 2,104 (0.0%)
  Pairs that were too short: 434,082 (1.4%)
  Pairs written (passing filters): 30,547,336 (98.6%)

  Total basepairs processed: 15,490,709,000 bp
    Read 1: 7,745,354,500 bp
    Read 2: 7,745,354,500 bp
  Quality-trimmed: 466,256,549 bp (3.0%)
    Read 1: 85,966,692 bp
    Read 2: 380,289,857 bp
  Total written (filtered): 14,923,182,261 bp (96.3%)
    Read 1: 7,561,684,003 bp
    Read 2: 7,361,498,258 bp

The result summary of Trim Galore:

Trim galore
  === Summary ===

  Total reads processed: 30,981,418
  Reads with adapters: 38,498 (0.1%)
  Reads written (passing filters): 30,981,418 (100.0%)

  Total basepairs processed: 7,745,354,500 bp
  Quality-trimmed: 85,966,692 bp (1.1%)
  Total written (filtered): 7,659,104,092 bp (98.9%)

Did I use the wrong adapter sequences, or had the adapters already been removed after sequencing?

adapters cutadapt trimgalore fastqc illumina PE • 3.8k views

ADD COMMENT • link updated 8.2 years ago by Brian Bushnell 20k • written 8.2 years ago by tcf.hcdg ▴ 70

3

Not every sequence needs to have an adapter. In fact, you only see adapters in reads whose inserts are shorter than the number of sequencing cycles carried out (that is, shorter than the read length).

Your data may be fine as is.
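To illustrate the point about insert size, here is a minimal Python sketch. It assumes an idealized read that covers the insert first and then runs into the adapter; the function names are hypothetical, and the adapter string is only the common Illumina adapter prefix, used here for illustration and not necessarily the adapter in this library.

```python
# Illustrative adapter prefix (assumption: not necessarily the one in this library).
ADAPTER = "AGATCGGAAGAGC"


def adapter_bases_in_read(insert_len: int, read_len: int) -> int:
    """Number of sequencing cycles that run past the insert into the adapter."""
    return max(0, read_len - insert_len)


def simulate_read(insert: str, adapter: str, read_len: int) -> str:
    """Idealized read: the insert is read first, then the adapter, up to read_len."""
    template = insert + adapter
    return template[:read_len]


# Insert longer than the read: the read never reaches the adapter.
long_insert = "ACGT" * 3  # 12 bases
print(simulate_read(long_insert, ADAPTER, 10), adapter_bases_in_read(12, 10))

# Insert shorter than the read: the tail of the read is adapter sequence.
short_insert = "ACGTAC"  # 6 bases
print(simulate_read(short_insert, ADAPTER, 10), adapter_bases_in_read(6, 10))
```

Only the second read ends in adapter bases, which is why a library with mostly long inserts shows very few adapter hits.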

ADD REPLY • link 8.2 years ago by GenoMax 147k

0

Do you have adapters in the overrepresented sequences of the FastQC report?

ADD REPLY • link 8.2 years ago by Carlo Yague 8.9k

0

I found some overrepresented sequences, but they do not contain the paired-end adapter sequence.

ADD REPLY • link 8.2 years ago by tcf.hcdg ▴ 70

0

file:///home/tajammul/PhD_data/Radula_moss/clipped/a2_plus_b2_ATTCCT_L001_R2_001.trimmed_fastqc.html#M9

file:///home/tajammul/PhD_data/Radula_moss/clipped/a2_plus_b2_ATTCCT_L001_R1_001.trimmed_fastqc.html#M9

ADD REPLY • link 8.2 years ago by tcf.hcdg ▴ 70

0

Those kinds of links are not going to work since they point to files on your local machine.

Your best bet is to take a screenshot of what you want to show and then upload it to one of the free image hosting sites (you can find them once you press Ctrl+G in the Biostars message edit window).

ADD REPLY • link 8.2 years ago by GenoMax 147k

0

ADD REPLY • link 8.2 years ago by tcf.hcdg ▴ 70

0

So unless you used home-made adapters, your data should be clean. FastQC automatically detects 'classic' adapters in the overrepresented sequences.

ADD REPLY • link 8.2 years ago by Carlo Yague 8.9k

0

http://tinypic.com/view.php?pic=2ngucrc&s=9#.V-KX3tHQPCI

ADD REPLY • link 8.2 years ago by tcf.hcdg ▴ 70

score 0 · Answer 1 · 2016-09-21

As GenoMax said, only fragments with an insert size shorter than the read length contain adapter sequence. You can generate an insert size histogram with BBMerge (from the BBMap package) and also determine the actual adapter sequence like this:

bbmerge.sh in1=r1.fastq in2=r2.fastq outa=adapters.fa ihist=ihist.txt

If only 0.1% of the reads have an insert size shorter than the read length, adapter trimming probably went correctly.
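As a rough way to check this, the sketch below computes the fraction of inserts shorter than the read length from an insert size histogram. It assumes the histogram is a text file with header lines starting with `#` followed by whitespace-separated `insert_size count` rows; verify that against your own `ihist.txt`. The function name is hypothetical. The read length of 250 bp is derived from the cutadapt summary in the question (7,745,354,500 bp in Read 1 divided by 30,981,418 read pairs), and the example rows are made-up placeholder values, not real results.

```python
def fraction_short_inserts(lines, read_len):
    """Fraction of counted inserts whose size is below read_len.

    Assumes '#'-prefixed header lines and 'insert_size count' data rows.
    """
    total = 0
    short = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        size_str, count_str = line.split()[:2]
        size, count = int(size_str), int(count_str)
        total += count
        if size < read_len:
            short += count
    return short / total if total else 0.0


# Read length from the cutadapt summary: Read 1 bases / read pairs.
read_len = 7_745_354_500 // 30_981_418

# Made-up histogram rows for illustration only.
example = ["#InsertSize\tCount", "100\t1", "200\t1", "300\t998"]
print(read_len, fraction_short_inserts(example, read_len))

# With a real file (path as in the bbmerge.sh command above):
# with open("ihist.txt") as fh:
#     print(fraction_short_inserts(fh, read_len))
```

Note that the histogram can only include pairs for which an insert size could be determined, so the fraction is relative to those pairs and not necessarily to all reads.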