Question

How to get adapter information from an SRA-dumped FASTQ file?

9.2 years ago

SOHAIL ▴ 410

Hi Everyone,

I downloaded *.sra files of whole-genome sequencing data from the SRA database, extracted the FASTQ files, and then ran FastQC as a QC check on the raw data.

The "Kmer Content" section shows overrepresented sequences:

Sequence  Count   PValue  Obs/Exp Max  Max Obs/Exp Position
CGCCGTA   79245   0.0     15.361623    46-47
GTCGCCG   102530  0.0     12.1878605   44-45
TCGCCGT   105080  0.0     11.008696    46-47
GCCGTAT   115190  0.0     10.8824835   48-49
...

I do not know which adapter sequences were used. Can anyone tell me where I can find the adapter information for a file downloaded from SRA?

Or how can I trim the overrepresented k-mer sequences? I am afraid that, because the k-mers are so short, they could match anywhere in the genome by chance.

Thanks,

sohail

next-gen • 5.5k views

updated 7.8 years ago by Brian Bushnell 20k • written 9.2 years ago by SOHAIL ▴ 410

score 1 · Answer 1 · 2016-06-09

You can check whether the reads are contaminated with adapters in the "Overrepresented sequences" section of the FastQC report. Most Illumina adapters are recognized and listed there. If the downloaded SRA file comes from the Ion Torrent platform, adapters are less likely to be present, because the Torrent Server trims them by default.

To trim the short overrepresented sequences from either end of the reads, you can use the cutadapt -u option, which removes a fixed number of bases from the start of every read (or from the end, if the number is negative). Hope this helps.
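To make clear what fixed-length trimming does, here is a minimal Python sketch of the behaviour described above. It is only an illustration, not cutadapt's own code, and the function name `cut_fixed` and the example read are hypothetical. Note that this kind of trimming removes the same number of bases from every read, whether or not the read actually contains the overrepresented sequence.

```python
def cut_fixed(seq, qual, n):
    """Remove |n| bases from a read and its quality string.

    Mirrors the semantics described for `cutadapt -u`:
    positive n trims from the start, negative n trims from the end.
    """
    if n > 0:
        return seq[n:], qual[n:]
    if n < 0:
        return seq[:n], qual[:n]
    return seq, qual


# Hypothetical 10-base read with its quality string.
read = "ACGTACGTAC"
qual = "IIIIIHHHHH"

trimmed_start = cut_fixed(read, qual, 3)   # drops the first 3 bases
trimmed_end = cut_fixed(read, qual, -3)    # drops the last 3 bases
print(trimmed_start)
print(trimmed_end)
```

Because the k-mers in the question peak around positions 44-49, a fixed cut would also discard bases from reads that carry no adapter at all, so adapter-aware trimming is usually the safer choice when the adapter sequence can be identified.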

score 1 · Answer 2 · 2017-10-10

7.8 years ago

Karma ▴ 310

You can use autoadapt. This tool automatically detects adapters and removes them. It relies on FastQC and cutadapt.

Usage:

autoadapt.pl input.fastq output.fastq

score 1 · Answer 3 · 2017-10-10

If you have paired reads, you can use BBMap to identify adapter sequences like this:

bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa strict

Alternatively, the BBMap package is distributed with a file, "adapters.fa", which contains most of the adapters commonly used in Illumina sequencing and tends to work well for adapter trimming.
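The k-mers listed in the question overlap each other, so one way to narrow down the adapter is to merge them into a longer sequence and look it up in a known adapter list in both orientations. The Python sketch below is a rough illustration under stated assumptions: the positions are the starts of the FastQC position bins from the question, the helper names are hypothetical, and the single adapter entry is the Illumina TruSeq Universal Adapter sequence as commonly published, which should be verified against your own copy of an adapter file such as adapters.fa.

```python
# K-mers and the start of their "Max Obs/Exp Position" bin, copied from the question.
KMERS = [(46, "CGCCGTA"), (44, "GTCGCCG"), (46, "TCGCCGT"), (48, "GCCGTAT")]

# Assumption: one example entry standing in for a full adapter FASTA file.
ADAPTERS = {
    "TruSeq_Universal_Adapter":
        "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlapping(kmers_by_position):
    """Greedily merge position-sorted k-mers using suffix/prefix overlaps."""
    ordered = [kmer for _, kmer in sorted(kmers_by_position)]
    contig = ordered[0]
    for kmer in ordered[1:]:
        if kmer in contig:
            continue  # already covered by the merged sequence
        for size in range(len(kmer) - 1, 0, -1):
            if contig.endswith(kmer[:size]):
                contig += kmer[size:]
                break
        else:
            raise ValueError("k-mer %s does not overlap %s" % (kmer, contig))
    return contig


def find_in_adapters(query, adapters):
    """Report adapters containing the query, and in which orientation."""
    hits = []
    for name, seq in adapters.items():
        if query in seq:
            hits.append((name, "forward"))
        elif query in reverse_complement(seq):
            hits.append((name, "reverse complement"))
    return hits


merged = merge_overlapping(KMERS)
hits = find_in_adapters(merged, ADAPTERS)
print(merged)
print(hits)
```

A 10-base match is still short, so a hit from a lookup like this is a hint rather than proof. Confirming it with the bbmerge.sh approach on paired reads, or by checking how many reads contain a longer stretch of the candidate adapter, is more reliable.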