Question

Determine adapter sequence in RNA-seq samples

1

5.8 years ago

sabaghianamir70 ▴ 70

Hello

I was looking into how to choose the correct adapter sequence for trimming in this PDF: https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-11.pdf . However, I don't know which one applies; all I know is that the data were generated on an Illumina NextSeq 500. I know I'm missing some important point. Can you help me? Thanks.

RNA-Seq • 5.6k views

ADD COMMENT • link 5.8 years ago by sabaghianamir70 ▴ 70

0

So if the author tells me they have already removed all the adapters, I don't need to trim anything adapter-related? Not even the first 12 nucleotides shown in this picture? [image]

ADD REPLY • link 5.8 years ago by sabaghianamir70 ▴ 70

2

That pattern at the beginning of the reads is caused by the library construction method, which uses enzymatic fragmentation.

ADD REPLY • link 5.8 years ago by JC 13k

0

So should I cut it out or leave it be?

ADD REPLY • link 5.8 years ago by sabaghianamir70 ▴ 70

0

The article genomax linked explains the issue, and suggests what should be done:

Mitigation

People often suggest fixing this issue by 5′ trimming of the reads to remove the biased portion – this however is not a fix. Since the biased composition is created by the selection of sequencing fragments and not by base call errors the only effect of trimming would be to change from having a library which starts over biased positions, to having a library which starts slightly downstream of biased positions.

Prevention

Ultimately the only fix for this issue will be the introduction of new library preparation kits with a less bias-prone priming step.

ADD REPLY • link 5.8 years ago by h.mon 35k

1

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

As for the pattern you see in the plot above it is normal for RNAseq data. You can read more about this observation in a blog post from FastQC authors here. You do not need to do anything to that part of the read. It should align without any issues.

As for the adapters, as long as you are just aligning the data, modern aligners should be able to take care of any residual adapter sequences by soft-clipping them. If you are going to do any de novo assembly work then you should use one of the methods detailed below to ensure that all extraneous sequence gets removed before assembly.
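
If you rely on the aligner to soft-clip leftover adapter, you can check how much clipping actually happened by looking at the CIGAR strings in the SAM output, where soft clips appear as `S` operations. A minimal Python sketch, assuming you already have the CIGAR strings in hand (the function name `soft_clipped_bases` is made up for this example):

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for one CIGAR string."""
    # Hard clips (H) can sit outside the soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0  # e.g. "*" for an unaligned read
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


print(soft_clipped_bases("60M15S"))  # (0, 15)
```

For reads aligned to the reverse strand, the 3' end of the original read sits at the start of the CIGAR, so adapter clipping shows up as leading rather than trailing `S`.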

ADD REPLY • link 5.8 years ago by GenoMax 153k

score 3 · Answer 1 · 2019-10-10

Yet another solution is AdapterRemoval:

AdapterRemoval --identify-adapters --file1 reads_1.fastq --file2 reads_2.fastq --threads 8 --basename trimmed

It will print out lots of useful info:

Processed a total of 317,643,940 reads in 12:43.1s; 416,000 reads per second on average ...
   Found 103092850 overlapping pairs ...
   Of which 527537 contained adapter sequence(s) ...

Printing adapter sequences, including poly-A tails:
  --adapter1:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
               ||||||||||||||||||||||||||||||||| ****** | |  | |       |
   Consensus:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCAGCAGTTTTTTTTCTTTAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAATAAATANTAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTA
     Quality:  ***)))(((''&&&%%%$$$$###"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

    Top 5 most common 9-bp 5'-kmers:
            1: AGATCGGAA = 71.70% (137696)
            2: AGATCGGCA =  0.27% (516)
            3: AGATAGGAA =  0.19% (365)
            4: CGATCGGAA =  0.18% (337)
            5: AGCTCGGAA =  0.16% (299)

  --adapter2:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
               |||||||||||||||||||||||||||||||||| || |   |  |     | | |||
   Consensus:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTATATTTTTTTTTTTTTTTTTTTATTAAAAAAAAAAAAAAAAAAAAAAAAAAATAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTATTATTATTTTTATTT
     Quality:  ,,,+++**))((('&&&%%%$$$##""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""


    Top 5 most common 9-bp 5'-kmers:
            1: AGATCGGAA = 80.81% (176579)
            2: AGCTCGGAA =  0.21% (459)
            3: CGATCGGAA =  0.21% (448)
            4: AGATCGGCA =  0.20% (445)
            5: AGATAGGAA =  0.16% (360)

    --adapter1 SEQUENCE
        Adapter sequence expected to be found in mate 1 reads [default:
        AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG].

    --adapter2 SEQUENCE
        Adapter sequence expected to be found in mate 2 reads [default:
        AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT].

score 2 · Answer 2 · 2019-10-10

2

5.8 years ago

h.mon 35k

If your data is paired-end, several programs (such as fastp, peat or bbduk) can trim by overlapping the forward and reverse reads, so strictly speaking they don't need to know the adapter sequences. fastp can also auto-detect adapters for single-end sequencing, and it will output adapter statistics, including the inferred/detected adapter sequences.
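
To illustrate why overlap-based trimming does not need the adapter sequence, here is a toy Python sketch. It is not how fastp, peat or bbduk are actually implemented: it assumes error-free reads and exact matching, and the names `revcomp` and `infer_adapter` are made up for this example. When the insert is shorter than the read length, read 1 runs through the insert into the adapter, so the first L bases of read 1 equal the reverse complement of the first L bases of read 2, and whatever follows position L is adapter.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def infer_adapter(read1, read2, min_insert=10):
    """Return (insert_length, adapter_tail_of_read1), or None if no overlap.

    Toy model: exact matching only, no tolerance for sequencing errors.
    Candidate insert lengths are tried from longest to shortest.
    """
    max_len = min(len(read1), len(read2))
    for insert_len in range(max_len - 1, min_insert - 1, -1):
        if read1[:insert_len] == revcomp(read2[:insert_len]):
            return insert_len, read1[insert_len:]
    return None


# Simulated 40-bp pair from a 25-bp insert (adapter prefixes taken from the
# AdapterRemoval defaults quoted in this thread).
insert = "ACGTTGCAAGGCTTACCGATAGGCT"
read1 = (insert + "AGATCGGAAGAGCACACGTC")[:40]
read2 = (revcomp(insert) + "AGATCGGAAGAGCGTCGTGT")[:40]
print(infer_adapter(read1, read2))  # (25, 'AGATCGGAAGAGCAC')
```

Real tools allow mismatches and use base qualities when scoring the overlap, so treat this only as a picture of the idea.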

ADD COMMENT • link 5.8 years ago by h.mon 35k

0

Another vote for fastp. I've switched to using it lately and really love it.

ADD REPLY • link 5.8 years ago by Dave Carlson ★ 2.1k

score 2 · Answer 3 · 2019-10-10

2

5.8 years ago

Makplus T ▴ 100

The best solution is to ask your sequencing data provider.
Typically, QC software (such as FastQC) can report common adapter types, while Trim Galore can automatically detect and trim these adapters.
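
As a rough picture of what this kind of screening does, the Python sketch below counts how many of the first reads in a FASTQ file contain a short prefix of a few common adapters. It is only an approximation of the idea, not FastQC's or Trim Galore's actual code; the function name `count_adapter_hits` and the `max_reads` default are made up for this example, and it assumes plain 4-line FASTQ records.

```python
import gzip
from collections import Counter
from itertools import islice

# Short adapter prefixes commonly screened for; extend as needed.
KNOWN_ADAPTERS = {
    "Illumina universal (TruSeq)": "AGATCGGAAGAGC",
    "Nextera transposase": "CTGTCTCTTATA",
    "Illumina small RNA 3'": "TGGAATTCTCGG",
}


def count_adapter_hits(fastq_path, max_reads=100_000):
    """Count how many of the first max_reads reads contain each adapter prefix."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    hits = Counter()
    n_reads = 0
    with opener(fastq_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            n_reads += 1
            seq = seq.strip().upper()
            for name, prefix in KNOWN_ADAPTERS.items():
                if prefix in seq:
                    hits[name] += 1
    return n_reads, hits
```

Calling `count_adapter_hits("reads_1.fastq")` returns the number of reads inspected and a per-adapter hit count; the adapter with by far the most hits is the likely candidate, which you can then confirm with your sequencing provider.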

ADD COMMENT • link 5.8 years ago by Makplus T ▴ 100

score 1 · Answer 4 · 2019-10-10

1

5.8 years ago

JC 13k

a) Ask the provider or whoever did the sequencing.

b) Use FastQC to check which adapter was used.

ADD COMMENT • link 5.8 years ago by JC 13k