Question

Trimming of adapters and indexes

3.3 years ago

Vasiliy Krestov ▴ 30

I am studying a protein that binds short DNA (<30 nt) and have a library of these short DNA fragments. I know that the adapters and indexes are the ones from this site (the 5' adapter has T instead of U). [To reach the page I mean, click the second "View online help" button, then "TruSeq kits", then "TruSeq Small RNA". You should see a page like this: link]

Initially, I had this level of adapter contamination (start), and the base quality was OK.

To remove the adapters, I ran this command:

java -jar trimmomatic-0.39.jar SE inp_file out_file ILLUMINACLIP:adapters.fa:4:30:7 MINLEN:14 MAXLEN:21

My question is which adapter sequences I should use. The site provides a dedicated sequence for trimming. I also tried using the 3' and 5' adapter sequences together; this had some effect on the output (e.g., I got fewer overrepresented sequences), but adapter content was already zero with only the single suggested trimming sequence.
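A minimal sketch of what 3' adapter trimming does, in plain Python with exact matching only (real trimmers such as Trimmomatic also tolerate mismatches). It assumes single-end reads that start right after the 5' adapter, so only the 3' adapter can appear in the read: the read is cut where the full adapter starts, or where a partial adapter prefix runs off the 3' end. `ADAPTER_3P` is an assumption, the TruSeq Small RNA 3' adapter as I recall it, so check it against the kit documentation before relying on it.

```python
# ASSUMPTION: TruSeq Small RNA 3' adapter as recalled; verify against the kit docs.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER_3P, min_overlap: int = 3) -> str:
    """Return the part of `read` before the 3' adapter (exact matching only)."""
    # Case 1: the whole adapter occurs in the read; keep only what precedes it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter. Try the longest
    # adapter prefix first so the cut lands as far left as possible.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter evidence: leave the read unchanged.
    return read


print(trim_3prime_adapter("ACGTACGTACGTACGTACGT" + "TGGAATTCTCGG"))  # ACGTACGTACGTACGTACGT
```

Everything from the adapter onward is removed in one cut, which is why a single 3' trimming sequence can already bring the adapter content to zero.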

I used MAXLEN:20 because of these plots: first, second, third. It seems to me that the information is contained only in the first 20 bases (which is consistent with the length of the nucleic acids my protein binds). Is it OK to just clip these bad bases?
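To my understanding (worth confirming in the Trimmomatic manual), MAXLEN does not clip anything: it discards reads longer than the given length, whereas CROP:N cuts every read down to its first N bases. The two plain-Python functions below only mimic those two behaviours to show the difference; the function names are made up for illustration.

```python
from typing import Optional


def crop_like(read: str, n: int) -> str:
    """Mimics CROP:n as I understand it: keep at most the first n bases."""
    return read[:n]


def maxlen_like(read: str, n: int) -> Optional[str]:
    """Mimics MAXLEN:n as I understand it: drop (return None for) any read longer than n."""
    return None if len(read) > n else read


read = "ACGTACGTACGTACGTACGTAAAAA"  # hypothetical 25-nt read
print(crop_like(read, 20))    # the first 20 nt survive
print(maxlen_like(read, 20))  # None: the whole read is discarded
```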

Additionally, I know that the reads carry 'TruSeq index 1'. I tried adding the index sequence from this site, but it didn't have a big effect. Does the index get removed automatically during adapter trimming, or do I have to do something about it?

Even after all of this, I still have a small number of overrepresented sequences: over
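To look at what the remaining overrepresented sequences actually are, a rough FastQC-style count can help. FastQC flags a sequence when it makes up more than 0.1% of all reads; this simplified sketch applies the same cutoff to whole reads and ignores the other details of FastQC's implementation. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(reads: Iterable[str], threshold: float = 0.001) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for reads above `threshold` of the total, most common first."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return []
    hits = [(seq, n, n / total) for seq, n in counts.items() if n / total > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


reads = ["AAA"] * 5 + ["CCC"] * 3 + ["GGG"] * 2  # toy data
for seq, n, frac in overrepresented(reads, threshold=0.25):
    print(seq, n, f"{frac:.0%}")
```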

Thanks in advance and best regards!

NGS small-DNA trimming trimmomatic sequencing

updated 3.3 years ago by GenoMax 147k • written 3.3 years ago by Vasiliy Krestov ▴ 30

Your first link is not working right. Was that a link out to some website or were you trying to show a screenshot? Can you fix that?

I don't know about Trimmomatic, but you could use bbduk.sh (in trim mode, or in filter mode to separate reads that contain the expected adapter). Something like:

bbduk.sh -Xmx4g in=input.fq.gz out=clean.fq.gz literal=adapter_sequence1,adapter_sequence2 .. k=8 ktrim=r
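For the bbduk.sh suggestion, `ktrim=r` with `k=8` roughly means: find the leftmost 8-mer in the read that also occurs in one of the `literal=` sequences, then remove that k-mer and everything to its right. The sketch below shows only that core idea; BBDuk itself also checks reverse complements and can allow mismatches or shorter k-mers at the read end, none of which is modelled here. The adapter string is a placeholder assumption.

```python
from typing import Iterable, Set


def adapter_kmers(adapters: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer of the literal adapter sequences."""
    return {a[i:i + k] for a in adapters for i in range(len(a) - k + 1)}


def ktrim_right(read: str, kmers: Set[str], k: int) -> str:
    """Cut at the leftmost read k-mer found in `kmers`, dropping it and everything after it."""
    for i in range(len(read) - k + 1):
        if read[i:i + k] in kmers:
            return read[:i]
    return read


# Hypothetical adapter list; substitute the sequences from the kit documentation.
kmers = adapter_kmers(["TGGAATTCTCGGGTGCCAAGG"], k=8)
print(ktrim_right("ACGTACGTACGTACGTACGT" + "TGGAATTCTCGG", kmers, k=8))
```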

3.3 years ago by GenoMax 147k

Yes, it was a link to a website; I added a description. Thanks for your suggestion! But my question isn't about which program to use...

I want to know whether it is enough to trim with the sequence the company suggests, or whether I should use both adapter sequences (the suggested trimming sequence is identical to the 3' adapter). My second question is about indexes: do I need to do anything special to remove them, such as providing their sequence to the program?

3.3 years ago by Vasiliy Krestov ▴ 30

score 2 · Answer 1 · 2021-08-09

With small RNA data, it is prudent to follow the data-handling recommendations specific to the kit that was used. Since some small RNA kits attach a special adapter to the 3' end of the small RNA, it is adequate to look for the presence of that adapter (to confirm that the molecule is a valid small RNA) and then trim it. Note: you can almost see that adapter appearing in your FastQC plots (the artifacts you see on the tile plot) after 21-22 bp (the size of a small RNA), where the reads start running into the adapter.
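The "look for the adapter first, then trim" approach can be sketched as a filter: a read is kept only if it contains the start of the 3' adapter, and the insert in front of it is returned only if its length is plausible. The adapter value, the 12-nt probe and the 14-30 nt bounds are illustrative assumptions (the bounds loosely follow MINLEN:14 and the <30 nt size from the question), and the function name is hypothetical.

```python
from typing import Optional

# ASSUMPTION: TruSeq Small RNA 3' adapter as recalled; verify against the kit docs.
ADAPTER_3P = "TGGAATTCTCGGGTGCCAAGG"


def valid_insert(read: str, adapter: str = ADAPTER_3P, probe_len: int = 12,
                 min_len: int = 14, max_len: int = 30) -> Optional[str]:
    """Return the insert before the adapter, or None if the read should be discarded."""
    pos = read.find(adapter[:probe_len])
    if pos == -1:
        # No adapter found, so the read cannot be confirmed as a complete short molecule.
        return None
    insert = read[:pos]
    # Keep only inserts whose length fits the expected size range.
    return insert if min_len <= len(insert) <= max_len else None
```

Unlike plain trimming, this drops reads in which no adapter is seen at all, which is the confirmation step described above.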

In Illumina sequencing, index reads are never part of the actual sequence reads (R1/R2), so you don't need to do anything special about them. During demultiplexing they are automatically written to the FASTQ headers of the data files, and they are used only for sample identification.
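To check where the index ended up, it can be read back from the FASTQ headers. In the Illumina header layout used since CASAVA 1.8, the comment after the space ends with `<read>:<is filtered>:<control number>:<index>`, where the last field is usually the index sequence (depending on the software it may be a sample number instead). The function name and the header in the example are made up for illustration.

```python
def index_from_header(header: str) -> str:
    """Return the last ':'-separated field of the header comment (the index/sample field)."""
    # Layout: '@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>'
    parts = header.split(maxsplit=1)
    if len(parts) < 2:
        raise ValueError("header has no comment part after the read ID")
    return parts[1].rsplit(":", 1)[-1]


# Made-up header for illustration.
print(index_from_header("@M00001:12:000000000-ABCDE:1:1101:15589:1331 1:N:0:ATCACG"))
```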