Hello,
I'm quite new to bioinformatics. I'm trying to trim adapters from an Illumina HiSeq 2500 dataset prepared with a TruSeq prep kit, but I can't figure out what the adapters are. First I extracted subsets of the data, ran alignments, and consulted the Illumina adapter document to try to work it out: http://i.imgur.com/XdEViUG.png
The alignments make the adapter look variable in length, and it appears in only about 90% of the data. So I searched online, and the advice was to run FastQC and look at the overrepresented sequences (here in a sample of 100,000 raw reads): http://i.imgur.com/1afpYIO.png
That seems to return a lot of different adapters. I realize that this is actually quite simple; I just don't have the experience to know exactly what to trim. Using an NGS toolbox for small RNA processing, I can remove 5' and/or 3' adapters if I know the sequence (or, for the 3' adapter, where it starts). Running this code on one of my datasets reduced it to 82%: http://i.imgur.com/wf8hkGq.png
Any advice would be appreciated!
Take a look at this thread: small RNA-seq pipelines
BBMap, mentioned in that thread, comes with a comprehensive adapters.fa file in its "resources" directory that covers all common Illumina kits.
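One way to use such a file to decide which adapter is actually in the data is to count, for each candidate, how many reads contain its first few bases. A rough Python sketch follows. The function names and the 12-base seed length are assumptions made for illustration and are not part of BBMap; matching is exact, so sequencing errors inside the seed will be missed.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def screen_adapters(reads, adapters, seed_len=12):
    """Return {adapter name: fraction of reads containing the adapter seed}.

    `adapters` must be a list of (name, sequence) pairs, because it is
    scanned once per read. The seed is the first `seed_len` bases of each
    adapter (an assumed, adjustable value).
    """
    hits = Counter()
    total = 0
    for read in reads:
        total += 1
        for name, seq in adapters:
            if seq[:seed_len] in read:
                hits[name] += 1
    return {name: (hits[name] / total if total else 0.0) for name, _ in adapters}
```

With real files this might be called as `screen_adapters(read_sequences(open("sample.fastq")), list(parse_fasta(open("adapters.fa"))))`, where `sample.fastq` is a hypothetical file name. The candidate with by far the highest fraction is the likely 3' adapter; candidates near zero can be ignored.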
Thanks for the reply. I do have the Illumina lists. I guess what I'm struggling with is how to decide what to trim, because when I tried to trim what I thought was the correct primer, it wasn't present in ~18% of my sequences.
You could try sRNAWorkbench; it has some predefined Illumina TruSeq adapters for small RNA. Small RNA analysis generally needs only single-end data, so removing the 3' adapter should be enough.
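To make the idea of 3' adapter removal concrete, here is a minimal Python sketch. It is not how sRNAWorkbench or any other tool implements trimming, and the function names, `min_overlap`, and `min_length` are assumptions chosen for illustration. It uses exact matching only, whereas dedicated trimmers also tolerate mismatches. Reads without a full-length match are not necessarily a problem: the adapter may be cut off at the end of the read, or absent altogether when the insert is longer than the read.

```python
def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Return the read with the 3' adapter and everything after it removed.

    A full-length adapter match anywhere in the read is tried first. If there
    is none, the read is checked for a partial adapter at its 3' end, i.e. the
    longest adapter prefix (at least `min_overlap` bases) that is a read suffix.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    max_len = min(len(adapter) - 1, len(read))
    for k in range(max_len, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter found; read is returned unchanged


def trim_reads(reads, adapter, min_overlap=5, min_length=15):
    """Trim every read; return (kept reads, number trimmed, number seen).

    Reads shorter than `min_length` after trimming are dropped (an assumed
    cutoff; choose one that suits the small RNA class of interest).
    """
    kept, n_trimmed, n_total = [], 0, 0
    for read in reads:
        n_total += 1
        trimmed = trim_3prime_adapter(read, adapter, min_overlap)
        if len(trimmed) < len(read):
            n_trimmed += 1
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept, n_trimmed, n_total
```

Comparing `n_trimmed` with `n_total` gives the fraction of reads in which the chosen adapter was found, which is a quick check on whether the right sequence was picked.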