6.5 years ago
Moneeb Bajwa
Hi,
I was wondering: if the reads I am using say "Construction protocol: Agilent SureSelect Strand Specific RNA", can I just use the TruSeq3 adapter files to trim? What about Nextera? I am using Trimmomatic. The sequencing was done on an Illumina HiSeq 3000. Sorry if the question doesn't make sense, as I am new to this...
Thank you
Hello bajwa.m
For Agilent SureSelect, read the Agilent manual.
You can use the following sequence as the adapter for trimming: "CTGTCTCTTGATCACA".
For an initial check (to see how many of your reads contain the adapter), use:
grep "CTGTCTCTTGATCACA" input.fastq | wc -l
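A slightly more careful variant of the same check, as a minimal sketch: it looks only at the sequence line of each record and also reports the total number of reads, so the count can be read as a fraction. The function name count_adapter_reads is made up for illustration, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# count_adapter_reads FASTQ ADAPTER
# Prints "<reads containing ADAPTER> <total reads>".
# Assumption: plain 4-line-per-record FASTQ (not gzipped, no wrapped sequences).
count_adapter_reads() {
    fastq=$1
    adapter=$2
    # NR % 4 == 2 selects the sequence line of every record, so header
    # and quality lines are never searched.
    awk -v a="$adapter" '
        NR % 4 == 2 { total++; if (index($0, a) > 0) hits++ }
        END { printf "%d %d\n", hits, total }
    ' "$fastq"
}
```

For example, count_adapter_reads input.fastq CTGTCTCTTGATCACA would print two numbers: reads with the adapter, then all reads.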
You cannot simply use some other adapter. If you can find out from the library preparation which adapter was actually used, check it first with the grep command mentioned above, and then use only that adapter for trimming.
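Once the adapter is confirmed, trimming with a custom sequence in Trimmomatic could look roughly like the sketch below. The function names, the jar path and the FASTA record name are placeholders, and the ILLUMINACLIP thresholds 2:30:10 are just the values commonly shown in Trimmomatic examples, not something tuned for this data set.

```shell
# write_adapter_fasta NAME SEQUENCE OUT_FASTA
# Writes a one-record FASTA file that ILLUMINACLIP can read.
write_adapter_fasta() {
    printf '>%s\n%s\n' "$1" "$2" > "$3"
}

# trim_single_end JAR IN_FASTQ OUT_FASTQ ADAPTER_FASTA
# Single-end Trimmomatic run that only clips the given adapter.
# Assumptions: java is on PATH, JAR points to a Trimmomatic jar, and
# qualities are phred33. 2:30:10 = seed mismatches : palindrome clip
# threshold : simple clip threshold (illustrative values).
trim_single_end() {
    java -jar "$1" SE -phred33 "$2" "$3" "ILLUMINACLIP:$4:2:30:10"
}
```

A run would then be something like write_adapter_fasta custom_adapter CTGTCTCTTGATCACA adapter.fa followed by trim_single_end /path/to/trimmomatic.jar input.fastq trimmed.fastq adapter.fa, with the jar path adjusted to the local installation.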
I don't really understand... When I use that grep command on one of the files, I get 0 occurrences. These are the SRA sequences: https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=434667. I do get 34 occurrences in one of the files if I use just CTGTCTCTTGATC. Not sure how this works, please help! Also, where did you get that particular adapter sequence from? I could not find it in the link you gave.
I think this is your first time working with NGS data.
You have to convert the SRA sequences to FASTQ format using fastq-dump. Then you can perform trimming on the FASTQ files. Start reading more posts on Biostars to get yourself going.
No, I did run fastq-dump.
Can you share the top 10 lines of the FASTQ files?
head input.fastq
Yes, I just fixed my last comment; you can see it now.
Use only the first few bases of the adapter, e.g. "CTGTCTCT" (8 bases), and go ahead with trimming.
Below is the text from the manual:
MiSeq platform sequencing run setup and adaptor trimming guidelines
Use the Illumina Experiment Manager (IEM) software to generate a custom primer Sample Sheet. Set up the run to include adapter trimming using the IEM Sample Sheet Wizard. When prompted by the wizard, select the Use Adapter Trimming option, and specify CTGTCTCTTGATCACA as the adapter sequence. This enables the MiSeq Reporter software to identify the adaptor sequence and trim the adaptor from reads.
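To see how the hit count changes as the search string gets shorter (a full 16-base match can be missing when the adapter starts near the end of a read and is cut off, while a shorter prefix also matches more often by chance), the check can be repeated for successively shorter prefixes. This is only a sketch: prefix_hits is a made-up name, the default minimum length of 6 is arbitrary, and an uncompressed 4-line-per-record FASTQ is assumed.

```shell
# prefix_hits FASTQ ADAPTER [MIN_LEN]
# For each prefix of ADAPTER, from full length down to MIN_LEN (default 6),
# prints "<prefix length> <reads containing that prefix>".
prefix_hits() {
    fastq=$1
    adapter=$2
    min_len=${3:-6}
    n=${#adapter}
    while [ "$n" -ge "$min_len" ]; do
        # First n bases of the adapter.
        prefix=$(printf '%s' "$adapter" | cut -c "1-$n")
        # Count sequence lines (line 2 of every 4) that contain the prefix.
        hits=$(awk -v p="$prefix" '
            NR % 4 == 2 && index($0, p) > 0 { c++ }
            END { print c + 0 }
        ' "$fastq")
        echo "$n $hits"
        n=$((n - 1))
    done
}
```

Calling prefix_hits input.fastq CTGTCTCTTGATCACA 8 lists the counts for prefixes of 16 down to 8 bases. Very short prefixes will match by chance, so low counts at every length suggest this adapter is not really present.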
Is this something that is usually done? Is it possible it is a different adapter?
In the manual you gave, I see it is for SureSelect QXT: https://www.agilent.com/cs/library/usermanuals/public/G9682-90000.pdf; but mine are SureSelect Strand Specific. Does that matter? Mine are also single-ended reads.
Do one thing: run FastQC on your FASTQ files and check the overrepresented sequences. If any such adapter is present in your sample, you will find out after running FastQC. Good luck!
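As a rough sketch of that check on the command line: after something like fastqc --extract input.fastq, the text report should be in input_fastqc/fastqc_data.txt, and the overrepresented-sequences table can be pulled out of it. The function name overrepresented is made up, and the sketch assumes the usual report layout, where each module starts with a ">>" line and ends with ">>END_MODULE".

```shell
# overrepresented FASTQC_DATA_TXT
# Prints the "Overrepresented sequences" table (header row plus entries)
# from a FastQC fastqc_data.txt report.
# Assumption: modules are delimited by ">>Name<TAB>status" and ">>END_MODULE".
overrepresented() {
    awk '
        /^>>Overrepresented sequences/ { inside = 1; next }
        /^>>END_MODULE/                { inside = 0 }
        inside
    ' "$1"
}
```

For example, overrepresented input_fastqc/fastqc_data.txt prints the sequences together with FastQC's "Possible Source" column, which is where a match to a known adapter would be named.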
OK thanks! The result was the following for overrepresented sequences: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAA (0.10780696988082418%) - TruSeq Adapter, Index 7 (97% over 35bp), and AATGATACGGCGACCACCGAGATCGGAAGAGCACAC (0.13940876934680813%) - Illumina Single End PCR Primer 1 (95% over 24bp). Does that make sense if it was Agilent SureSelect Protocol? These are also short reads of only 36bp, does that matter?