Question

Trimmomatic SureSelect protocol

0

6.9 years ago

Moneeb Bajwa ▴ 10

Hi,

The metadata for the reads I am using says "Construction protocol: Agilent SureSelect Strand Specific RNA". Can I just use the TruSeq3 adapter files to trim them? What about Nextera? I am using Trimmomatic, and the reads were sequenced on an Illumina HiSeq 3000. Sorry if the question doesn't make sense, as I am new to this...

Thank you

rna-seq sequencing assembly • 4.6k views

ADD COMMENT • link 6.9 years ago by Moneeb Bajwa ▴ 10

0

Hello bajwa.m

For Agilent SureSelect, read the Agilent manual.

You can use the following sequence as the adapter for trimming: "CTGTCTCTTGATCACA".

For an initial check (to see how many of your reads contain the adapter), use:

grep "CTGTCTCTTGATCACA" input.fastq | wc -l

You cannot simply use some other adapter. If you can find out from the library preparation which adapter was actually used, check for it with the grep command above first, and only then use that adapter for trimming.
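The grep check counts every matching line in the file, whatever kind of line it is. As a rough sketch of the same idea restricted to sequence lines, the Python below assumes a plain four-line-per-record FASTQ; the function name `count_reads_with_adapter` and the two example reads are made up for illustration and do not come from the poster's data.

```python
import io


def count_reads_with_adapter(handle, adapter):
    """Return (reads_with_adapter, total_reads) for a 4-line-per-record FASTQ stream."""
    hits = 0
    total = 0
    for line_number, line in enumerate(handle):
        # In a 4-line FASTQ record the sequence is the second line (index 1).
        if line_number % 4 != 1:
            continue
        total += 1
        if adapter in line.strip().upper():
            hits += 1
    return hits, total


# Tiny invented example; a real run would pass open("input.fastq") instead.
example = io.StringIO(
    "@read1\nACGTCTGTCTCTTGATCACAGG\n+\n" + "I" * 22 + "\n"
    "@read2\nACGTACGTACGTACGTACGTAC\n+\n" + "I" * 22 + "\n"
)
hits, total = count_reads_with_adapter(example, "CTGTCTCTTGATCACA")
print(f"{hits} of {total} reads contain the adapter")
```

Reporting the hits next to the total number of reads makes it easier to judge whether the adapter is present at a meaningful level.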

ADD REPLY • link 6.9 years ago by mks002 ▴ 220

0

I don't really understand... When I use that grep command on one of the files I get 0 occurrences. These are the SRA sequences: https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=434667. I do get 34 occurrences in one of the files if I search for just CTGTCTCTTGATC. I'm not sure how this works, please help! Also, where did you get that particular adapter sequence from? I could not find it in the link you gave.
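One way to put the 34 hits for the shortened 13-base query in context is to estimate how often a k-mer would turn up purely by chance. The sketch below assumes uniformly random bases, which real transcriptome reads are not, and the read count `n_reads` is a placeholder rather than a value from this dataset; only the 36 bp read length comes from the thread.

```python
def expected_chance_hits(n_reads, read_length, k):
    """Expected number of exact occurrences of one fixed k-mer in random reads.

    Assumes independent, uniformly distributed bases (a simplification).
    """
    positions_per_read = max(read_length - k + 1, 0)
    return n_reads * positions_per_read / 4 ** k


n_reads = 10_000_000  # assumed placeholder; substitute the real read count
for k in (8, 13, 16):
    print(k, expected_chance_hits(n_reads, read_length=36, k=k))
```

If the expected chance count for the 13-mer is of the same order as the observed 34, those hits are weak evidence that the adapter is really present, which would be consistent with the full 16-base sequence giving 0 hits.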

ADD REPLY • link 6.9 years ago by Moneeb Bajwa ▴ 10

0

I think this is your first time working with NGS data.

You have to convert the SRA files to FASTQ format using fastq-dump. You can then perform trimming on the FASTQ files. Reading more posts on Biostars will help you get going.

ADD REPLY • link 6.8 years ago by mks002 ▴ 220

0

No, I did run fastq-dump.

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10

1

Can you share the top 10 lines of the fastq file? Use: head "input.fastq"

ADD REPLY • link 6.8 years ago by mks002 ▴ 220

0

Yes, I just fixed my last comment; you can see it now.

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10

0

@DRR089573.1 J00158:10:H7CTLBBXX:1:1101:30594:1226 length=36
NTTGGGGGGAAGGTCTGGATCCAAGATGGTGATGAT
+DRR089573.1 J00158:10:H7CTLBBXX:1:1101:30594:1226 length=36
#<AAFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJ
@DRR089573.2 J00158:10:H7CTLBBXX:1:1101:30695:1226 length=36
NCGTCATTGTCCCCTTGGCAGTGAGCAAAGGCCGTG
+DRR089573.2 J00158:10:H7CTLBBXX:1:1101:30695:1226 length=36
#<AAFJFFJJJJJJFFJFFJF<JAFJJJJFFFFJ<A
@DRR089573.3 J00158:10:H7CTLBBXX:1:1101:30756:1226 length=36
NCCGGATCCCATCTGAGAAGAAGTACACGCCAGGGG

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10

0

Use a shorter stretch of the adapter, such as its first 8 bases "CTGTCTCT", and go ahead with trimming.

Below is the text from the manual:

"MiSeq platform sequencing run setup and adaptor trimming guidelines

Use the Illumina Experiment Manager (IEM) software to generate a custom primer Sample Sheet. Set up the run to include adapter trimming using the IEM Sample Sheet Wizard. When prompted by the wizard, select the Use Adapter Trimming option, and specify CTGTCTCTTGATCACA as the adapter sequence. This enables the MiSeq Reporter software to identify the adaptor sequence and trim the adaptor from reads."

ADD REPLY • link 6.8 years ago by mks002 ▴ 220

0

Is this something that is usually done? Is it possible it is a different adapter?

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10

0

The manual you gave is for SureSelect QXT: https://www.agilent.com/cs/library/usermanuals/public/G9682-90000.pdf; but my libraries are SureSelect Strand Specific. Does that matter? My reads are also single-end.

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10

1

Try this: run FastQC on your fastq files and check the overrepresented sequences. If any adapter is present in your sample, you'll find out after running FastQC. Good luck.

ADD REPLY • link 6.8 years ago by mks002 ▴ 220

0

OK thanks! The result was the following for overrepresented sequences: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAA (0.10780696988082418%) - TruSeq Adapter, Index 7 (97% over 35bp), and AATGATACGGCGACCACCGAGATCGGAAGAGCACAC (0.13940876934680813%) - Illumina Single End PCR Primer 1 (95% over 24bp). Does that make sense if it was Agilent SureSelect Protocol? These are also short reads of only 36bp, does that matter?
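As a quick sanity check on these FastQC hits, the sketch below compares the two overrepresented sequences against two candidate adapters: the QXT sequence quoted from the manual earlier in the thread, and the 13-base prefix `AGATCGGAAGAGC` that is widely used to detect Illumina TruSeq-style adapters. That prefix is an addition for illustration, not something stated in the thread, and the helper name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(a, b):
    """Return the longest substring of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        # Only try candidates longer than the best match found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break  # a longer substring starting at i cannot match either
    return best


CANDIDATE_ADAPTERS = {
    "TruSeq-style prefix (assumed, not from the thread)": "AGATCGGAAGAGC",
    "SureSelect QXT (from the quoted manual)": "CTGTCTCTTGATCACA",
}

OVERREPRESENTED = [
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGAA",
    "AATGATACGGCGACCACCGAGATCGGAAGAGCACAC",
]

for sequence in OVERREPRESENTED:
    for name, adapter in CANDIDATE_ADAPTERS.items():
        shared = longest_common_substring(adapter, sequence)
        print(f"{sequence}  {name}: {len(shared)} of {len(adapter)} adapter bases")
```

A near-complete match to the TruSeq-style prefix and only a few shared bases with the QXT sequence would suggest that a TruSeq-type adapter file, such as the TruSeq3 files bundled with Trimmomatic, is a more plausible starting point than the QXT adapter. The low percentages reported by FastQC also suggest that adapter contamination is modest in these 36 bp reads.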

ADD REPLY • link 6.8 years ago by Moneeb Bajwa ▴ 10