Sequencing (+1bp in length?) and adaptor trimming.
5.7 years ago
Biogeek ▴ 470

Dear all,

I've been told that 75bp paired-end sequencing was performed on recent RNA. FastQC reports sequence lengths of 76bp, as does an awk script I ran on my .fastq files. Can anyone explain why the sequencing centre said 75bp was performed when I'm counting 76bp? The sequencer was a NextSeq 500, with a TruSeq mRNA enrichment kit.

For Trimmomatic, which adaptor sequence .fa file is best to use for trimming? There seem to be 3 versions for TruSeq paired-end reads. Is there a way I can tell which Illumina adaptors were used? I am somewhat aware that some Illumina platforms remove adaptors automatically? Does this include the NextSeq 500?

Basic stuff, but trying to recap after a while away.. Thanks.

RNA-Seq sequencing adapters • 2.1k views
5.7 years ago
GenoMax 147k

Last base in Illumina sequencing lacks phasing information. As a result some providers choose to do n+1 sequencing to ensure you get n bases of data. In general you can leave that base in and no harm should come from it. But if you are the worrying kind then trim the 76th base off.
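If you do want to drop that extra base, here is a minimal Python sketch of the idea, assuming plain 4-line FASTQ records. The function name `trim_fastq_records` and the `max_len` parameter are made up for illustration; in practice a trimmer's crop option does the same job.

```python
from typing import Iterable, Iterator


def trim_fastq_records(lines: Iterable[str], max_len: int = 75) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality truncated to max_len.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Line 2 of each record is the sequence, line 4 the quality string;
        # both must be cut to the same length to keep the record valid.
        if i % 4 in (1, 3):
            line = line[:max_len]
        yield line


# Toy record (not real data): trim an 8-base read down to 5 bases.
record = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
for out in trim_fastq_records(record, max_len=5):
    print(out)
```

Reads that are already shorter than `max_len` pass through unchanged, so this is safe to run on data that has been adapter-trimmed.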

I am somewhat aware that some Illumina platforms remove adaptors automatically?

Not necessarily. If your data was processed via BaseSpace (and if the sequence provider has chosen to scan/trim your data) then adapters will be removed. One way to check is to see whether every read is of length 76 (easy via FastQC). If they are not, then your data has likely been trimmed.
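If you would rather check the lengths yourself than read them off the FastQC report, something like the following Python sketch would do it. It assumes 4-line FASTQ records; `read_length_counts` and `looks_trimmed` are hypothetical helper names, not part of any tool.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(lines: Iterable[str]) -> Counter:
    """Count how many reads have each length (4-line FASTQ assumed)."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.rstrip("\n"))] += 1
    return counts


def looks_trimmed(counts: Counter) -> bool:
    """Heuristic only: more than one distinct read length suggests trimming."""
    return len(counts) > 1


# Toy input (not real data): one 8-base read and one 5-base read.
toy = ["@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
       "@r2\n", "ACGTA\n", "+\n", "IIIII\n"]
counts = read_length_counts(toy)
print(sorted(counts.items()))
print(looks_trimmed(counts))
```

For a real file you would pass an open file handle instead of the toy list, for example one from `gzip.open(path, "rt")` for a gzipped FASTQ.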


Genomax,

Many thanks, this is an informative answer. Read lengths range from 31 to 76bp; they don't all seem to be the one length. Based on your statements, I'd assume they have been trimmed. I've emailed the sequencing provider once more, so I should know soon. Do most people leave that +1bp in and call it 75, 100 or 150bp sequencing by default (when it is actually 76, 101 or 151)?


As long as your inserts were longer than 76 bp that extra base pair should reflect real valid sequence. It sounds like you must have some inserts that are shorter than 75 bp since some reads appear to have been trimmed. If you are aligning to a reference it should be ok to leave that base in.

5.7 years ago
JC 13k

a) ask the sequencing provider

b) the ones used in the library construction, repeat a)

c) FastQC can guess adapters if they are in the reads, but to be sure a)

d) Yes, recent pipelines remove adapters after base calling, repeat a)

e) Could be, depends on the software, repeat a)
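As a rough cross-check on what FastQC guesses, you could count how many reads still contain the start of the adapter. The Python sketch below assumes 4-line FASTQ and uses `AGATCGGAAGAGC`, the 13-base prefix commonly quoted for Illumina TruSeq adapters; treat that sequence as an assumption and confirm the real adapters with the provider. `adapter_fraction` is a made-up name for illustration.

```python
from typing import Iterable

# Assumption: 13-base prefix commonly quoted for Illumina TruSeq adapters.
# Confirm the adapters actually used with the sequencing provider.
TRUSEQ_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(lines: Iterable[str], adapter: str = TRUSEQ_PREFIX) -> float:
    """Return the fraction of reads containing an exact copy of `adapter`.

    Exact matching only, so this undercounts reads with sequencing errors
    in the adapter or with only a partial adapter at the 3' end.
    """
    total = 0
    hits = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each 4-line record
            total += 1
            if adapter in line:
                hits += 1
    return hits / total if total else 0.0


# Toy input (not real data): one read with adapter read-through, one without.
toy = ["@r1\n", "ACGTACGTAGATCGGAAGAGCACAC\n", "+\n", "I" * 25 + "\n",
       "@r2\n", "ACGTACGTACGTACGTACGTACGTA\n", "+\n", "I" * 25 + "\n"]
print(adapter_fraction(toy))
```

A fraction near zero on untrimmed-looking data would be consistent with adapters having been removed upstream, but it is not proof, so a) still applies.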
