Dear all,
I've been told 75bp paired-end sequencing was performed on recent RNA samples. FastQC says sequence lengths are 76bp, as does an awk script I ran on my .fastq files. Can anyone explain why the sequencing centre said 75bp was performed when I'm counting 76bp? The sequencer was a NextSeq 500 with the TruSeq mRNA enrichment kit.
For Trimmomatic, which adaptor sequence .fa file is best to use for trimming? There seem to be 3 versions for TruSeq paired-end reads. Is there a way I can tell which Illumina adaptors were used? I understand that some Illumina platforms remove adaptors automatically. Does this include the NextSeq 500?
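One rough way to check whether adaptor sequence is still present is to count reads that contain the start of the TruSeq adaptor. The sketch below is only an illustration: it assumes standard 4-line FASTQ records, assumes the 13-base prefix AGATCGGAAGAGC (shared by the TruSeq read 1 and read 2 adaptors) is the relevant one for this library, and the function name and sampling limit are made up for the example.

```python
import gzip
import sys

# Assumption: this 13-mer is the shared start of the TruSeq adaptor read-through.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(path, adapter=ADAPTER_PREFIX, max_reads=100000):
    """Return (reads containing the adaptor, reads examined) for a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    hits = 0
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 1:  # the sequence is the 2nd line of each 4-line record
                continue
            total += 1
            if adapter in line:
                hits += 1
            if total >= max_reads:  # only sample the start of the file
                break
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    found, examined = adapter_fraction(sys.argv[1])
    print(f"{found} of {examined} reads contain {ADAPTER_PREFIX}")
```

If almost no reads contain the prefix, the adaptors have probably already been removed; a noticeable fraction suggests trimming is still needed.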
Basic stuff, but I'm trying to recap after a while away. Thanks.
Genomax,
Many thanks, this is an informative answer. Read lengths range from 31 to 76 bp, so they are not all the one length. I'd assume they have been trimmed, based on your statements. I've emailed the sequencing provider once more and should know soon. Do most people leave that +1 bp in and call it 75, 100 or 150 bp sequencing by default (when it is actually 76, 101 or 151)?
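For anyone wanting to reproduce the length check without awk, here is a minimal sketch that tabulates read lengths. It assumes standard 4-line FASTQ records (plain or gzipped); the function name is just for illustration.

```python
from collections import Counter
import gzip
import sys


def read_length_counts(path):
    """Count how many reads have each length in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # sequence line of each record
                counts[len(line.rstrip("\r\n"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for length, n_reads in sorted(read_length_counts(sys.argv[1]).items()):
        print(f"{length}\t{n_reads}")
```

A single peak at 76 with a tail of shorter lengths would be consistent with reads that have already been adaptor-trimmed.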
As long as your inserts were longer than 76 bp, that extra base should reflect real, valid sequence. It sounds like you must have some inserts that are shorter than 75 bp, since some reads appear to have been trimmed. If you are aligning to a reference, it should be OK to leave that base in.