Question

Inferring undisclosed 5mer in proprietary SMARTer oligo sequence


16 months ago

kevin.stachelek ▴ 80

I am attempting to infer the identity of an unknown 5mer present in amplified fragments after first-strand synthesis using the SMART-Seq v4 kit. (Figure: Takara oligo diagram.)

I want to amplify fragments using this oligo from the original reverse-transcribed products before Illumina library preparation.

I am using the ShortRead Bioconductor package to sample ~1e6 reads from a few hundred untrimmed single-cell FASTQ pairs. I then filter out poly-A and poly-T sequences and list the most frequent 5mers that immediately follow the known oligo sequence, AAGCAGTGGTATCAACGCAGAGTAC. I am finding an overrepresentation of GGGNN sequences. Is there an explanation for this pattern? Could it be related to GC content or repetitive elements, which this naive approach does not account for?
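For readers who want to reproduce the counting step, here is a minimal Python sketch of the approach described above. It is not the ShortRead code used in the question. The function names, the 10-base homopolymer threshold, and the choice to apply the poly-A/poly-T filter only to the sequence downstream of the oligo are all assumptions made for illustration.

```python
from collections import Counter

# Known oligo sequence quoted in the question.
OLIGO = "AAGCAGTGGTATCAACGCAGAGTAC"


def has_poly_at(seq, min_run=10):
    """Return True if seq contains a run of at least min_run A's or T's.

    The threshold of 10 is an assumed value, not taken from the question.
    """
    return ("A" * min_run) in seq or ("T" * min_run) in seq


def count_following_kmers(reads, oligo=OLIGO, k=5):
    """Count the k-mers that immediately follow the first oligo match in each read."""
    counts = Counter()
    for read in reads:
        pos = read.find(oligo)
        if pos == -1:
            continue  # oligo not present in this read
        start = pos + len(oligo)
        downstream = read[start:]
        kmer = downstream[:k]
        if len(kmer) < k or "N" in kmer:
            continue  # read ends too early, or the k-mer has an ambiguous base
        if has_poly_at(downstream):
            continue  # assumed filter: drop reads with poly-A/poly-T downstream
        counts[kmer] += 1
    return counts


if __name__ == "__main__":
    example_reads = [
        OLIGO + "GGGACTTGCA",
        OLIGO + "GGGAC",
        "ACGTACGT",
        OLIGO + "CCCCC" + "A" * 12,
    ]
    for kmer, n in count_following_kmers(example_reads).most_common():
        print(kmer, n)
```

In practice the reads would come from a FASTQ parser rather than a hard-coded list, and exact matching with `str.find` will miss oligo copies that contain sequencing errors.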

(Figure: frequency of 5mers.)

smart-seq scrnaseq • 528 views

updated 16 months ago by Pei ▴ 220 • written 16 months ago by kevin.stachelek ▴ 80

score 0 · Answer 1 · 2023-08-01

If the reads (your untrimmed single-cell FASTQ files) come from Illumina instruments that use 2-channel chemistry, 'G' may simply represent no signal.

"...2-channel SBS simplifies nucleotide detection by using two fluorescent dyes and two images to determine all four base calls. Images are taken of each DNA cluster using blue and green wavelength filter bands. Clusters seen in blue or green images are interpreted as C and T bases, respectively. Clusters observed in both blue and green images are flagged as A bases, while unlabeled clusters are identified as G bases."

https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/2-channel-sbs.html
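The decoding rule in the quote can be written out as a small Python sketch. It only illustrates the blue/green logic as quoted; the function name and the boolean inputs are hypothetical simplifications, and real base calling works on measured intensities rather than on/off flags.

```python
def call_base_two_channel(blue, green):
    """Map the presence of signal in the blue and green images to a base call.

    Follows the quoted description: blue only -> C, green only -> T,
    both -> A, neither -> G.
    """
    if blue and green:
        return "A"
    if blue:
        return "C"
    if green:
        return "T"
    return "G"  # no signal in either image


if __name__ == "__main__":
    # A cluster that emits no signal for five cycles is read as five G's.
    dark_cycles = [(False, False)] * 5
    print("".join(call_base_two_channel(b, g) for b, g in dark_cycles))
```

Under this scheme a cluster that stops producing signal yields a run of G calls, which is why spurious G's can appear in reads from such instruments.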