Hi guys,
This is the first time we have used UMIs for RNA-seq, so I am not sure how to handle them. I would really appreciate it if you could help me.
We used the SMARTer Stranded Total RNA-Seq Kit v3 for library preparation. The manual states:
'The first 8 nt of the second sequencing read (Read 2) are UMIs (dark purple) followed by 3 nucleotides of UMI-linker (shown as NNN) and 3 nucleotides derived from the Pico v3 SMART UMI Adapter (shown as XXX).'
I would like to try the following (8 nt UMI only):
umi_tools extract -I pair.1.fastq.gz --bc-pattern=NNNNNNNN \
--read2-in=pair.2.fastq.gz --stdout=processed.1.fastq.gz \
--read2-out=processed.2.fastq.gz
Is this correct, or should I also include the linker and adapter bases (14 nt in total)?
umi_tools extract -I pair.1.fastq.gz --bc-pattern=NNNNNNNNNNNNNN \
--read2-in=pair.2.fastq.gz --stdout=processed.1.fastq.gz \
--read2-out=processed.2.fastq.gz
Please help me figure this out. Thanks a lot.
Kim
P.S. Here are the first few records from our Read 2 FASTQ file:
@NB551656:25:H3N2YBGXK:1:11101:17925:1178 2:N:0:4
GTCATGAACGAGTCAGGCCAAGGGCATCAATTGCCCGTCACCGGAAGGCGCATTCTACGTCTACCCGTCCTGCGCC
+
AAAAAEEE////A/EEEEEEEEEEEEEEEEEEEE<EEEEAEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEAEAE
@NB551656:25:H3N2YBGXK:1:11101:6227:1179 2:N:0:4
GTGGGTTCCTTTGGTCTTGTTGCGTACCTGGAGAACGGAAGAGCGTCGTGTAGGGAAATAGTGTAAGTCCAAGTGT
+
AAAAAEEA/A///EE//EE/EE/AEEE<E//E/E/AE/EE//E6/EE/E/EE/E<E<E/E/E/EAEE</6AE///A
@NB551656:25:H3N2YBGXK:1:11101:14235:1180 2:N:0:4
ACTAAGCGGTGGGGTGATCGCCGAGAGCAAAGGTAAGGCTAAGAAAGGAAGACCAGGTTGGAGCCTTGAGAAAAAT
TakaraBio tells you what to do with these in their user manual (page 25):
They also appear to make some software available to do this here: https://www.takarabio.com/products/next-generation-sequencing/bioinformatics-tools/cogent-ngs-analysis-pipeline
Thanks a lot for your reply. I want to keep the 8 nt UMI but trim the linker (3 nt) and the adapter-derived bases (3 nt). The Cogent software is not the best option for us, because we have already bought other software. Is there an option in umi_tools to keep the 8 nt UMI but discard the following 6 nt?
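One possible approach, offered as a sketch rather than a tested recipe: umi_tools extract has a regex extraction method in which a named group `umi_N` is moved to the read name and a named group `discard_N` is removed from the read and thrown away. Because `--bc-pattern` is applied to the file given with `-I`, and the kit puts the UMI on Read 2, the sketch below passes Read 2 as `-I` and Read 1 as `--read2-in`, with the outputs swapped to match. The file names are the ones from the question; the helper `split_read2` is hypothetical and exists only to show what the 8 + 6 nt layout does to one of the posted reads. It assumes the layout is exactly as quoted from the manual.

```shell
#!/bin/sh
# Read 2 layout assumed from the kit manual quoted above:
#   8 nt UMI, then 3 nt UMI-linker, then 3 nt adapter-derived bases.
# The regex keeps the first 8 nt as the UMI and discards the next 6 nt.
BC_PATTERN='^(?P<umi_1>.{8})(?P<discard_1>.{6})'

# Hypothetical helper (illustration only, not part of umi_tools):
# prints "UMI DISCARDED REMAINING_READ" for one Read 2 sequence.
split_read2() {
  seq=$1
  umi=$(printf '%s\n' "$seq" | cut -c1-8)
  discard=$(printf '%s\n' "$seq" | cut -c9-14)
  rest=$(printf '%s\n' "$seq" | cut -c15-)
  printf '%s %s %s\n' "$umi" "$discard" "$rest"
}

# Start of the first Read 2 record posted in the question.
split_read2 GTCATGAACGAGTCAGGCCAAGG
# prints: GTCATGAA CGAGTC AGGCCAAGG

# Run the real extraction only if umi_tools and the input files are present.
if command -v umi_tools >/dev/null 2>&1 \
   && [ -f pair.1.fastq.gz ] && [ -f pair.2.fastq.gz ]; then
  # Read 2 goes in as -I because that is the read --bc-pattern is applied to.
  umi_tools extract \
    --extract-method=regex \
    --bc-pattern="$BC_PATTERN" \
    -I pair.2.fastq.gz \
    --read2-in=pair.1.fastq.gz \
    --stdout=processed.2.fastq.gz \
    --read2-out=processed.1.fastq.gz
fi
```

If this works as expected, the 8 nt UMI is appended to the names of both reads, and Read 2 in processed.2.fastq.gz starts at base 15 of the original read. It is worth checking the first few output records by eye before aligning.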