Question

miRNA fastqc sequence length distribution with UMI

1

5.1 years ago

maria2019 ▴ 250

I have single-end 75 bp miRNA reads with UMIs (Qiagen miRNA kit).

The FastQC report shows a high peak at 83-84 bp in the sequence length distribution, along with Illumina Universal Adapter content.

After removing the adapter (5'-3': AACTGTAGGCACCATCAAT) and discarding reads shorter than 17 bp with cutadapt, the sequence length distribution peaks at 22-23 bp.

I know that miRNAs should be around 18-22 nt long and that the UMI is 12 nt. Doesn't that mean I should see a peak around 30-34 bp?

The code that I used was:

cutadapt -a AACTGTAGGCACCATCAAT --minimum-length 17 -o tri.fastq sample.fastq

miRNA fastqc cutadapt trimming Qiagene • 1.5k views

updated 5.1 years ago by swbarnes2 14k • written 5.1 years ago by maria2019 ▴ 250

score 1 · Answer 1 · 2019-10-10

1

5.1 years ago

swbarnes2 14k

How did you process your fastq? bcl2fastq can be configured to remove the UMI from the read and put it in the read name; are you sure that wasn't done?

0

I believe not. The head of the FASTQ file is as follows:

@NB551007:45:HNKVLBGX5:1:11101:18335:1071 1:N:0:GCCAAT
CTGGANGCGAGCCAACTGTAGGCACCATCAATNCCGTGCCCTCNAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAAT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEE#EAEEEEEEEE#EEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB551007:45:HNKVLBGX5:1:11101:5844:1072 1:N:0:GCCAAT
CTGTANGCACCATCAATCGACGTGAACAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEAEEE< AE/AEEEEEEEEEEEEE
@NB551007:45:HNKVLBGX5:1:11101:23470:1072 1:N:0:GCCAAT
CGTGGNGAGGAACAATTCTGAGAACTGTAGGCACCATCAATGAACTCGAACCCAGATCGGAAGAGCACACGTCTGAACTCCAGT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEE
@NB551007:45:HNKVLBGX5:1:11101:12496:1074 1:N:0:GCCAAT
TCGCTNCGATCTATTGAAAGTCGGCCCTCGACACAAGGGTTTGTAACTGTAGGCACCATCAATTCCCTTATTGCCAGATCGGAA
+
AAAAA#EEAEAEEEEE6EEEEE/EEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEE
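
The reads above appear to share one layout: insert, then the adapter AACTGTAGGCACCATCAAT, then 12 nt, then the start of the Illumina adapter (AGATCGGAAGAGC...). Assuming those 12 nt are the UMI, it sits downstream of the adapter. Since cutadapt -a removes the adapter together with everything that follows it, the UMI would be trimmed away as well, which would explain a peak at 22-23 rather than 30-34 after trimming. A minimal bash sketch of that split, where split_read is a hypothetical helper name and not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch only: assumes the layout insert + adapter + 12 nt UMI seen in the reads above.
adapter="AACTGTAGGCACCATCAAT"
umi_len=12   # UMI length stated in the question

# split_read SEQ -> prints "<insert> <insert length> <UMI>", or "no_adapter"
split_read() {
    local seq="$1"
    if [[ "$seq" != *"$adapter"* ]]; then
        echo "no_adapter"
        return 0
    fi
    # Everything before the first adapter occurrence (what cutadapt -a would keep)
    local insert="${seq%%"$adapter"*}"
    # Everything after the first adapter occurrence
    local rest="${seq#*"$adapter"}"
    # The umi_len bases immediately downstream of the adapter
    local umi="${rest:0:umi_len}"
    echo "${insert} ${#insert} ${umi}"
}

# Third read from the FASTQ head above
split_read "CGTGGNGAGGAACAATTCTGAGAACTGTAGGCACCATCAATGAACTCGAACCCAGATCGGAAGAGCACACGTCTGAACTCCAGT"
# prints: CGTGGNGAGGAACAATTCTGAG 22 GAACTCGAACCC
```

For this read the insert is 22 nt, and 22 + 19 (adapter) + 12 (UMI) + 31 nt of Illumina adapter gives the 84 nt seen in the raw length distribution.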

In downstream analysis I want to use UMI-tools for deduplication, so I need the UMI in the read name. From what I found, it looks like I can use fastp to remove the UMI from the read and move it to the read name.

My question now is: once I have done that, when trimming with cutadapt, should I also remove reads longer than, say, 40 bp, and keep only reads of 17-40 bp?

5.1 years ago by maria2019 ▴ 250
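
The two steps discussed in this thread (moving the UMI into the read name, then keeping only inserts of 17-40 nt) can be sketched in plain bash and awk. This is an illustration under stated assumptions, not the fastp or UMI-tools implementation: it assumes the UMI is the 12 nt immediately downstream of AACTGTAGGCACCATCAAT, that reads without an exact adapter match are dropped, and that the UMI should be appended to the read ID after an underscore, which is the separator UMI-tools looks for by default. The function name move_umi_to_name and the 17-40 window are taken from the question or made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads FASTQ on stdin, writes trimmed FASTQ on stdout.
move_umi_to_name() {
    awk -v adapter="AACTGTAGGCACCATCAAT" -v umi_len=12 -v min_len=17 -v max_len=40 '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 0 {
            qual = $0
            pos = index(seq, adapter)          # exact match only, no mismatches
            if (pos == 0) next                 # drop reads without the adapter
            insert = substr(seq, 1, pos - 1)
            umi = substr(seq, pos + length(adapter), umi_len)
            n = length(insert)
            # Drop truncated UMIs and inserts outside the assumed 17-40 nt window
            if (length(umi) < umi_len || n < min_len || n > max_len) next
            # Append _UMI to the read ID (the first whitespace-delimited token)
            space = index(header, " ")
            if (space > 0)
                header = substr(header, 1, space - 1) "_" umi substr(header, space)
            else
                header = header "_" umi
            print header
            print insert
            print "+"
            print substr(qual, 1, n)           # trim qualities to the insert
        }
    '
}
```

A possible call would be move_umi_to_name < sample.fastq > sample.umi.fastq, where the output file name is only an example. With cutadapt itself, the same length window can be expressed by adding --maximum-length 40 next to --minimum-length 17; whether an upper bound is appropriate depends on which small RNA classes are of interest.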