miRNA analysis is new for me. I'm working on reads from a 55-cycle single-read sequencing run and I think I have a problem. After the pre-processing steps (removing low-quality reads, removing 5' and 3' adapters), 50% of the reads are 55 nucleotides long (that means around 1,000,000 out of a total of 2,000,000).
I understand that these reads should be removed, as they can't be miRNAs, but is it normal for length filtering to remove so many reads? And what could these reads correspond to?
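For anyone wanting to check the same thing on their own data, here is a minimal sketch that tabulates read lengths after trimming. It assumes a plain four-line-per-record FASTQ (no wrapped sequence lines); the helper name `read_length_histogram` is made up for illustration and is not part of cutadapt or FastQC.

```shell
# Tabulate read lengths in a four-line-per-record FASTQ file.
# read_length_histogram is a hypothetical helper name, not part of any tool.
# Output columns (tab-separated): length, number of reads, percent of reads.
read_length_histogram() {
    # Line 2 of every 4-line record holds the sequence.
    awk 'NR % 4 == 2 { count[length($0)]++; total++ }
         END {
             for (len in count)
                 printf "%d\t%d\t%.1f\n", len, count[len], 100 * count[len] / total
         }' "$1" |
        sort -n
}
```

Calling it as `read_length_histogram your_trimmed_file.fastq` would show at a glance whether a large share of reads still sits at the full 55 nt.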
I think a similar question was posted here, where Ryan gave the reason.
In your case, I would suggest something like:
# Remove the adapter and keep only reads of 20-30 nt
cutadapt --discard-untrimmed --minimum-length=20 --maximum-length=30 -a <adapter_sequence> In_seq.fastq > your_trimmed_file.fastq
# Download ribosomal and tRNA sequences and build a bowtie index from them
# Then also remove the reads that map to ribosomal and tRNA sequences
bowtie --seedlen=23 --un output_file.fastq /path_to/bowtieindex/r_tRNA your_trimmed_file.fastq > /dev/null
Your output_file.fastq should then be cleaner to align.
Hi, if you did miRNA sequencing, you should have sequences of about 20-23 nt after trimming, but you may have sequences of up to 35 nt if you did small RNA sequencing. The reads of unusual length (55 nt) in your data could result from adapter dimerization and should be removed. During trimming, you can set a length threshold that keeps only sequences of 15-40 nt to get rid of the unwanted ones.
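If the reads are already trimmed, the 15-40 nt window suggested above can also be applied afterwards. This is only a sketch, assuming a four-line-per-record FASTQ; `filter_fastq_by_length` is a hypothetical helper name. cutadapt's own `--minimum-length` and `--maximum-length` options do the same job during trimming.

```shell
# Keep only FASTQ records whose sequence length lies within [min, max].
# filter_fastq_by_length is a hypothetical helper, not part of any tool.
# Usage: filter_fastq_by_length <in.fastq> <min> <max> > <out.fastq>
filter_fastq_by_length() {
    awk -v min="$2" -v max="$3" '
        NR % 4 == 1 { header = $0 }   # @read_id line
        NR % 4 == 2 { seq = $0 }      # sequence line
        NR % 4 == 3 { plus = $0 }     # "+" separator line
        NR % 4 == 0 {                 # quality line completes the record
            n = length(seq)
            if (n >= min + 0 && n <= max + 0)
                print header "\n" seq "\n" plus "\n" $0
        }' "$1"
}
```

For example, `filter_fastq_by_length your_trimmed_file.fastq 15 40 > length_filtered.fastq` (the output file name is just a placeholder).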
written 10.1 years ago by seta (updated 2.8 years ago by Ram)
Dear Seta,
I have some human non-coding RNA-seq data. To get differential expression of miRNAs, should I trim the reads to a length between 18 and 30 before starting? Why is the average sequence length 51? Can I also use these data to get differential expression of lncRNAs?
written 8.3 years ago by Edalat (updated 2.8 years ago by Ram)
I actually just found an answer: these 50% of reads correspond to phiX contamination! I didn't suspect it because I thought that was "rare" after demultiplexing, but it seems to be a well-known problem.
Which adapter trimming tool did you use?
cutadapt (and the adapter sequences that appeared as overrepresented sequences in the FastQC results disappeared after adapter trimming, meaning it went well, right?)