Dear community,
I have paired-end miRNA datasets that I want to analyze. The FastQC results looked good enough to proceed, but since this is my first time analyzing this kind of sample, I learned that removing 3' adapter contamination is extremely important for miRNA because the reads are so short. My previous analyses did not involve adapter trimming, so I wanted to ask whether the method I'm applying is sufficient to make sure the adapter is removed.
Here's my adapter sequence:
5' GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG 3'
I was planning to use miRDeep2, which has a built-in trimming function, so I took the last 22 nt of the adapter and ran the following command:
mapper reads.fastq -e -h -i -j -k CTCGTATGCCGTCTTCTGCTTG -l 18 -m -p genome -s reads.fa -t reads_vs_genome.arf -v -u -n
After this step I checked the mapping results and was quite surprised, because the mapping seems to have failed:
mapping reads to genome index
#reads processed: 12372857
#reads with at least one reported alignment: 537965 (4.35%)
#reads that failed to align: 11809642 (95.45%)
#reads with alignments suppressed due to -m: 25250 (0.20%)
Reported 660431 alignments to 1 output stream(s)
trimming unmapped nts in the 3' ends
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 199690475 109878864 89811611 0.550 0.450
seq: 199690475 109878864 89811611 0.550 0.450
As you can see, something went wrong with the mapping (95.45% of reads failed to align?!) and I don't know what it is...
I can think of two possible issues:
1.- Adapter trimming failed
2.- miRDeep2 does not map paired-end data directly (the authors suggest treating it as single-end), so maybe this is affecting the mapping results
Has anyone experienced the same issue who could help clarify this?
Thanks!
I have just a short question about your data: what do paired-end reads for microRNA-seq look like? I mean... the mature microRNA sequence is about 24nt long. Do the pairs completely overlap each other? Is this protocol strand-specific in the end?
No, the pairs do not completely overlap each other, and the protocol is not strand-specific.
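Why did you take the last 22 nt? The tail of the adapter is the part you're least likely to see intact: a read runs from the insert into the start of the adapter, so with short reads it is the 5' end of the adapter that shows up. Try the whole thing.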
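To illustrate why the start of the adapter matters, here is a minimal Python sketch of 3' adapter clipping by exact prefix matching. It is not how miRDeep2's mapper works internally, just a toy model. The function name `trim_3prime_adapter`, the `min_overlap` value, the 22 nt example insert and the 36 nt read length are all assumptions made up for this example; only the adapter sequence comes from the post above.

```python
# Adapter sequence as given in the question.
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=6):
    """Clip the read at the first position where the remainder of the read
    is an exact match to the start of the adapter (hypothetical helper)."""
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        # Compare the read tail against an adapter prefix of the same length
        # (or the full adapter, if the tail is longer than the adapter).
        if adapter.startswith(tail[:len(adapter)]):
            return read[:start]
    return read  # no adapter found, read is left untouched


# Assumed example: a 22 nt insert sequenced with 36 nt reads, so the read
# ends 14 nt into the adapter and never reaches the adapter's tail.
insert = "TGAGGTAGTAGGTTGTATAGTT"
read = insert + ADAPTER[:36 - len(insert)]

full = trim_3prime_adapter(read)                         # whole adapter
tail_only = trim_3prime_adapter(read, ADAPTER[-22:])     # last 22 nt only

print(len(read), len(full), len(tail_only))
```

With the whole adapter the insert is recovered, while with only the last 22 nt nothing is clipped, so the untrimmed reads still carry adapter sequence when they go to the aligner.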
Thanks for the suggestion! I tried it and it worked fine :)
This is the first time I have heard of paired-end sequencing for miRNA. If the data you are analysing is publicly available, could you share the link?
I'm sorry, this data is not published yet, so you will have to wait until the publication comes out.