11 months ago
mohammadhassanj
Hi, I trimmed the reads as discussed in this earlier post:
Is mir-seq reads quality good (Fastqc report) for DE analysis?
I then used bowtie1 for alignment, but 99% of the reads did not align. What is the problem?
bowtie -t -x ../bowtieIndex/GCA_000001405.15_GRCh38_no_alt_analysis_set -p 15 --chunkmbs 512 -1 trimmed_SRR.R1.fastq.gz -2 trimmed_SRR.R2.fastq.gz -S SRR.sam
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 00:16:27
reads processed: 30829794
reads with at least one alignment: 110845 (0.36%)
reads that failed to align: 30718949 (99.64%)
Reported 110845 paired-end alignments
The FastQC report after trimming:
If you have real miRNA data then you should only have short reads left after trimming. Take a selection of trimmed reads that did not align and BLAST them to make sure they are from the right genome.
You should also use only read 1 for alignment. It does not make sense to use paired data with reads expected to be shorter than 30 bp.
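One way to pull a handful of unaligned reads out of the SAM file for a BLAST check is sketched below. It is only an illustration, not part of any tool: the function name `sample_unaligned_as_fasta` and its parameters are made up here, and it relies only on the SAM convention that FLAG bit 0x4 marks an unmapped read.

```python
import random


def sample_unaligned_as_fasta(sam_lines, n=20, seed=0):
    """Return FASTA text for up to n randomly chosen unaligned reads.

    sam_lines: any iterable of SAM text lines (e.g. an open file handle).
    """
    unaligned = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            unaligned.append((name, seq))
    random.Random(seed).shuffle(unaligned)  # fixed seed: repeatable sample
    return "".join(f">{name}\n{seq}\n" for name, seq in unaligned[:n])
```

For example, `with open("SRR.sam") as fh: print(sample_unaligned_as_fasta(fh))` would print a small FASTA that can be pasted into the BLAST web form.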
Thank you so much! After aligning with just one read (R1), the alignment finishes in 26 seconds, with the following log:
Do you think this result is acceptable? If yes, what is the reason? Can you explain more?
If you have reads of 90-something bp that align to the genome then you don't have microRNAs but probably genomic DNA or other contamination. miRNAs are canonically 19-25bp long so it can't be miRNAs. So no, the results are suspicious.
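A quick way to check this on the trimmed FASTQ is to tabulate read lengths and see what fraction falls in the 19-25 nt window. The sketch below is a rough illustration: the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequence lines), optionally gzipped.

```python
import gzip
from collections import Counter


def read_lengths(fastq_path):
    """Count reads per sequence length in a 4-line-per-record FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    lengths = Counter()
    with opener(fastq_path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:                # 2nd line of each record is the sequence
                lengths[len(line.strip())] += 1
    return lengths


def fraction_in_range(lengths, lo=19, hi=25):
    """Fraction of reads whose length lies in [lo, hi] (inclusive)."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    hits = sum(count for length, count in lengths.items() if lo <= length <= hi)
    return hits / total
```

Running `fraction_in_range(read_lengths("trimmed_SRR.R1.fastq.gz"))` on the trimmed R1 file gives a single number to look at; a value near zero would point to the same problem as the 90-something bp alignments.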
Hi this is the output of
what do you think?
As noted by ATPoint, miRNAs are going to be small. So in theory you need to keep only those reads that contain the specific miRNA library adapter (the rest are not usable reads). This adapter then needs to be trimmed off before alignment.
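To make the "keep only adapter-containing reads, then trim" logic concrete, here is a minimal sketch. It is not how cutadapt or any other trimmer works internally: it uses exact string matching only (real trimmers also tolerate mismatches and partial adapters at the read end), and the function name and the `min_len` cutoff are assumptions made for illustration.

```python
def trim_3prime_adapter(seq, qual, adapter, min_len=15):
    """Trim the adapter and everything 3' of it from one read.

    Returns (trimmed_seq, trimmed_qual), or None when the read should be
    discarded: no exact adapter match, or an insert shorter than min_len
    (min_len is an illustrative cutoff, not a recommended value).
    """
    idx = seq.find(adapter)               # first exact occurrence of the adapter
    if idx < 0:
        return None                       # no adapter found: not a usable read
    if idx < min_len:
        return None                       # insert too short to align reliably
    return seq[:idx], qual[:idx]          # keep only the part 5' of the adapter
```

A read that is returned as `None` is dropped; everything else goes on to alignment with the adapter already removed.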
Looks like you are working with public SRA data. What accession number is the data from?
Yes, The accession number for the mentioned data is SRR22399501
Based on the kit mentioned it looks like
AGATCGGAAGAGCACACGTCT
is the smRNA 3'-adapter. Can you see that in your reads? If you do, keep only the reads that contain this adapter. Trim Read 1 to remove the adapter and everything 3' of it, and then align.
Thanks. If I understand correctly, first, as you suggest, I use this command
Roughly 99% of the reads were recovered.
After that, I used cutadapt to trim the adapter from the reads
The FastQC result of the above commands was the same as for the command below
Finally, the featureCounts log for this data showed only 22% of alignments assigned
That appears reasonable. featureCounts will not count multi-mapped reads, so you will need to take that into consideration when dealing with miRNA. You could try using a miRNA workflow.
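To get a feel for how much multi-mapping there is, one rough approach is to count alignment records per read name in the SAM file. The sketch below uses made-up function names and assumes single-end reads and an alignment run that reports more than one hit per read (for bowtie1 that means using `-k` or `-a`); with one reported alignment per read it cannot detect multi-mappers.

```python
from collections import Counter


def alignments_per_read(sam_lines):
    """Count reported alignments per read name, ignoring unmapped records."""
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        if int(fields[1]) & 0x4:          # FLAG bit 0x4: read is unmapped
            continue
        hits[fields[0]] += 1
    return hits


def multimapped_fraction(hits):
    """Fraction of aligned reads that have more than one reported alignment."""
    if not hits:
        return 0.0
    return sum(1 for n in hits.values() if n > 1) / len(hits)
```

If that fraction is large, a low assigned percentage from featureCounts is less surprising.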
Does it make sense that the low assignment percentage (featureCounts) is due to the large number of overrepresented sequences in the FastQC report?
This article is a bit old now (2016), but I think it provides a nice review of the challenges inherent to miRNA alignment, along with a benchmark of different aligners and pipelines. I hope this will be useful:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4931105/
(Ziemann M, Kaspi A, El-Osta A. Evaluation of microRNA alignment techniques. RNA. 2016 Aug;22(8):1120-38. doi: 10.1261/rna.055509.115. Epub 2016 Jun 9. PMID: 27284164; PMCID: PMC4931105.)
I found it interesting that, contrary to this article's suggestion, many miRNA pipelines written after the paper was published still use bowtie1 as the aligner!