I am trying to improve my bioinformatics skills, and I am currently working on obtaining raw count tables from two miRNA-seq experiments in GEO. Both experiments provide downloadable count tables, but I want to generate the count tables myself from the raw sequences.
The issue is that the QC reports do not include information about the adapters. However, according to the articles associated with each experiment, adapter trimming was performed. Could someone guide me on how I can try to identify and remove them?
While Illumina adapter sequences are easy to find (Illumina publishes them in a PDF you can search for online), miRNA adapters are specific to the library prep kit that was used. They are generally ligated directly to the 3' end of the miRNA. You will need to identify the kit and then look up the adapter sequences in its technical documentation.
It may also be possible to do a multiple sequence alignment of a few reads to see whether you can identify the adapter. Keep in mind that not every read in your dataset will contain the adapter. Because reads are usually longer than a mature miRNA, a read with no adapter may not represent a real miRNA.
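As a lighter alternative to a full multiple sequence alignment, here is a minimal Python sketch that counts k-mers in a sample of reads. It is a swapped-in heuristic, not the alignment itself. The idea is that miRNA inserts vary in length, so adapter k-mers should turn up in many reads and at several different start positions, whereas k-mers from an abundant miRNA tend to sit at one fixed position. The function names, the k-mer size of 12 and the 100,000-read sample are my own assumptions, so treat the output as candidates to compare against the kit documentation.

```python
import gzip
from collections import Counter, defaultdict
from itertools import islice


def read_sequences(fastq_path, max_reads=100_000):
    """Yield sequence lines from the first max_reads records of a FASTQ file."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # In a 4-line FASTQ record the sequence is the second line.
        for line in islice(handle, 1, max_reads * 4, 4):
            yield line.strip()


def candidate_adapter_kmers(reads, k=12, top=5):
    """Rank k-mers by distinct start positions, then by number of reads.

    Returns a list of (kmer, reads_containing_it, distinct_positions).
    k and top are arbitrary defaults (assumptions), not recommended values.
    """
    read_hits = Counter()
    positions = defaultdict(set)
    for seq in reads:
        seen = set()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            positions[kmer].add(i)
            if kmer not in seen:  # count each read once per k-mer
                seen.add(kmer)
                read_hits[kmer] += 1
    ranked = sorted(
        read_hits,
        key=lambda km: (len(positions[km]), read_hits[km]),
        reverse=True,
    )
    return [(km, read_hits[km], len(positions[km])) for km in ranked[:top]]
```

You would call it as candidate_adapter_kmers(read_sequences("sample.fastq.gz")), where the file name is a placeholder. Overlapping top k-mers can then be stitched together by eye into a longer adapter candidate.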
If the submitters state that adapters were already trimmed (read lengths of roughly 20-30 nt would be consistent with that), you can move straight on to alignment and counting.
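A quick way to check the read-length point is to tabulate lengths yourself. The sketch below assumes you already have the read sequences as plain strings. The function names are hypothetical, and the 20-30 window is simply the range mentioned above, not a validated cutoff.

```python
from collections import Counter


def length_distribution(reads):
    """Return a Counter mapping read length -> number of reads."""
    return Counter(len(seq) for seq in reads)


def fraction_in_range(reads, low=20, high=30):
    """Fraction of reads whose length lies in [low, high] (inclusive)."""
    dist = length_distribution(reads)
    total = sum(dist.values())
    if total == 0:
        return 0.0
    inside = sum(n for length, n in dist.items() if low <= length <= high)
    return inside / total
```

If nearly all reads have the same full sequencing length (for example 50 or 75), the files were most likely not trimmed. A spread of lengths concentrated in the low twenties suggests they were.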
I would only add that it often makes sense to do several rounds of adapter trimming and to check after each round (e.g. with FastQC) that no adapters remain. This is particularly true for miRNA/sRNA sequencing data.
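Alongside FastQC, a rough scripted check between trimming rounds can be as simple as the sketch below. It reports the fraction of reads that still contain the first few bases of the adapter. The function name and the 8-base seed length are assumptions of mine, and exact matching will miss adapters with sequencing errors, so read the result as a lower bound.

```python
def adapter_remaining_fraction(reads, adapter, seed_length=8):
    """Fraction of reads containing the first seed_length bases of adapter.

    Exact substring match only: mismatches and partial adapters shorter
    than the seed at the very end of a read are not detected.
    """
    seed = adapter[:seed_length]
    reads = list(reads)
    if not reads or not seed:
        return 0.0
    hits = sum(1 for seq in reads if seed in seq)
    return hits / len(reads)
```

If the value is still well above zero after a round of trimming, another round (or a different adapter sequence) is probably needed.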