Question

small RNA-Seq analysis and doubts

0

Entering edit mode

3.4 years ago

Sergio • 0

Hi all,

I am analyzing for the first time small RNA-Seq data. The goal, at the end, is to get counts of miRNA, but also other non coding RNA-Seq would be interesting.

Reads are from human, paired-end. I have in total almost 50 samples.

I have some doubts, regarding percentage of reads mapped (between 25% and 45% per sample) and percentage of reads assigned by using featureCounts (very low, around 5-10%).

I will tell you step by step how I performed the analysis, hoping some of you can tell me if there is something wrong. I really think some mistakes could have been done, so I am in "listening" mode :D

Also, on the bottom, I will copy the output of a STAR log and of a featureCounts log.

THANKS A LOT FOR ANY FEEDBACK.

Sergio

FIRST STEP: trimming by using Trim Galore 0.6.6 trim_galore --output /path/trimmed_reads --fastqc --paired --length 18 --small_rna --cores 6 reads_1.fq.gz reads_2.fq.gz

(default adapters used by Trim Galore are fine. I checked and it's exactly what they used for the experiment) Results will be at this point trimmed_reads_1.fq.gz and trimmed_reads_2.fq.gz

SECOND STEP: alignment by using STAR 2.7.5a

STAR --genomeDir /genome_reference/GRCh38/STAR_2.7_indices --readFilesIn trimmed_reads_1.fq trimmed_reads_2.fq --outFileNamePrefix /path_to_output_folder/aligned_reads --readFilesCommand zcat --outFilterMismatchNoverLmax 0.05 --outFilterMatchNmin 16 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignIntronMax 1 --outSAMtype BAM SortedByCoordinate --runThreadN 6 Results will be aligned_reads.bam

THIRD STEP, part a: counting by featureCount 1.6.0 to detect miRNA

featureCounts -t miRNA -p -g 'Name' -a /path/to/miRbase/v22.1/hsa.gtf -o /path/to/output/folder/miRNA_counts.txt aligned_reads_miRNA.bam

THIRD STEP, part b: counting by featureCount 1.6.0 to detect all small noncoding RNA

featureCounts -t miRNA -p -a /path/to/gencode.v36.annotation_hg38_where_I_selected_only_small_RNA.gtf -o /path/to/output/folder/miRNA_counts.txt aligned_reads_ncRNA.bam

HERE THE STAR LOG OF A SAMPLE AND THE featureCounts Log for cases a and b, of the same sample:

star log: Started job on | Jul 07 13:22:08 Started mapping on | Jul 07 13:26:08 Finished on | Jul 07 13:37:19 Mapping speed, Million of reads per hour | 157.02

                      Number of input reads |       29267386
                  Average input read length |       58
                                UNIQUE READS:
               Uniquely mapped reads number |       8418899
                    Uniquely mapped reads % |       28.77%
                      Average mapped length |       49.10
                   Number of splices: Total |       75064
        Number of splices: Annotated (sjdb) |       75064
                   Number of splices: GT/AG |       74264
                   Number of splices: GC/AG |       535
                   Number of splices: AT/AC |       73
           Number of splices: Non-canonical |       192
                  Mismatch rate per base, % |       0.27%
                     Deletion rate per base |       0.00%
                    Deletion average length |       1.00
                    Insertion rate per base |       0.00%
                   Insertion average length |       1.11
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       13873946
         % of reads mapped to multiple loci |       47.40%
    Number of reads mapped to too many loci |       6847549
         % of reads mapped to too many loci |       23.40%
                              UNMAPPED READS:
 Number of reads unmapped: too many mismatches |       41163
   % of reads unmapped: too many mismatches |       0.14%
        Number of reads unmapped: too short |       1456
             % of reads unmapped: too short |       0.00%
            Number of reads unmapped: other |       84373
                 % of reads unmapped: other |       0.29%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%

featureCounts log case a:

||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||

|| Load annotation file /path ... ||
||    Features : 2883                                                         ||
||    Meta-features : 2652                                                    ||
||    Chromosomes/contigs : 24                                                ||
||                                                                            ||
|| Process BAM file /path. ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 96167215                                              ||
||    Successfully assigned fragments : 6053052 (6.3%)                        ||
||    Running time : 9.35 minutes                                             ||


-----------------------------------------------------------------------------
-----------------------------------------------------------------------------

featureCounts log case b:

-----------------------------------------------------------------------------
||                 Threads : 1                                                ||
||                   Level : meta-feature level                               ||
||              Paired-end : yes                                              ||
||         Strand specific : no                                               ||
||      Multimapping reads : not counted                                      ||
|| Multi-overlapping reads : not counted                                      ||
||   Min overlapping bases : 1                                                ||
||                                                                            ||
||          Chimeric reads : counted                                          ||
||        Both ends mapped : not required                                     ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /path/ ... ||
||    Features : 7506                                                         ||
||    Meta-features : 7506                                                    ||
||    Chromosomes/contigs : 25                                                ||
||                                                                            ||
|| Process BAM file /path/... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 96167215                                              ||
||    Successfully assigned fragments : 6758205 (7.0%)                        ||
||    Running time : 8.82 minutes

nc RNA-Seq small miRNA-Seq • 1.4k views

ADD COMMENT • link 3.4 years ago by Sergio • 0

1

Entering edit mode

Hi Sergio, Concerning the low unique mapping rate, if you look carefully, you will see that almost all your reads are mapped (>99%), but many of them map to multiple loci. It happens a lot with tRNAs fragments and other small RNA species so it is not unexpected in a smallRNA-seq experiment.

Now concerning the output of featureCounts, you are discaring multimapping reads from the start (leaving you with 28% of total reads that are uniquely mapped to work with), then you only count them on miRNAs, ending up with about 7% reads assigned. That means that about 1/4 of your uniquely mapped smallRNA-seq reads map on miRNA, which I think is not bad since there is a lot more than miRNAs in a small RNA fraction.

Of course, more quality control is needed before affirming that this data is fine (for instance, look in a genome browser if you see the expected expression patterns), but I don't see a big problem just from the mapping or read assignment rate.