RNA Seq analysis
2.9 years ago

Dear all

I have mapped raw paired-end RNA-Seq reads (FASTQ) to a reference fungal genome using STAR. My Log.final.out file looks like this:

    Number of input reads | 25763319
    Average input read length | 202

    UNIQUE READS:
    Uniquely mapped reads number | 6701254
    Uniquely mapped reads % | 26.01%
    Average mapped length | 200.52
    Number of splices: Total | 1409119
    Number of splices: Annotated (sjdb) | 1326594
    Number of splices: GT/AG | 1377189
    Number of splices: GC/AG | 11322
    Number of splices: AT/AC | 423
    Number of splices: Non-canonical | 20185
    Mismatch rate per base, % | 0.21%
    Deletion rate per base | 0.00%
    Deletion average length | 1.44
    Insertion rate per base | 0.00%
    Insertion average length | 1.18

    MULTI-MAPPING READS:
    Number of reads mapped to multiple loci | 76397
    % of reads mapped to multiple loci | 0.30%
    Number of reads mapped to too many loci | 58765
    % of reads mapped to too many loci | 0.23%

    UNMAPPED READS:
    Number of reads unmapped: too many mismatches | 0
    % of reads unmapped: too many mismatches | 0.00%
    Number of reads unmapped: too short | 18780473
    % of reads unmapped: too short | 72.89%
    Number of reads unmapped: other | 147037
    % of reads unmapped: other | 0.57%

    CHIMERIC READS:
    Number of chimeric reads | 0
    % of chimeric reads | 0.00%

I want to know why only 26.01% of the reads mapped uniquely to the reference genome. Is it due to adapter contamination (the adapter sequences were not provided by the service provider)? If so, how can I identify and remove them?

I am a beginner in RNA Seq analysis and any suggestion in this regard will be highly useful in my work.

Thanks


Run FastQC to check whether adapters are present.
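As a quick cross-check alongside FastQC, you can also count how many reads contain the start of a known adapter. Below is a minimal sketch, not a replacement for FastQC. It assumes the library was prepared with standard Illumina adapters (the common prefix `AGATCGGAAGAGC`), which may not be true for your data, and that the FASTQ files are uncompressed with four lines per record. The function name `count_adapter_reads` is made up for this illustration.

```shell
# Hypothetical helper: count reads in a FASTQ file that contain an adapter prefix.
# Assumptions: uncompressed FASTQ, exactly 4 lines per record, and a standard
# Illumina adapter prefix as the default (pass another sequence as $2 if needed).
# Prints three fields: reads_with_adapter total_reads percentage
count_adapter_reads() {
    awk -v a="${2:-AGATCGGAAGAGC}" '
        # Sequence lines are the 2nd line of every 4-line record.
        NR % 4 == 2 {
            total++
            if (index($0, a) > 0) hits++
        }
        END {
            pct = (total > 0) ? 100 * hits / total : 0
            printf "%d %d %.2f\n", hits, total, pct
        }' "$1"
}

# Example (file name taken from the STAR command in this thread):
# count_adapter_reads 1_1.fastq
```

This only finds exact matches to the prefix, so it will miss adapters that contain sequencing errors or that are cut off near the end of a read; treat the number as a rough lower bound.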


Dear Sir

I have checked the FastQC results. They show:

Per Base Sequence Quality: >35
Sequence Length Distribution: 100–102 bp
Adapter Content: SOLiD small RNA adapter, very low or not available

Surprisingly, the MultiQC report shows the presence of adapters.


Do I need to preprocess these FASTQ files before mapping? How can I find the adapters?

Or do I need to change any of the parameters I used?

STAR --genomeDir index/ --runThreadN 16 --readFilesIn 1_1.fastq 1_2.fastq --outFileNamePrefix results/ --outSAMtype BAM SortedByCoordinate --outSAMattributes Standard

thanks


Always assess the quality of your sequencing data first. You say you don't know what the adapter sequence is, but it may well be one of the standard Illumina sequences, which FastQC will recognize. Otherwise, if the company deems this information proprietary, you can certainly argue that they should at least trim the adapters from your reads for you.

It would also help in interpreting your results to know which parameters you used when running STAR, as these can affect how many of your reads end up mapped to the reference.
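When you rerun STAR with different parameters, it is easier to compare runs if you pull just the percentage lines out of each Log.final.out. Here is a rough sketch, assuming the usual `name | value` layout of that file; the function name `star_summary` is hypothetical and not part of STAR.

```shell
# Hypothetical helper: print only the percentage entries of a STAR Log.final.out
# as "name<TAB>value". Assumes each entry is laid out as "name | value".
star_summary() {
    awk -F'|' '
        NF == 2 {
            key = $1
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim whitespace around the value
            if (val ~ /%$/) printf "%s\t%s\n", key, val
        }' "$1"
}

# Example (output prefix taken from the STAR command in this thread):
# star_summary results/Log.final.out
```

For the log posted above, this would put the 26.01% uniquely mapped and the 72.89% "unmapped: too short" lines side by side, which is the pair of numbers to watch when you change settings.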


I found only 0.53% adapter sequence in my fastq data. Will it cause errors during mapping?
