I have a query regarding STAR alignment. I used the following commands to convert the BAM files to FASTQ (as there were some issues while using Cufflinks):
samtools sort -n file.bam > file_sort.bam (sorted the file)
bedtools bamtofastq -i file_sort.bam -fq file_R1.fq -fq2 file_R2.fq (converted bam to fastq)
I then did the alignment with STAR, using the following command:
STAR --genomeDir star-genome --readFilesIn file_R1.fq file_R2.fq --runThreadN 6 --outFileNamePrefix file
My main issue is that I am getting a very low unique alignment rate, 8% to 15%. The output for one of the files looks like the following:
Started job on | Apr 18 09:39:59
Started mapping on | Apr 18 09:43:53
Finished on | Apr 18 10:49:50
Mapping speed, Million of reads per hour | 125.36
Number of input reads | 137795751
Average input read length | 200
UNIQUE READS:
Uniquely mapped reads number | 20600304
Uniquely mapped reads % | 14.95%
Average mapped length | 196.65
Number of splices: Total | 7843335
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 7766279
Number of splices: GC/AG | 35700
Number of splices: AT/AC | 3805
Number of splices: Non-canonical | 37551
Mismatch rate per base, % | 0.42%
Deletion rate per base | 0.02%
Deletion average length | 1.42
Insertion rate per base | 0.01%
Insertion average length | 1.58
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 53570046
% of reads mapped to multiple loci | 38.88%
Number of reads mapped to too many loci | 3298118
% of reads mapped to too many loci | 2.39%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 43.76%
% of reads unmapped: other | 0.02%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Could anyone please suggest how I can improve my alignment quality, as a large fraction of my reads (43.76% in this sample) is reported as "unmapped: too short"?
It would be great to get some expert suggestions.
Thanks
Prasoon
Looks like you have a lot of short reads in your dataset.
Yes, I want to know whether these reads can be aligned or whether they are all wasted. Is there a sequencing problem?
How short is short?
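One way to answer that is to tabulate the read lengths directly from the FASTQ file. The sketch below assumes a standard four-line-per-record FASTQ and uses the file name from the question; the function name read_length_histogram is just an illustrative helper, not part of any tool.

```shell
# Print "length<TAB>count" for every read length found in a FASTQ file.
# Assumes four lines per record, with the sequence on the 2nd line of each record.
read_length_histogram() {
    awk 'NR % 4 == 2 { counts[length($0)]++ }
         END { for (len in counts) print len "\t" counts[len] }' "$1" | sort -n
}

# file_R1.fq is the name used in the question; skipped if the file is absent.
if [ -f file_R1.fq ]; then
    read_length_histogram file_R1.fq
fi
```

Note that in STAR's log, "too short" refers to the length of the best alignment relative to the read length, so reads of full length can still end up in that category.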
Try decreasing --outFilterMatchNminOverLread. For posterity's sake, take a handful of the unmapped reads and blastn them.
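A rough sketch of both suggestions, under a few assumptions: the value 0.3 is only an illustrative threshold (STAR's default is 0.66), --outFilterScoreMinOverLread is lowered alongside it because both filters are applied (that flag was not mentioned above), the prefix file_relaxed_ is a made-up name, and fastq_head_to_fasta is a hypothetical helper for pulling a few unmapped reads into FASTA for BLAST.

```shell
# Illustrative threshold only; the STAR default for both filters is 0.66.
MIN_OVER_LREAD=0.3

# Re-run STAR with relaxed length filters and write unmapped reads to disk.
# Skipped when STAR or the genome directory from the question is not available.
if command -v STAR >/dev/null 2>&1 && [ -d star-genome ]; then
    STAR --genomeDir star-genome \
         --readFilesIn file_R1.fq file_R2.fq \
         --runThreadN 6 \
         --outFileNamePrefix file_relaxed_ \
         --outFilterMatchNminOverLread "$MIN_OVER_LREAD" \
         --outFilterScoreMinOverLread "$MIN_OVER_LREAD" \
         --outReadsUnmapped Fastx
fi

# Convert the first N reads of a FASTQ file to FASTA.
# $1 = FASTQ file, $2 = number of reads to take.
fastq_head_to_fasta() {
    head -n "$(( $2 * 4 ))" "$1" |
        awk 'NR % 4 == 1 { print ">" substr($0, 2) }
             NR % 4 == 2 { print }'
}

# With --outReadsUnmapped Fastx, STAR writes <prefix>Unmapped.out.mate1 (and mate2).
if [ -f file_relaxed_Unmapped.out.mate1 ]; then
    fastq_head_to_fasta file_relaxed_Unmapped.out.mate1 20 > unmapped_sample.fa
fi
```

The resulting unmapped_sample.fa can then be pasted into the NCBI BLAST web form or passed to a local blastn with -query. If the hits turn out to be rRNA, adapters, or another organism, that points to a library or contamination issue rather than an alignment setting.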