I'm running the Dropseq pipeline on a de-novo assembled diatom with gene calls from Maker. The pipeline suggests using the STAR aligner for the read mapping. The only problem is that I'm getting a REALLY low mapping rate.
Does anybody know if there are parameters I can adjust for this particular dataset in STAR
or another aligner designed to specifically address these types of issues?
Here is my command:
STAR --genomeDir star_index_extended --readFilesIn unaligned_mc_tagged_polyA_filtered.fastq --outFileNamePrefix star_ --runThreadN 16 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 50 --outReadsUnmapped Fastx
Here is the summary output:
bash-4.1$ cat star_Log.final.out
Started job on | May 10 14:58:11
Started mapping on | May 10 14:58:16
Finished on | May 10 20:57:36
Mapping speed, Million of reads per hour | 85.18
Number of input reads | 510139798
Average input read length | 57
UNIQUE READS:
Uniquely mapped reads number | 9914109
Uniquely mapped reads % | 1.94%
Average mapped length | 58.89
Number of splices: Total | 689950
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 415996
Number of splices: GC/AG | 11503
Number of splices: AT/AC | 297
Number of splices: Non-canonical | 262154
Mismatch rate per base, % | 4.32%
Deletion rate per base | 0.07%
Deletion average length | 1.54
Insertion rate per base | 0.03%
Insertion average length | 1.64
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 11525318
% of reads mapped to multiple loci | 2.26%
Number of reads mapped to too many loci | 11342282
% of reads mapped to too many loci | 2.22%
UNMAPPED READS:
% of reads unmapped: too many mismatches | 0.00%
% of reads unmapped: too short | 84.30%
% of reads unmapped: other | 9.27%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Can you provide some stats on the input fastq file you are using?
What other stats would be useful to add here? I think the only one STAR outputs regarding this is:
Average mapped length | 58.89
but I can run another tool as well.
Yes, but I mean more along the lines of the length of your input reads etc. (e.g. something that you would get from running FastQC or similar).
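If FastQC is not at hand, the basic read-length numbers asked for above can be pulled straight from the FASTQ file. The following is a minimal sketch, assuming a standard four-line-per-record FASTQ (no wrapped sequence lines); the function name `fastq_length_stats` is made up for illustration.

```shell
# Print read count plus min, mean and max read length for a FASTQ file.
# Assumption: exactly 4 lines per record, so the sequence is line 2 of each group.
fastq_length_stats() {
    awk 'NR % 4 == 2 {
             n++; len = length($0); sum += len
             if (n == 1 || len < min) min = len
             if (len > max) max = len
         }
         END {
             if (n == 0) { print "reads=0"; exit }
             printf "reads=%d min=%d mean=%.2f max=%d\n", n, min, sum / n, max
         }' "$1"
}

# Example call (file name taken from the command in the question):
# fastq_length_stats unaligned_mc_tagged_polyA_filtered.fastq
```

A wide spread between min and max would suggest that trimming has left many very short reads, which matters when `--outFilterMatchNmin` is set to a fixed number of bases.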
Try setting --outFilterMatchNmin to 20 to see if you can get more mapping. However, that means it only requires 20 bases to map, which is pretty low.
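For context, "too short" in the STAR log refers to the aligned portion of a read falling below the filter thresholds, not to the input read itself being short. With an average input read length of 57, requiring 50 matched bases leaves very little room for soft-clipping or mismatches. A quick way to test the suggestion above without remapping all 510 million reads might look like the sketch below. It reuses the options from the question, drops `--outReadsUnmapped` for brevity, and adds `--readMapNumber` to map only the first reads of the file. The helper name `star_test_cmd`, the output prefix, and the subset size of 1000000 are assumptions chosen for illustration.

```shell
# Build the STAR command from the question with a relaxed --outFilterMatchNmin.
# Only the first n_reads reads are mapped (--readMapNumber) so the test is quick.
# The output prefix is hypothetical; it keeps test runs apart from the full run.
star_test_cmd() {
    min_match=$1
    n_reads=$2
    echo STAR --genomeDir star_index_extended \
        --readFilesIn unaligned_mc_tagged_polyA_filtered.fastq \
        --outFileNamePrefix "star_matchnmin${min_match}_" \
        --runThreadN 16 \
        --readMapNumber "$n_reads" \
        --outFilterScoreMinOverLread 0 \
        --outFilterMatchNminOverLread 0 \
        --outFilterMatchNmin "$min_match"
}

# Print the command for inspection; pipe the output to `sh` to actually run it.
star_test_cmd 20 1000000
```

Comparing the "Uniquely mapped reads %" and "% of reads unmapped: too short" lines of the resulting `Log.final.out` against the original run would show whether the threshold is what is discarding the reads.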