Hello all,
Probably a simple fix, but we all have to start somewhere. I am trying to align reads to a Trinity-generated transcriptome using STAR and am currently troubleshooting. I ran an alignment with just one of my samples (this sample was included when generating the transcriptome). The average input read length was 141, which intuitively should not lead to 99% of reads being "too short", as the output reports. The reads were originally sequenced at 150 bp.
First, I built the index:
Slurm command = ... wrap="STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ...path_to_index --genomeFastaFiles Trinity.fasta --genomeSAindexNbases 14"
Then I ran the alignment:
Slurm command = ... --wrap="STAR --readFilesCommand zcat --readFilesIn <in1> <in2> --genomeDir <.../index> --runThreadN 20 --outSAMtype BAM SortedByCoordinate --outSAMunmapped Within"
Any thoughts? Could this be a problem with building indices or the actual alignment?
Number of input reads | 42852270
Average input read length | 141
UNIQUE READS:
Uniquely mapped reads number | 217
Uniquely mapped reads % | 0.00%
Average mapped length | 126.74
Number of splices: Total | 0
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 0
Number of splices: GC/AG | 0
Number of splices: AT/AC | 0
Number of splices: Non-canonical | 0
Mismatch rate per base, % | 7.60%
Deletion rate per base | 0.00%
Deletion average length | 0.00
Insertion rate per base | 0.01%
Insertion average length | 1.00
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 267726
% of reads mapped to multiple loci | 0.62%
Number of reads mapped to too many loci | 24294
% of reads mapped to too many loci | 0.06%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 42555398
% of reads unmapped: too short | 99.31%
Number of reads unmapped: other | 4635
% of reads unmapped: other | 0.01%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Also, add relevant tags so people can find your questions more easily. STAR and Trinity are relevant tags here, but the only tag added is alignment, which is too generic and not at all helpful. Please invest a decent amount of effort in your question.

Note that when STAR says "too short" it doesn't literally mean that. It just means it didn't map. Are you totally sure this is the right reference?
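For context on the "too short" label: as far as I know, STAR puts a read in this category when its best alignment covers too small a fraction of the read, which is controlled by `--outFilterMatchNminOverLread` and `--outFilterScoreMinOverLread` (both default to 0.66 according to the STAR manual). For paired-end input, the read length in that comparison is the sum of both mates. Below is a minimal Python sketch of that check. The function name is hypothetical, and STAR's exact internal rounding is not reproduced, so treat it as an approximation.

```python
def passes_length_filter(aligned_len, read_len, min_ratio=0.66):
    """Approximate the 'too short' check (hypothetical helper, not STAR code).

    aligned_len: number of bases in the best alignment
    read_len:    read length; for paired-end data, the sum of both mates
    min_ratio:   assumed default of --outFilterMatchNminOverLread
    """
    return aligned_len >= min_ratio * read_len


# A single-end read of 141 bases with 126 bases aligned clears the threshold.
print(passes_length_filter(126, 141))

# A pair of 2 x 141 bases where only one mate aligns (126 bases) does not,
# so the whole pair would be counted as "unmapped: too short".
print(passes_length_filter(126, 2 * 141))
```

If the input is paired-end and the mates are out of sync between the two files, a failure like the second case can affect nearly every read, so that may be worth ruling out alongside the reference itself.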
Try to map with a different tool to see if you get different results. Since you're mapping to a transcriptome, you could try BWA.
Did you use BUSCO to assess the transcriptome? It can show whether the Trinity FASTA file is well assembled. You could also align the other samples against the reference; those results may give some ideas.
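To compare several samples quickly, the "too short" fraction can be pulled out of each STAR summary. Here is a rough Python sketch, assuming the summary uses the `key | value` layout shown in the question; the function names are made up for illustration, and the two example lines are copied from the posted output.

```python
def parse_star_log(lines):
    """Collect 'key | value' lines from a STAR summary into a dict of strings."""
    stats = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def too_short_fraction(stats):
    """Fraction of input reads reported as 'unmapped: too short'."""
    total = int(stats["Number of input reads"])
    too_short = int(stats["Number of reads unmapped: too short"])
    return too_short / total


# Two lines taken from the summary posted in the question.
example = [
    "Number of input reads | 42852270",
    "Number of reads unmapped: too short | 42555398",
]
stats = parse_star_log(example)
print(f"{too_short_fraction(stats):.2%}")
```

For real use, the same function can be fed an open file handle for each sample's summary file instead of the example list. If every sample shows the same pattern, the reference or the index is the more likely suspect; if only one does, the input files for that sample deserve a closer look.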
What have you done before aligning the reads to the reference genome?
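On the index side, one thing that may be worth checking: the STAR manual suggests scaling `--genomeSAindexNbases` as min(14, log2(GenomeLength)/2 - 1) for small references, and `--genomeChrBinNbits` as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) for references with many sequences, which is typical for a Trinity assembly. The Python sketch below computes both from a FASTA file. The helper names are hypothetical, and rounding to the nearest integer is an assumption chosen to match the worked examples in the manual. This is not necessarily the cause of the mapping failure, only a sanity check on the index settings.

```python
import math


def fasta_stats(lines):
    """Return (total_bases, number_of_sequences) from an iterable of FASTA lines."""
    total_bases, n_seqs = 0, 0
    for line in lines:
        if line.startswith(">"):
            n_seqs += 1
        else:
            total_bases += len(line.strip())
    return total_bases, n_seqs


def suggested_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), rounded (rounding is an assumption)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


def suggested_chr_bin_nbits(genome_length, n_refs, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), rounded."""
    return min(18, round(math.log2(max(genome_length / n_refs, read_length))))


# Tiny in-memory FASTA just to show the call pattern.
demo = [">contig1\n", "ACGTACGT\n", "ACGT\n", ">contig2\n", "GGCC\n"]
length, n_refs = fasta_stats(demo)
print(length, n_refs)

# Hypothetical assembly size: 100 Mb spread over 200,000 contigs, 141 bp reads.
print(suggested_sa_index_nbases(100_000_000))
print(suggested_chr_bin_nbits(100_000_000, 200_000, 141))
```

To apply it to the actual assembly, pass an open handle on `Trinity.fasta` to `fasta_stats` and use the returned numbers in place of the hypothetical ones.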