Hi there, I am currently taking a computational genomics undergraduate course and I am very new to computational biology. I am using RapMap for my final independant project. I am attempting to map single end reads to the Physcomitrella patens transcriptome and use deseq2 to look at differential expression. When I run the script to create the Quasiindex it runs in roughly 40 seconds and outputs no errors. However, the log file does not show the index being completed. This is the output I get:
[2019-12-16 14:19:47.094] [jointLog] [info] [Step 1 of 4] : counting k-mers
Elapsed time: 1.8452s
[2019-12-16 17:17:03.968] [jointLog] [warning] Removed 230 transcripts that were sequence duplicates of indexed transcripts.
[2019-12-16 17:17:03.968] [jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2019-12-16 17:17:03.969] [jointLog] [info] Replaced 36,868 non-ATCG nucleotides
[2019-12-16 17:17:03.969] [jointLog] [info] Clipped poly-A tails from 6 transcripts
[2019-12-16 17:17:03.973] [jointLog] [info] Building rank-select dictionary and saving to disk
[2019-12-16 17:17:03.977] [jointLog] [info] done
Elapsed time: 0.00412543s
[2019-12-16 17:17:03.989] [jointLog] [info] Writing sequence data to file . . .
[2019-12-16 17:17:04.027] [jointLog] [info] done
Elapsed time: 0.0385376s
[2019-12-16 17:17:04.099] [jointLog] [info] Building 32-bit suffix array (length of generalized text is 62,396,569)
[2019-12-16 17:17:04.221] [jointLog] [info] Building suffix array . . .
success
saving to disk . . . done
Elapsed time: 0.151399s
done
Elapsed time: 6.92052s
^M^Mprocessed 0 positions^M^Mprocessed 1,000,000 positions^M^Mprocessed 2,000,000 positions^M^Mprocessed 3,000,000 positions^M^Mprocessed 4,000,000 positions^M^Mprocessed 5,000,000 positions^M^Mprocessed 6,000,000 positions^M^Mprocessed 7,000,000 positions^M^Mprocessed 8,000,000 positions^M^Mprocessed 9,000,000 positions^M^Mprocessed 10,000,000 positions^M^Mprocessed 11,000,000 positions^M^Mprocessed 12,000,000 positions^M^Mprocessed 13,000,000 positions^M^Mprocessed 14,000,000 positions^M^Mprocessed 15,000,000 positions^M^Mprocessed 16,000,000 positions^M^Mprocessed 17,000,000 positions^M^Mprocessed 18,000,000 positions^M^Mprocessed 19,000,000 positions^M^Mprocessed 20,000,000 positions^M^Mprocessed 21,000,000 positions^M^Mprocessed 22,000,000 positions^M^Mprocessed 23,000,000 positions^M^Mprocessed 24,000,000 positions^M^Mprocessed 25,000,000 positions^M^Mprocessed 26,000,000 positions^M^Mprocessed 27,000,000 positions^M^Mprocessed 28,000,000 positions^M^Mprocessed 29,000,000 positions^M^Mprocessed 30,000,000 positions^M^Mprocessed 31,000,000 positions^M^Mprocessed 32,000,000 positions^M^Mprocessed 33,000,000 positions^M^Mprocessed 34,000,000 positions^M^Mprocessed 35,000,000 positions^M
^Mprocessed 36,000,000 positions^M^Mprocessed 37,000,000 positions^M^Mprocessed 38,000,000 positions^M^Mprocessed 39,000,000 positions^M^Mprocessed 40,000,000 positions^M^Mprocessed 41,000,000 positions^M^Mprocessed 42,000,000 positions^M^Mprocessed 43,000,000 positions^M^Mprocessed 44,000,000 positions^M^Mprocessed 45,000,000 positions^M^Mprocessed 46,000,000 positions^M^Mprocessed 47,000,000 positions^M^Mprocessed 48,000,000 positions^M^Mprocessed 49,000,000 positions^M^Mprocessed 50,000,000 positions^M^Mprocessed 51,000,000 positions^M^Mprocessed 52,000,000 positions^M^Mprocessed 53,000,000 positions^M^Mprocessed 54,000,000 positions^M^Mprocessed 55,000,000 positions^M^Mprocessed 56,000,000 positions^M^Mprocessed 57,000,000 positions^M^Mprocessed 58,000,000 positions^M^Mprocessed 59,000,000 positions^M^Mprocessed 60,000,000 positions^M^Mprocessed 61,000,000 positions^M^Mprocessed 62,000,000 positions
[2019-12-16 17:17:54.295] [jointLog] [info] khash had 58,255,238 keys
[2019-12-16 17:17:54.296] [jointLog] [info] saving hash to disk . . .
[2019-12-16 17:17:58.867] [jointLog] [info] done
Elapsed time: 4.5708s
_END_ [ indexscript.sh ]: Mon Dec 16 17:17:59 EST 2019
_ESTATUS_ [ quasiindex ]: 0
_END_ [ quasiindex ]: Mon Dec 16 17:17:59 EST 2019
It seems as though the script stops after step 1 but does not say why it does so. I attempted to map using the index it created and RapMap was able to run the mapping script. For analysis I plan to use Deseq2 and so I need to turn the RapMap .bam files into count data. When I do this, no reads are counted. This leads me to believe there is some issue with the index creation which then alters the mapping and then the count transformation after that. Has anyone seen this issue before or know what the issue might be? Thank you
It looks as though the index is being created properly. What is the output of the mapping phase? Have you looked at all at the SAM file being produced? Does that contain records that look like valid mappings?
Thank you for you speedy reply! That is good to hear. I am outputting SAM files but converting them to BAM files within the script. The SAM files do look like valid mappings. I was concerned about the "step 1 of 4" line in the script and not getting steps 2-4. I have gone ahead to try and get count data from the mapping output and it has worked so far. Which I guess means the index and the mapping was done correctly. If the index creation was successful is there a reason the log file is output in this way?