Hello, everyone!
I couldn't find an explanation online or in the manuals for KisSplice and kissplice2reftranscriptome. The FAQ gave some explanations, but for me some things are still unclear.
First, here is an example of the SNPs I got after running KisSplice/kissplice2reftranscriptome/KissDE:
TRINITY_DN12819_c0_g1_i1 bcc_346265|Cycle_0|Type_0a True 100 188 GTG GGG V G True False False False 100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|100.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0 C1_11|C2_9|C3_11|C4_5|C5_8|C6_4|C7_10|C8_8|C9_20|C10_11|C11_5|C12_6|C13_11|C14_5|C15_5|C16_4|C17_10|C18_4|C19_4|C20_6|C21_0|C22_0|C23_0|C24_0|C25_0|C26_0|C27_0|C28_0|C29_0|C30_0|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_0|C38_0|C39_0|C40_0|C41_0|C42_0|C43_0|C44_0|C45_0|C46_0|C47_0|C48_0|C49_0|C50_0|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_0|C58_0|C59_0|C60_0 C1_0|C2_0|C3_0|C4_0|C5_0|C6_0|C7_0|C8_0|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_0|C16_0|C17_0|C18_0|C19_0|C20_0|C21_10|C22_6|C23_0|C24_0|C25_0|C26_0|C27_25|C28_11|C29_0|C30_0|C31_0|C32_0|C33_35|C34_12|C35_14|C36_11|C37_21|C38_6|C39_16|C40_3|C41_22|C42_12|C43_0|C44_0|C45_0|C46_0|C47_36|C48_15|C49_0|C50_0|C51_0|C52_0|C53_23|C54_11|C55_7|C56_1|C57_9|C58_6|C59_12|C60_2 True 1.73635024831182e-13 -1
TRINITY_DN12819_c0_g1_i1 bcc_346265|Cycle_1|Type_0a True 100 188 GTG GCG V A True False False False 100.0|100.0|100.0|100.0|57.14|50.0|62.5|61.54|100.0|100.0|100.0|100.0|100.0|100.0|31.25|100.0|66.67|50.0|100.0|100.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0 C1_11|C2_9|C3_11|C4_5|C5_8|C6_4|C7_10|C8_8|C9_20|C10_11|C11_5|C12_6|C13_11|C14_5|C15_5|C16_4|C17_10|C18_4|C19_4|C20_6|C21_0|C22_0|C23_0|C24_0|C25_0|C26_0|C27_0|C28_0|C29_0|C30_0|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_0|C38_0|C39_0|C40_0|C41_0|C42_0|C43_0|C44_0|C45_0|C46_0|C47_0|C48_0|C49_0|C50_0|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_0|C58_0|C59_0|C60_0 C1_0|C2_0|C3_0|C4_0|C5_6|C6_4|C7_6|C8_5|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_11|C16_0|C17_5|C18_4|C19_0|C20_0|C21_8|C22_8|C23_2|C24_1|C25_8|C26_5|C27_0|C28_0|C29_7|C30_5|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_6|C38_5|C39_13|C40_9|C41_11|C42_8|C43_6|C44_2|C45_8|C46_5|C47_0|C48_0|C49_7|C50_6|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_9|C58_5|C59_8|C60_3 True 0 -0.8202
TRINITY_DN12819_c0_g1_i1 bcc_346265|Cycle_3|Type_0a True 100 188 GGG GCG G A True False False False 0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|0.0|55.56|42.86|0.0|0.0|0.0|0.0|100.0|100.0|0.0|0.0|0.0|0.0|100.0|100.0|100.0|100.0|77.78|54.55|55.17|25.0|66.67|60.0|0.0|0.0|0.0|0.0|100.0|100.0|0.0|0.0|0.0|0.0|100.0|100.0|100.0|100.0|50.0|54.55|60.0|40.0 C1_0|C2_0|C3_0|C4_0|C5_0|C6_0|C7_0|C8_0|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_0|C16_0|C17_0|C18_0|C19_0|C20_0|C21_10|C22_6|C23_0|C24_0|C25_0|C26_0|C27_25|C28_11|C29_0|C30_0|C31_0|C32_0|C33_35|C34_12|C35_14|C36_11|C37_21|C38_6|C39_16|C40_3|C41_22|C42_12|C43_0|C44_0|C45_0|C46_0|C47_36|C48_15|C49_0|C50_0|C51_0|C52_0|C53_23|C54_11|C55_7|C56_1|C57_9|C58_6|C59_12|C60_2 C1_0|C2_0|C3_0|C4_0|C5_6|C6_4|C7_6|C8_5|C9_0|C10_0|C11_0|C12_0|C13_0|C14_0|C15_11|C16_0|C17_5|C18_4|C19_0|C20_0|C21_8|C22_8|C23_2|C24_1|C25_8|C26_5|C27_0|C28_0|C29_7|C30_5|C31_0|C32_0|C33_0|C34_0|C35_0|C36_0|C37_6|C38_5|C39_13|C40_9|C41_11|C42_8|C43_6|C44_2|C45_8|C46_5|C47_0|C48_0|C49_7|C50_6|C51_0|C52_0|C53_0|C54_0|C55_0|C56_0|C57_9|C58_5|C59_8|C60_3 True 0.000872411136611906 0.6704
I have several problems/questions regarding this output:
1) Should I consider these as 3 different SNPs at the same location, or are these 3 versions of the same SNP?
What confuses me are the identifiers bcc_346265|Cycle_0|Type_0a, bcc_346265|Cycle_1|Type_0a and bcc_346265|Cycle_3|Type_0a. The BCC part is the same, which makes sense, since according to the FAQ a BCC is a set of overlapping variations. The Cycle part (the bubble identifier) is different. How should I interpret and filter such events? How do I decide which ones to keep?
2) The first two events, bcc_346265|Cycle_0 and bcc_346265|Cycle_1, have the same counts but different frequencies. Why is that? And do these counts represent the actual number of reads, or some normalised value? I assume that they are normalised with the DESeq2 method, but I'm not sure.
3) The first two events have different p-values and different DeltaF values. For some reason, though, the event with the most negative DeltaF (-1, i.e. the biggest difference between conditions) has a worse p-value (1.73635024831182e-13) than the event with DeltaF -0.8202 (p-value 0). How should I interpret this, and how do I decide between such events?
4) I would like to compare KisSplice and GATK. The results posted here are based on the placement of KisSplice events on a Trinity assembly. However, I also have an annotated genome assembly, and I would like to know whether it makes sense to use kissplice2reftranscriptome with the CDS sequences produced by genome annotation software.
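In case it helps with question 2, here is a minimal Python sketch of a check on the numbers above. The function names are my own and are not part of KisSplice or KissDE. It assumes that the frequency column is simply 100 * count1 / (count1 + count2) per sample, with 0.0 when both counts are zero. That formula is my guess from the rows, not something I found in the documentation.

```python
def parse_counts(field):
    """Turn a field like 'C1_11|C2_9' into {'C1': 11, 'C2': 9}."""
    counts = {}
    for item in field.split("|"):
        name, value = item.rsplit("_", 1)
        counts[name] = int(value)
    return counts


def frequencies(variant1_field, variant2_field):
    """Per-sample frequency of variant 1, in percent (assumed formula)."""
    v1 = parse_counts(variant1_field)
    v2 = parse_counts(variant2_field)
    freqs = {}
    for name in v1:
        total = v1[name] + v2[name]
        # Assumption: samples with no reads for either variant are reported as 0.0
        freqs[name] = round(100.0 * v1[name] / total, 2) if total else 0.0
    return freqs


# Samples C5 to C8, copied from the two count columns of the Cycle_1 row
variant1 = "C5_8|C6_4|C7_10|C8_8"
variant2 = "C5_6|C6_4|C7_6|C8_5"
print(frequencies(variant1, variant2))
```

For these four samples the result matches the 57.14|50.0|62.5|61.54 reported in the Cycle_1 row. If the assumption holds, it would explain why Cycle_0 and Cycle_1 have different frequencies: their first count column is identical, but the second one differs. It does not tell me whether the counts themselves are raw or normalised.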