Regarding the kissplice2reftranscriptome main output:
For example, I have:
TRINITY_DN921_c0_g2_i1 bcc_10004|Cycle_0|Type_0a True 100.0 202 TAT CAT ...
As I understand it, TAT is the reference codon and CAT is the alternative codon, so "TAT" should be nucleotides 202-204 of the TRINITY_DN921_c0_g2_i1 sequence.
So I did this:
samtools faidx 02_Trinity.fasta TRINITY_DN9185_c0_g2_i1:202-204
TRINITY_DN921_c0_g2_i1:202-204
TAA
In other cases, for example:
TRINITY_DN921_c0_g2_i1 bcc_10003|Cycle_0|Type_0a True 100.0 641 GGG GGC
TRINITY_DN9185_c0_g2_i1:639-641
GGC
Can somebody explain to me what the pattern is here? Or how to find the exact position of the codon in the transcript?
Thank you,
All the formats we use in our pipeline (.bed, .psl) are 0-based, hence the SNP position we output in the final table is also 0-based.
If you want to use samtools faidx (which is 1-based), you should request the region from 201 to 205 of your transcript.
You will obtain 5 nucleotides, the central position being the SNP (202 in 0-based is 203 in 1-based).
For your specific example, since the SNP is in the first position of the codon, the codon should correspond to the last 3 nt of these 5 nt,
unless your ORF is on the minus strand, in which case your codon should correspond to the reverse complement of the first 3 nt.
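To make the coordinate arithmetic above concrete, here is a minimal Python sketch. The function names (`faidx_region`, `codon_from_window`) are hypothetical helpers written for this thread, not part of KisSplice or samtools. It assumes a 5 nt window with the SNP at the centre, and that you already know the position of the SNP within the codon and the strand of the ORF. The codon rule is a generalisation of the first-position case described above to the second and third positions.

```python
# Hypothetical helpers illustrating the 0-based -> 1-based conversion
# and the codon lookup inside a 5 nt window centred on the SNP.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def faidx_region(transcript_id, snp_pos_0based, flank=2):
    """Build a 1-based, inclusive region string centred on a 0-based SNP position."""
    center = snp_pos_0based + 1  # 0-based -> 1-based
    start = center - flank
    if start < 1:
        # Assumption: we refuse windows that would run off the transcript start,
        # because the SNP would then no longer be the central nucleotide.
        raise ValueError("SNP is too close to the transcript start for this flank")
    return f"{transcript_id}:{start}-{center + flank}"


def codon_from_window(window, pos_in_codon, minus_strand=False):
    """Extract the codon from a 5 nt window whose central nucleotide is the SNP.

    pos_in_codon is 1, 2 or 3, counted in the direction of the ORF.
    """
    if len(window) != 5:
        raise ValueError("expected a 5 nt window")
    if pos_in_codon not in (1, 2, 3):
        raise ValueError("pos_in_codon must be 1, 2 or 3")
    if minus_strand:
        # Read the window in the direction of the ORF.
        window = reverse_complement(window)
    start = 3 - pos_in_codon  # 1st -> last 3 nt, 2nd -> middle 3 nt, 3rd -> first 3 nt
    return window[start:start + 3]


print(faidx_region("TRINITY_DN921_c0_g2_i1", 202))  # TRINITY_DN921_c0_g2_i1:201-205
print(codon_from_window("AGGAA", 2))                 # GGA
print(codon_from_window("AGGAA", 1, minus_strand=True))  # CCT
```

The region string printed first can be passed to `samtools faidx` after the FASTA file name. The last call shows the minus-strand case: the codon is the reverse complement of the first 3 nt of the window ("AGG" gives "CCT").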
I'm also having a hard time interpreting the output. I was wondering if you could give me a hand.
When the SNP is in the second position of the codon, we can predict that the codon of interest is in the middle of the 5 nucleotides that you mentioned. But how can we find the position of the change in the reference FASTA file when the SNP is in the first or second position of the codon? Also, how would this read for the reverse complement?
I'm copying below some selected sections of my data.
I am not sure I understand your question. The column SNP_position of the main output file of k2rt should give you the position of the SNP in the reference.
The only tricky thing to remember is that this coordinate is 0-based.
If you simply need the position of the SNP in the reference, then you do not need to worry about the strand, or if the SNP is in the first, second or third position of the codon.
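If only the reference base at the SNP is needed, one option is to index the sequence directly in Python, where string indices are 0-based like the SNP position in the table. The sketch below is an illustration only: `parse_fasta` and `base_at` are hypothetical helpers, and the toy transcript `tx1` is made up for the example.

```python
# Hypothetical helpers: look up the reference base at a 0-based SNP position.


def parse_fasta(lines):
    """Parse FASTA-formatted lines (an open file or a list) into {name: sequence}."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header, as samtools faidx does.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def base_at(sequences, transcript_id, snp_pos_0based):
    """Return the reference nucleotide at a 0-based position (no +1 needed)."""
    return sequences[transcript_id][snp_pos_0based]


# Toy FASTA content, invented for illustration.
toy = [">tx1 some description", "ACGTAC", "GGTT"]
seqs = parse_fasta(toy)
print(base_at(seqs, "tx1", 4))  # A
```

With a real file, `parse_fasta` can be given the open file handle. The same base would be obtained from a 1-based tool by asking for position 5, that is, the 0-based position plus one.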
I'm unclear on how to read the KisSplice2refTranscriptome output .tsv file. Can you explain the specific meaning of each column? If I have multiple samples, how do I know which base each sample has at each SNP position?
Hi Vincent,
I'm copying below some selected sections of my data.
TrinityID-Position-samtoolsfaidx      results-faidx  results-faidx-complemented  kissplice-position  Kissplice-codon1  Kissplice-codon2  SNP_position_change
TRINITY_DN14342_c0_g1_i1:778-782      AGGAA          TTCCT                       779                 GAA               GGA               2nd
TRINITY_DN19222_c5_g1_i4:1331-1335    CCCGC          GCGGG                       1332                CCC               CCT               3rd
TRINITY_DN5938_c0_g1_i1:1977-1981     TCCGA          TCGGA                       1978                TCC               TCT               3rd
TRINITY_DN14441_c0_g3_i1:41-45        CGGCC          GGCCG                       42                  CGA               CGG               3rd
TRINITY_DN19222_c5_g1_i4:1232-1236    GAGAA          TTCTC                       1233                GAG               GAA               3rd
TRINITY_DN14418_c0_g1_i1:955-959      GGGAA          TTCCC                       956                 GGA               GGG               3rd
TRINITY_DN14441_c0_g3_i1:40-44        TCGGC          GCCGA                       41                  CGA               CAA               2nd
TRINITY_DN19222_c5_g1_i4:1288-1292    CAGAG          CTCTG                       1289                AAA               AGA               2nd
TRINITY_DN8529_c0_g1_i1:134-138       GCGCT          AGCGC                       135                 CAT               CGT               2nd
Best, Vanessa