Question

Variant detection in RNA-seq datasets without a reference using KisSplice.

7.2 years ago

koushikponnanna • 0

I am currently working on paired-end RNA-seq data from a few Drosophila tissues. As I don't have an established reference genome, I need to rely on reference-free variant calling tools, and KisSplice seems suitable. I have generated outputs for my datasets and have a few questions about them.

As indicated in the manual, the paired-end inputs need to be loaded separately as -r read.1 and -r read.2. The single output that is generated contains coverage as C1 and C2 for the lower and upper paths. Do I have to sort the outputs to get the information for my sample, or is the generated output the final one that I can rely on for that sample? I have attached the log file for one sample. I also ran the same analysis after merging the 2 fastq files into one file, and I got slightly different outputs.
Are there any graphical representation options that I can apply to the outputs? I have a total of 6 paired-end samples. What is the best method to quantify and compare these samples? The samples are testis tissue from different species.

Thank you

RNA-Seq SNP kissplice de novo • 2.2k views

updated 7.1 years ago by vincent.lacroix ▴ 150 • written 7.2 years ago by koushikponnanna • 0

"from few Drosophila tissues. As I dont have an established genome reference"

Am I missing something here, because I thought Drosophila was pretty well characterised, and there was a decent reference - Here for example.

7.2 years ago by andrew.j.skelton73 • 6.6k

I am aware of the well-characterized references for a few species. But the problem is that those are quite diverged, and using them to call variants would not serve my purpose. That is the reason I am looking to call variants from raw reads.

7.2 years ago by koushikponnanna • 0

score 0 · Answer 1 · 2017-10-20

Dear Koushik Ponnanna,

Thanks for your interest in KisSplice. Below are some answers to your questions:

1- When you load the reads from a pair separately, C1 will correspond to the read count of read 1, and C2 will correspond to the read count of read 2. In principle, C1+C2 should give you the read count obtained with the merged files. I think the difference you see is due to the fact that you used the raw data for the merged files, and the trimmed data for the separated files.

2- If you have replicates, you may want to use KissDE to find variants associated to a species, as described here: http://kissplice.prabi.fr/TWAS/

If you do not have replicates, you cannot use KissDE. But you can still run all other steps of our pipeline. This will produce an .xls file which you can exploit for your downstream analysis. For instance, you can select SNPs which affect the protein sequence. The final table contains counts, and estimated allele frequencies. You can then plot those in R, as in Supp Figure 6 of our NAR paper.

If you need a visual output for specific genes, you will probably need to assemble the full-length transcripts with Trinity (or Oases), map your reads to the assembled transcripts, and visualise your transcripts of interest through IGV.

Best regards,
Vincent
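To illustrate point 1, here is a minimal Python sketch of how the per-file counts could be summed into per-sample counts (C1+C2 for the first pair, and so on) and turned into a rough allele frequency for each sample. It assumes that the headers in the KisSplice FASTA output carry the counts as `|C1_<n>|C2_<n>|...` fields and that the two variants of a bubble are labelled upper and lower path; the example headers and the function names `parse_counts`, `per_sample_counts` and `allele_frequencies` are hypothetical, so check them against your own output file before relying on this.

```python
import re
from typing import List, Optional

# Assumed header field layout: "|C1_12|C2_8|..." (one Ci per input read file).
# "|Cycle_0" is not matched because a digit must follow the "C".
COUNT_RE = re.compile(r"\|C(\d+)_(\d+)")


def parse_counts(header: str) -> List[int]:
    """Return the read counts [C1, C2, ...] of one header, ordered by index."""
    pairs = sorted((int(idx), int(n)) for idx, n in COUNT_RE.findall(header))
    return [n for _, n in pairs]


def per_sample_counts(counts: List[int], files_per_sample: int = 2) -> List[int]:
    """Sum consecutive counts, e.g. C1+C2 for sample 1 and C3+C4 for sample 2."""
    return [
        sum(counts[i:i + files_per_sample])
        for i in range(0, len(counts), files_per_sample)
    ]


def allele_frequencies(upper: List[int], lower: List[int]) -> List[Optional[float]]:
    """Frequency of the upper-path allele per sample; None when there is no coverage."""
    freqs: List[Optional[float]] = []
    for u, l in zip(upper, lower):
        total = u + l
        freqs.append(u / total if total else None)
    return freqs


# Hypothetical headers for one SNP bubble, with two paired-end samples (4 files).
upper_header = ">bcc_9|Cycle_0|Type_0a|upper_path_Length_51|C1_12|C2_8|C3_0|C4_1|rank_0.9"
lower_header = ">bcc_9|Cycle_0|Type_0a|lower_path_Length_51|C1_3|C2_2|C3_10|C4_9|rank_0.9"

upper = per_sample_counts(parse_counts(upper_header))
lower = per_sample_counts(parse_counts(lower_header))
print(upper, lower, allele_frequencies(upper, lower))
```

With six paired-end samples loaded as twelve `-r` files, the same functions would give six per-sample counts per path. The resulting frequencies are naive ratios of read counts, not the estimates produced by the pipeline's final table, and they could be plotted per sample as a first look at the data.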