I'm pretty new to bioinformatics. I am currently working on a project where I am trying to call SNPs from frog species whose genomes have not been completely sequenced. I have RNA from 10 different tissues across 5 closely related frog species.
I've tried aligning these sequences to the reference genome of a more distantly related frog that has been completely sequenced, but the results are very poor (only 2-3% of the reads map to the reference genome).
I thought I might be able to do a de novo assembly of the RNA sequences from each tissue and use the resulting contigs as a "reference genome" for the alignment.
Does anyone have any experience with this? Is this a reasonable way to call SNPs from data without a reference genome?
Thank you for the fantastic tool!
I am currently working with it and have a small question about the genotypes output file.
Please see it below:
I am trying to compare 2 samples and ended up with many combinations of genotype coverage values, e.g. (2 2) / (2 1) / (2 0) / (2 -1) / (1 -1), etc. I understand that I can choose the (1 -1) combination. Please let me know which other combinations I can choose.
Many Thanks,
Siva
Hi there,
Thanks for this question Siva.
A value is associated with each SNP and each sample (which explains why you have 2 values per SNP here, since you are using two samples).
You gave the "genotyper" a coverage threshold T.
Given one sample, you have a coverage for each allele of the SNP. Thus you end up with two coverage values per SNP, and these are compared with T to produce the value reported for that sample.
Choosing (1 -1) or (-1 1) is a way to keep SNPs that are homozygous and distinct in the two samples. Choosing (2 2) is a way to keep SNPs that are heterozygous in both samples,
...
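To make the logic above concrete, here is a minimal Python sketch of how two per-allele coverages and a threshold T could be turned into one value per sample, and how SNPs could then be filtered by combination. This is not the tool's actual code. The encoding is an assumption inferred from the examples above (1 = only the first allele reaches T, -1 = only the second allele reaches T, 2 = both do, 0 = neither does), as is the use of ">=" for the comparison. The names `genotype_code` and `keep_snps` and the coverage numbers are hypothetical.

```python
def genotype_code(cov_first, cov_second, threshold):
    """Return an ASSUMED genotype value for one sample at one SNP.

    Assumed encoding (inferred, not taken from the tool's documentation):
      2  -> both alleles covered at least `threshold` times (heterozygous)
      1  -> only the first allele reaches the threshold (homozygous)
     -1  -> only the second allele reaches the threshold (homozygous)
      0  -> neither allele reaches the threshold
    """
    first_ok = cov_first >= threshold
    second_ok = cov_second >= threshold
    if first_ok and second_ok:
        return 2
    if first_ok:
        return 1
    if second_ok:
        return -1
    return 0


def keep_snps(codes_by_snp, wanted):
    """Keep the SNP ids whose per-sample values match one of the wanted combinations."""
    return [snp for snp, codes in codes_by_snp.items() if tuple(codes) in wanted]


# Hypothetical data: for each SNP, one (first allele, second allele)
# coverage pair per sample, here with two samples.
coverages = {
    "snp_1": [(12, 0), (0, 9)],
    "snp_2": [(8, 7), (6, 10)],
    "snp_3": [(15, 1), (11, 0)],
}
T = 4  # hypothetical coverage threshold

# One value per sample for each SNP.
codes = {
    snp: [genotype_code(a, b, T) for a, b in samples]
    for snp, samples in coverages.items()
}

# SNPs that are homozygous and distinct in the two samples.
homozygous_distinct = keep_snps(codes, {(1, -1), (-1, 1)})

# SNPs that are heterozygous in both samples.
heterozygous_both = keep_snps(codes, {(2, 2)})
```

With this toy data, `homozygous_distinct` contains only `snp_1` and `heterozygous_both` contains only `snp_2`. Any other combination can be selected the same way by changing the set passed to `keep_snps`.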
Best, Pierre