Hello, I have used VarScan.v2.3.2 to identify RNA-editing sites in my data. I first ran VarScan mpileup2snp using min. average quality set to 25, minimum coverage 8, minimum variant reads 2, min. var. freq. 0.1, strand filter 1 and pValue 0.01. After this I ran VarScan filter with min coverage 10, minimum variant reads 3, min. strands 1 and min. average quality 25. I pretty much ran VarScan filter in order to account for "strandness" issues, since I could have used stringent parameters in mpileup2snp.
I have compiled those A-to-G (second strand transcripts) and T-to-C (first strand variants) and am inspecting them individually in IGV. I am puzzled by the fact that not all of the A-to-G editing sites were called in second strand transcripts, nor are all of the T-to-C variants in first strand transcripts. Can somebody comment on what might be the source of this discrepancy? Will this be taken care if I set the min.strands filter to 2? I definitely don't have strand-specific reads (these are on their way) and have aligned my reads with BWA. I have these reads aligned with TopHat2, using a reference genes.gtf file, and am now preparing the inputs for VarScan. Will this solve the problem?
Thanks, G
I'm a little confused about what you are doing. So, to start with a simple question, are you aligning to the genome or to a fasta file of transcripts?
I aligning to the hg19.fa genome. using BWA. I had previously performed an alignment to hg19.fa with TopHat2, using a transcripts reference file, to aid in proper downstream transcript reconstruction by Cufflinks. In the current absence of a strand-specific data set, I am trying to find out if these "strand" discrepancies might be alleviated by using the TopHat2 alignments to call variants with the Samtools-VarScan scheme.