SNP analysis based on RNA-seq data using the GATK pipeline
7.6 years ago
Bioinfonext ▴ 470

Hi,

I have called SNPs in two contrasting genotypes from RNA-seq data using the GATK pipeline. In some records, the REF column contains a string of nucleotides while the ALT column contains only a single nucleotide. What does this mean?

After calling SNPs with GATK, how should I filter the raw VCF to keep only confirmed (high-confidence) SNPs?

CHROM  POS   ID  REF      ALT  QUAL    FILTER  INFO
R1     1119  .   C        T    311.78  .
R1     1132  .   CACTTGG  C    302.75  .
R1     1275  .   T        C    146.9   .
7.6 years ago
mforde84 ★ 1.4k

The record with REF CACTTGG and ALT C is a deletion of ACTTGG relative to the reference. If the REF and ALT columns were swapped, it would be an insertion.
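To make the REF/ALT reading concrete, here is a minimal Python sketch that classifies records by allele length. The function names `classify_variant` and `deleted_bases` are made up for illustration and are not part of GATK; the sketch assumes one ALT allele per record and the usual VCF convention that an indel keeps a shared leading base.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if "," in alt:
        # Assumption: multi-allelic records are split before classification.
        raise ValueError("multi-allelic ALT; split the record first")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNP"  # same length, more than one base


def deleted_bases(ref: str, alt: str) -> str:
    """Return the bases removed by a deletion whose ALT is a prefix of REF."""
    if not ref.startswith(alt) or len(ref) <= len(alt):
        raise ValueError("not a simple anchored deletion")
    return ref[len(alt):]


# The three records from the question: (CHROM, POS, REF, ALT).
records = [
    ("R1", 1119, "C", "T"),
    ("R1", 1132, "CACTTGG", "C"),
    ("R1", 1275, "T", "C"),
]

for chrom, pos, ref, alt in records:
    print(chrom, pos, ref, alt, classify_variant(ref, alt))

# Keep only the single-base substitutions.
snps_only = [r for r in records if classify_variant(r[2], r[3]) == "SNP"]
```

On the three records posted, this keeps positions 1119 and 1275 as SNPs and flags position 1132 as the deletion.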

7.6 years ago
Bioinfonext ▴ 470

Thanks a lot. Could you also suggest which quality score to use as a cutoff for identifying confirmed SNPs? What other parameters can be used to filter the raw VCF?

7.6 years ago
Bioinfonext ▴ 470

Thanks a lot. One more question:

Do I need to perform indel realignment and base recalibration when calling SNPs from RNA-seq data?

If so, how do I obtain these two input files for indel realignment: -known indels.vcf \ -targetIntervals intervalListFromRTC.interval


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Supplemental questions should not be entered as New Answers.


From my limited experience with variant calling on RNA-seq data, I'm not sure base quality score recalibration adds much to the sensitivity or specificity of the calls. Indel realignment just seems like a good idea, imo. The question is a bit moot anyway: if you have the computational resources, adding a couple of hours to the analysis is a tradeoff many people accept to make sure the analysis is thorough.

Ideally, if you want to know how changes to your pipeline affect call quality, run each variant of the pipeline on a known data set (i.e., one with validated calls) and pick the configuration that gives the best results. Unfortunately, there's no real a priori way to settle a lot of these issues. But hey, this is science, so why not test it out.
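As a rough sketch of that comparison, the Python below scores each pipeline configuration against a validated call set. Everything here is illustrative: `evaluate_calls`, the configuration names, and the toy variants are invented for the example, and a variant is assumed to match only when CHROM, POS, REF and ALT are all identical.

```python
def evaluate_calls(called, truth):
    """Compare a call set with validated calls; variants are (chrom, pos, ref, alt)."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and validated
    fp = len(called - truth)   # called but not validated
    fn = len(truth - called)   # validated but missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if truth else 0.0,
        "precision": tp / (tp + fp) if called else 0.0,
    }


# Toy data, made up for illustration only.
truth = {
    ("toy", 100, "A", "G"),
    ("toy", 250, "C", "T"),
    ("toy", 400, "G", "A"),
}
pipelines = {
    "config_a": {("toy", 100, "A", "G"), ("toy", 250, "C", "T"), ("toy", 300, "T", "C")},
    "config_b": {("toy", 100, "A", "G"), ("toy", 250, "C", "T"), ("toy", 400, "G", "A")},
}

results = {name: evaluate_calls(calls, truth) for name, calls in pipelines.items()}
for name, stats in results.items():
    print(name, stats)
```

With real data, the sets would be built from the VCF produced by each configuration and from the validated calls, and the configuration with the better balance of sensitivity and precision would be the one to keep.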
