2.3 years ago
Nemo
Hi,
I have processed my RNA-seq reads using Hisat2, and I am following the steps of the GATK RNA-seq pipeline for variant calling. My data comes from COVID patients, so I am using the Wuhan-Hu-1.fasta file as the reference. The BaseRecalibrator step requires the --known-sites flag. What should I use for it in my case?
If you are doing variant calling for the virus, go for LoFreq or Bcftools. Also, if you are going with the default settings of Hisat2, you will not be able to detect indels.
Ref: https://covid19.galaxyproject.org/genomics/no-more-business-as-usual/4-variation/#calling-variants-in-haploid-mixtures-is-not-standardized
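As a rough sketch of the LoFreq route suggested above, the Python helper below only builds the command lines and does not run them. The file names (sample.bam, sample.vcf) and the helper name lofreq_commands are hypothetical. The indel-quality step is not mentioned in this thread; it is included because LoFreq needs indel qualities in the BAM before --call-indels has any effect. It does not address the separate point about the aligner settings.

```python
import shlex


def lofreq_commands(reference, bam, out_vcf, call_indels=True):
    """Build LoFreq command lines (argv lists) for one aligned sample.

    Hypothetical helper: it only assembles the commands, it does not run them.
    """
    commands = []
    calling_bam = bam
    if call_indels:
        # LoFreq only calls indels if the BAM carries indel qualities,
        # so add them first and call variants on the new BAM.
        stem = bam[:-4] if bam.endswith(".bam") else bam
        calling_bam = stem + ".indelqual.bam"
        commands.append(
            ["lofreq", "indelqual", "--dindel",
             "-f", reference, "-o", calling_bam, bam]
        )
    call = ["lofreq", "call", "-f", reference, "-o", out_vcf]
    if call_indels:
        call.append("--call-indels")
    call.append(calling_bam)
    commands.append(call)
    return commands


# Example with placeholder file names: print the commands for inspection.
for cmd in lofreq_commands("Wuhan-Hu-1.fasta", "sample.bam", "sample.vcf"):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell once the BAM is sorted and indexed; whether the defaults suit a given dataset is something to check against the LoFreq documentation.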
Details about --known-sites: https://gatk.broadinstitute.org/hc/en-us/articles/360036898312-BaseRecalibrator#--known-sites
Thanks @Arup for your reply. What do you mean by the default settings? What should I add to include indels in my analysis? Also, I did not get an answer about --known-sites. Should I use the GATK BaseRecalibrator? If yes, what should the value of the --known-sites flag be?
Use BWA MEM or Bowtie2 for the alignment. For BQSR, previously known polymorphic sites are provided via --known-sites. The option expects a BCF/VCF/BED file as input, and multiple sets can be provided by using the option multiple times.
As I read, BWA MEM is designed for aligning DNA sequences. Since I have RNA data, the resources I have read say I should use either STAR or Hisat2 as my aligner.
BWA is not splice-aware, but this will not be an issue for most bacteria and viruses.
Galaxy SARS-CoV-2 variant calling workflows.
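To illustrate the earlier point that --known-sites can be given more than once, here is a minimal Python sketch that assembles a GATK BaseRecalibrator command line without running it. The helper name base_recalibrator_command and all file names (the two VCFs in particular) are hypothetical placeholders. The thread does not name any known-sites resource for SARS-CoV-2, and none is implied here.

```python
import shlex


def base_recalibrator_command(bam, reference, known_sites, out_table):
    """Build a `gatk BaseRecalibrator` argv list (hypothetical helper).

    known_sites is a list of VCF/BCF/BED paths; each one becomes its own
    --known-sites argument, since the option may be repeated.
    """
    if not known_sites:
        raise ValueError("BaseRecalibrator needs at least one --known-sites file")
    cmd = ["gatk", "BaseRecalibrator", "-I", bam, "-R", reference]
    for path in known_sites:
        cmd += ["--known-sites", path]
    cmd += ["-O", out_table]
    return cmd


# Placeholder inputs: these VCF names are illustrative, not real resources.
cmd = base_recalibrator_command(
    "sample.bam",
    "Wuhan-Hu-1.fasta",
    ["known_set_1.vcf", "known_set_2.vcf"],
    "recal.table",
)
print(shlex.join(cmd))
```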