Entering edit mode
13 hours ago
ashaneev07
▴
30
Hello all,
I have been working with a sample dataset and have developed a workflow that generates a VCF file through GATK. This is typically used for identifying genetic variants in human samples. It would mean a lot to me if anyone could kindly review my commands, as your feedback would help boost my confidence in my approach. Only then can I apply this to the original sample.
Thank you..
bwa mem -t 4 -R "@RG\tID:sample003\tPL:ILLUMINA\tSM:sample003" /hg38/hg38.fa sample_R1.fastp.fastq.gz sample_R2.fastp.fastq.gz > sample003.paired.sam
OR
bwa mem -t 4 -R "@RG\tID:sample003\tPL:ILLUMINA\tSM:sample003" /hg38/hg38.fa sample_R1.fastp.fastq.gz sample_R2.fastp.fastq.gz | samtools view -Sb - > sample003.paired.bam
samtools view -Sb sample003.paired.sam > sample003.paired.bam
samtools sort sample003.paired.bam -o sample003.paired_sorted.bam
gatk MarkDuplicatesSpark -I sample003.paired_sorted.bam -O sample003.sorted_dedup_reads.bam
gatk BaseRecalibrator -I sample003.sorted_dedup_reads.bam -R /hg38/hg38.fa --known-sites /hg38/Homo_sapiens_assembly38.dbsnp138.vcf -O recal_data.table
gatk ApplyBQSR -I sample003.sorted_dedup_reads.bam -R /hg38/hg38.fa --bqsr-recal-file recal_data.table -O sample003_sorted_dedup_bqsr_reads.bam
gatk CollectAlignmentSummaryMetrics R=/hg38/hg38.fa I=sample003_sorted_dedup_bqsr_reads.bam O=alignment_metrics.txt
gatk HaplotypeCaller -R /hg38/hg38.fa -I sample003_sorted_dedup_bqsr_reads.bam -O 53_raw_variants.vcf --dbsnp /dbsnp/hg38_All_20180418.vcf
gatk SelectVariants -R /hg38/hg38.fa -V 53_raw_variants.vcf -O snps_only.vcf --select-type-to-include SNP
gatk VariantFiltration \
-R /hg38/hg38.fa \
-V snps_only.vcf \
-O snp.filtered.unnorm.vcf \
--filter-expression "MQ0 >= 5 && ((MQ0 / (1.0 * DP)) > 0.1)" --filter-name "HARD_TO_VALIDATE" \
--filter-expression "DP < 10" --filter-name "LowCoverage" \
--filter-expression "QUAL < 30.0" --filter-name "VeryLowQual" \
--filter-expression "QUAL > 30.0 && QUAL < 50.0" --filter-name "LowQual" \
--filter-expression "QD < 1.5" --filter-name "LowQD"
bcftools norm -c w -f /hg38/hg38.fa -m -any -o variants.filtered.norm.vcf.gz -Oz snp.filtered.unnorm.vcf
2
Entering edit mode
- gatk-funcotator data source
- Cutadapt&FastQC:sequence length disturbution failed after adapter trimming
- Server or aws cloud- which one is better for gatk pipeline
- Fetch table from clinvar database according to a list of rsid
- Minimum requirements for a laptop to do rna seq analysis
- Exonerate installation on server
- Error while running gatk
- Augustus and SNAP training in Maker-p pipeline
- Retrieving sequence from NCBI
- Snoscan#error#make: Nothing to be done for 'all'
- Maker Annotation pipeline Output error
- E: Package 'libbamtools-dev' has no installation candidate
- Increase memory limit in SPAdes-3.6.1
- SNP filtering from specific regions
ADD REPLY
• link
11 hours ago by
Pierre Lindenbaum
164k