Question

NGS Data Analysis

6 months ago

Bindiya ▴ 10

Hi, I have recently started analysing whole genome sequencing data and am very new to the field. I have completed a Trimmomatic run but do not know what to do next. Could you please help me with the further steps? Links to practical videos or PDFs would also be appreciated, so that it becomes easier to run the commands in the terminal.

NGS Genome sequencing • 523 views

ADD COMMENT • link updated 6 months ago by a.alnawfal.1992 ▴ 360 • written 6 months ago by Bindiya ▴ 10

> so that it become easy to run commands in terminal.

Glad to hear that you are trying to analyze data on your own. Data analysis is part art, but a lot of it can be "running commands", as you said. The important thing is to understand why you are running those commands and what exactly they are doing, and to become knowledgeable enough to make scientific inferences from the output you get. Just because a program ran does not mean that it produced valid results.

You said you have WGS data but what kind of experiment is this? What are you trying to accomplish?

ADD REPLY • link 6 months ago by GenoMax 148k

score 0 · Answer 1 · 2024-06-13

That's great! Welcome to the field of whole genome sequencing analysis! Assuming that you have human WGS data and you would like to perform variant calling, here are the commands to follow:

1- Quality Control:

    fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz

2- Alignment to Reference Genome:

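    # The -R read group is needed later by GATK; "sample1" is a placeholder sample name.
    bwa mem -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' reference_genome.fa out.R1.fq.gz out.R2.fq.gz > aligned_reads.sam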
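Both `bwa mem` and GATK expect index files next to the reference FASTA, so the reference usually has to be prepared once before the alignment step. A minimal sketch, assuming the same `reference_genome.fa` file name; the function name `prepare_reference` is only illustrative:

```shell
# Sketch: build the index files that bwa mem and GATK look for next to
# the reference FASTA. The function name is illustrative.
prepare_reference() {
    local ref="$1"
    bwa index "$ref"                         # BWA index files for alignment
    samtools faidx "$ref"                    # FASTA index (.fai)
    gatk CreateSequenceDictionary -R "$ref"  # sequence dictionary (.dict)
}
```

It would be called once per reference, for example `prepare_reference reference_genome.fa`. Indexing a human genome with `bwa index` can take a while, but it does not need to be repeated for each sample.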

3- Convert SAM to BAM, Sort, and Index:

    samtools view -S -b aligned_reads.sam > aligned_reads.bam
    samtools sort aligned_reads.bam -o sorted_reads.bam
    samtools index sorted_reads.bam
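For large WGS files, the intermediate SAM and unsorted BAM can take a lot of disk space. One common alternative is to pipe the aligner output straight into `samtools sort`. The sketch below assumes that approach; the function name, the `THREADS` variable, and the default sample name `sample1` are placeholders, not part of either tool:

```shell
# Sketch: align, sort, and index without writing intermediate SAM/BAM files.
# THREADS, the sample name, and the function name are placeholders.
align_and_sort() {
    local ref="$1" r1="$2" r2="$3" out_bam="$4" sample="${5:-sample1}"
    local threads="${THREADS:-4}"
    # The trailing "-" tells samtools sort to read from standard input.
    bwa mem -t "$threads" -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" \
        "$ref" "$r1" "$r2" \
        | samtools sort -@ "$threads" -o "$out_bam" -
    samtools index "$out_bam"
}
```

A call might look like `align_and_sort reference_genome.fa out.R1.fq.gz out.R2.fq.gz sorted_reads.bam`. Because the two tools run in a pipe, it is worth checking the `bwa` messages on the terminal, since a failed alignment can still leave a (truncated) BAM file behind.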

4- Variant Calling:

    gatk HaplotypeCaller -R reference_genome.fa -I sorted_reads.bam -O raw_variants.vcf

These are the main steps, but you could add some parameters or use other tools for better/faster results. Below are some helpful materials:

GATK Best Practices Workflows

After completing all these steps, you need to validate your pipeline using a reference sample and the Genome in a Bottle (GIAB) dataset. Then, you can automate your pipeline with a simple bash script or use a more reproducible system like Snakemake.
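As a rough idea of what such a bash script could look like, here is a minimal sketch that chains the four steps for one sample. The `run` helper, the `DRY_RUN` switch, and all output file names are assumptions made for illustration; the tool options themselves are the ones used above, plus `bwa mem -o` and a read group.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper for the four steps above. The sample name, file
# names, and the DRY_RUN switch are illustrative assumptions.

# Print each command; skip execution when DRY_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# The body runs in a subshell "( ... )" so that "set -e" stops the
# pipeline at the first failing step without changing the caller's shell.
run_pipeline() (
    set -euo pipefail
    sample="$1"; ref="$2"; r1="$3"; r2="$4"

    run fastp -i "$r1" -I "$r2" \
        -o "$sample.trim.R1.fq.gz" -O "$sample.trim.R2.fq.gz"
    run bwa mem -R "@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA" \
        -o "$sample.sam" "$ref" "$sample.trim.R1.fq.gz" "$sample.trim.R2.fq.gz"
    run samtools sort -o "$sample.sorted.bam" "$sample.sam"
    run samtools index "$sample.sorted.bam"
    run gatk HaplotypeCaller -R "$ref" -I "$sample.sorted.bam" \
        -O "$sample.raw_variants.vcf"
)
```

Setting `DRY_RUN=1` before calling `run_pipeline sample1 reference_genome.fa in.R1.fq.gz in.R2.fq.gz` only prints the commands, which is a cheap way to check file names before starting a long run. Once the pipeline grows beyond a few samples, a workflow manager such as Snakemake handles restarts and parallel jobs better than a script like this.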