NGS Data Analysis
6 months ago
Bindiya ▴ 10

Hi, I have recently started analysing whole genome sequencing data and am very new to the field. I have completed a Trimmomatic run but do not know what to do next. Can you please help me with the further steps? Links to practical videos or PDFs would also help, so that it becomes easy to run the commands in the terminal.

NGS Genome sequencing • 523 views

so that it become easy to run commands in terminal.

Glad to hear that you are trying to analyze data on your own. Data analysis is part art, but a lot of it can be "running commands", as you said. The important thing is to understand why you are running those commands and what exactly they are doing, and to become knowledgeable enough to make scientific inferences about the output you get. Just because a program ran does not mean that it produced valid results.

You said you have WGS data but what kind of experiment is this? What are you trying to accomplish?

6 months ago

That's great! Welcome to the field of whole genome sequencing analysis! Assuming that you have human WGS data and you would like to perform variant calling, here are the commands to follow:

1- Quality Control:

    fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz

2- Alignment to Reference Genome:

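    # -R sets the read group that GATK requires; the ID/SM/PL values here are placeholders
    bwa mem -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' reference_genome.fa out.R1.fq.gz out.R2.fq.gz > aligned_reads.sam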

3- Convert SAM to BAM, Sort, and Index:

    samtools view -S -b aligned_reads.sam > aligned_reads.bam
    samtools sort aligned_reads.bam -o sorted_reads.bam
    samtools index sorted_reads.bam
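
Steps 2 and 3 can also be combined so that the large intermediate SAM file is never written to disk. This is a minimal sketch, assuming `bwa` and `samtools` are on your PATH. The function name `align_and_sort`, the default thread count, and the read-group values (including `PL:ILLUMINA`) are illustrative assumptions, not something either tool prescribes:

```shell
# Hypothetical helper: align paired reads and write a sorted, indexed BAM
# without an intermediate SAM file.
align_and_sort() {
    ref=$1; r1=$2; r2=$3; out=$4; sample=$5; threads=${6:-4}
    # Read group string; the values are placeholders built from the sample name.
    rg="@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA"
    # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-").
    bwa mem -t "$threads" -R "$rg" "$ref" "$r1" "$r2" |
        samtools sort -@ "$threads" -o "$out" - || return 1
    samtools index "$out"
}
```

It could be called as `align_and_sort reference_genome.fa out.R1.fq.gz out.R2.fq.gz sorted_reads.bam sample1`. In bash, putting `set -o pipefail` at the top of the script makes a failure in `bwa mem` stop the step instead of being hidden by `samtools sort`.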

4- Variant Calling:

    gatk HaplotypeCaller -R reference_genome.fa -I sorted_reads.bam -O raw_variants.vcf
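
The alignment and variant calling commands above expect index files to sit next to the reference: `bwa mem` needs the output of `bwa index`, and GATK needs a `.fai` index and a `.dict` sequence dictionary. A small sketch that prints whichever of these preparation commands are still needed; the function name `reference_prep_commands` is made up for illustration, and it assumes classic `bwa` (not bwa-mem2) and an uncompressed FASTA:

```shell
# Hypothetical helper: print the indexing commands still needed for a
# reference FASTA (assumed uncompressed, e.g. reference_genome.fa).
reference_prep_commands() {
    ref=$1
    # bwa mem needs the BWA index (.bwt is one of the files bwa index writes).
    [ -e "$ref.bwt" ] || printf '%s\n' "bwa index $ref"
    # GATK needs a FASTA index (.fai) ...
    [ -e "$ref.fai" ] || printf '%s\n' "samtools faidx $ref"
    # ... and a sequence dictionary (reference_genome.dict for reference_genome.fa).
    [ -e "${ref%.*}.dict" ] || printf '%s\n' "gatk CreateSequenceDictionary -R $ref"
    return 0
}
```

Running `reference_prep_commands reference_genome.fa` on a fresh reference lists all three commands; once the index files exist it prints nothing.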

These are the main steps, but you could add some parameters or use other tools for better/faster results. Below are some helpful materials:

GATK Best Practices Workflows

After completing these steps, validate your pipeline by running it on a reference sample from the Genome in a Bottle (GIAB) dataset and comparing your calls with the published truth set. Then you can automate your pipeline with a simple bash script or use a more reproducible system like Snakemake.
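
As a starting point for the "simple bash script" route, the four steps can be wrapped in one function that stops at the first failing step. This is only a sketch under assumptions: `wgs_pipeline` and the output file names are hypothetical, the read-group values are placeholders, and the reference is assumed to be indexed already. `samtools sort` reads SAM directly, so the separate SAM-to-BAM conversion is skipped here:

```shell
# Sketch of the four steps as one function; outputs go to the current
# directory and are named after the sample.
wgs_pipeline() {
    ref=$1; r1=$2; r2=$3; sample=$4
    # 1- Quality control and trimming
    fastp -i "$r1" -I "$r2" \
        -o "$sample.R1.trimmed.fq.gz" -O "$sample.R2.trimmed.fq.gz" || return 1
    # 2- Alignment; the read group values are placeholders
    bwa mem -R "@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA" "$ref" \
        "$sample.R1.trimmed.fq.gz" "$sample.R2.trimmed.fq.gz" > "$sample.sam" || return 1
    # 3- Sort (SAM in, BAM out) and index
    samtools sort -o "$sample.sorted.bam" "$sample.sam" || return 1
    samtools index "$sample.sorted.bam" || return 1
    # 4- Variant calling
    gatk HaplotypeCaller -R "$ref" -I "$sample.sorted.bam" \
        -O "$sample.raw_variants.vcf" || return 1
}
```

A call such as `wgs_pipeline reference_genome.fa in.R1.fq.gz in.R2.fq.gz sample1` would then run the whole chain for one sample. Returning early on failure matters because each step would otherwise happily run on a missing or truncated input.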
