Question

Low-coverage whole-genome sequencing (LC-WGS) or ultra low-pass WGS analysis

0

Entering edit mode

5.1 years ago

CuriusScientist ▴ 50

I am new to the genome analysis and want to analyse ultra low-pass WGS data for copy number variation analysis.

It will be really helpful if the members of the community can check if the approach I have taken is correct

# fetch information for read groups
header=$(zcat ${sample_name}_R1_001.fastq.gz | head -n 1)
instrument_Id=$(echo $sn | cut -f 1 -d":")
run_number=$(echo $sn | cut -f 2 -d":")
flowcell_Id=$(echo $sn | cut -f 3 -d":")
lane_number=$(echo $sn | cut -f 4 -d":")
tile_number=$(echo $sn | cut -f 5 -d":")
read_number=$(echo $sn | cut -f 2 -d"/")
ID="$flowcell_Id"_"$lane_number"_"${sample_name}"
PL="ILLUMINA"
LB="${sample_name}"_"$lane_number"
echo "Cores = $(( $(lscpu | awk '/^Socket\(s\)/{ print $2 }') * $(lscpu | awk '/^Core\(s\) per socket/{ print $4 }') ))"
echo $Cores

# Mapping using BWA-mem
bwa mem -t $Cores -M -R '@RG\tID:'$ID'\tSM:'${sample_name}'\tPL:ILLUMINA\tLB:'$LB Homo_sapiens.GRCh38.dna.primary_assembly.fa ${sample_name}_R1_001.fastq.gz ${sample_name}_R2_001.fastq.gz >  ${sample_name}.sam

# Because BWA can sometimes leave unusual FLAG information on SAM records, it is helpful when working with many tools to first clean up read pairing information and flags
samtools fixmate ${sample_name}.sam ${sample_name}.fixmate.bam

# Keep only uniquely mapped read
# -F 780: include reads with none of the following tags , i.e. sam flag 780, reads unmapped, mate unmapped, not a primary alignemnet read fails platform/vendor quality checks
samtools view -b -q 0 -f 2 -F 780 -o ${sample_name}.fixmate.unique.bam ${sample_name}.fixmate.bam
# Keep only uniquely mapped reads and fliter the ones with quality score < 10 (-q 10)
samtools view -b -q 10 -f 2 -F 780 -o ${sample_name}.fixmate.unique.filtered.bam ${sample_name}.fixmate.bam

# sort mapped reads
samtool sort ${sample_name}.fixmate.bam -o ${sample_name}.fixmate.sorted.bam
samtool sort ${sample_name}.fixmate.unique.bam -o ${sample_name}.fixmate.sorted.unique.bam
samtool sort ${sample_name}.fixmate.unique.filtered.bam -o ${sample_name}.fixmate.sorted.unique.filtered.bam

# index mapped reads
samtools index ${sample_name}.fixmate.sorted.bam
samtools index ${sample_name}.fixmate.sorted.unique.bam
samtools index ${sample_name}.fixmate.sorted.unique.filtered.bam

# Remove duplicates
picard MarkDuplicates METRICS_FILE=${sample_name}.fixmate.sorted.unique.filtered.dedup.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=STRICT INPUT=${sample_name}.fixmate.sorted.unique.filtered.bam OUTPUT= ${sample_name}.fixmate.sorted.unique.filtered.dedup.bam

picard SortSam.jar I=${sample_name}.fixmate.sorted.unique.filtered.dedup.bam O=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam SO=coordinate

picard BuildBamIndex INPUT=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam

What I want to know is this

have I missed any step
have I done some unnecessary
is the order of the steps and parameters correct?

I also want to clarify that after this, I am planning to use ichorCNA

GC-content bias correction (using HMMcopy) a. Computing read coverage from ULP-WGS b. Data correction and normalization
CNA prediction and estimation of tumor fraction of cfDNA

next-gen sequencing genome LC-WGS ichorCNA • 1.3k views

ADD COMMENT • link updated 5.1 years ago by Biostar 20 • written 5.1 years ago by CuriusScientist ▴ 50