Low-coverage whole-genome sequencing (LC-WGS) or ultra low-pass WGS analysis
0
0
Entering edit mode
5.1 years ago

I am new to the genome analysis and want to analyse ultra low-pass WGS data for copy number variation analysis.

It will be really helpful if the members of the community can check if the approach I have taken is correct

# fetch information for read groups
header=$(zcat ${sample_name}_R1_001.fastq.gz | head -n 1)
instrument_Id=$(echo $sn | cut -f 1 -d":")
run_number=$(echo $sn | cut -f 2 -d":")
flowcell_Id=$(echo $sn | cut -f 3 -d":")
lane_number=$(echo $sn | cut -f 4 -d":")
tile_number=$(echo $sn | cut -f 5 -d":")
read_number=$(echo $sn | cut -f 2 -d"/")
ID="$flowcell_Id"_"$lane_number"_"${sample_name}"
PL="ILLUMINA"
LB="${sample_name}"_"$lane_number"
echo "Cores = $(( $(lscpu | awk '/^Socket\(s\)/{ print $2 }') * $(lscpu | awk '/^Core\(s\) per socket/{ print $4 }') ))"
echo $Cores

# Mapping using BWA-mem
bwa mem -t $Cores -M -R '@RG\tID:'$ID'\tSM:'${sample_name}'\tPL:ILLUMINA\tLB:'$LB Homo_sapiens.GRCh38.dna.primary_assembly.fa ${sample_name}_R1_001.fastq.gz ${sample_name}_R2_001.fastq.gz >  ${sample_name}.sam

# Because BWA can sometimes leave unusual FLAG information on SAM records, it is helpful when working with many tools to first clean up read pairing information and flags
samtools fixmate ${sample_name}.sam ${sample_name}.fixmate.bam

# Keep only uniquely mapped read
# -F 780: include reads with none of the following tags , i.e. sam flag 780, reads unmapped, mate unmapped, not a primary alignemnet read fails platform/vendor quality checks
samtools view -b -q 0 -f 2 -F 780 -o ${sample_name}.fixmate.unique.bam ${sample_name}.fixmate.bam
# Keep only uniquely mapped reads and fliter the ones with quality score < 10 (-q 10)
samtools view -b -q 10 -f 2 -F 780 -o ${sample_name}.fixmate.unique.filtered.bam ${sample_name}.fixmate.bam

# sort mapped reads
samtool sort ${sample_name}.fixmate.bam -o ${sample_name}.fixmate.sorted.bam
samtool sort ${sample_name}.fixmate.unique.bam -o ${sample_name}.fixmate.sorted.unique.bam
samtool sort ${sample_name}.fixmate.unique.filtered.bam -o ${sample_name}.fixmate.sorted.unique.filtered.bam

# index mapped reads
samtools index ${sample_name}.fixmate.sorted.bam
samtools index ${sample_name}.fixmate.sorted.unique.bam
samtools index ${sample_name}.fixmate.sorted.unique.filtered.bam

# Remove duplicates
picard MarkDuplicates METRICS_FILE=${sample_name}.fixmate.sorted.unique.filtered.dedup.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=STRICT INPUT=${sample_name}.fixmate.sorted.unique.filtered.bam OUTPUT= ${sample_name}.fixmate.sorted.unique.filtered.dedup.bam

picard SortSam.jar I=${sample_name}.fixmate.sorted.unique.filtered.dedup.bam O=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam SO=coordinate

picard BuildBamIndex INPUT=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam

What I want to know is this

  • have I missed any step
  • have I done some unnecessary
  • is the order of the steps and parameters correct?

I also want to clarify that after this, I am planning to use ichorCNA

  • GC-content bias correction (using HMMcopy) a. Computing read coverage from ULP-WGS b. Data correction and normalization
  • CNA prediction and estimation of tumor fraction of cfDNA
next-gen sequencing genome LC-WGS ichorCNA • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 2381 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6