Entering edit mode
5.1 years ago
CuriusScientist
▴
50
I am new to the genome analysis and want to analyse ultra low-pass WGS data for copy number variation analysis.
It will be really helpful if the members of the community can check if the approach I have taken is correct
# fetch information for read groups
header=$(zcat ${sample_name}_R1_001.fastq.gz | head -n 1)
instrument_Id=$(echo $sn | cut -f 1 -d":")
run_number=$(echo $sn | cut -f 2 -d":")
flowcell_Id=$(echo $sn | cut -f 3 -d":")
lane_number=$(echo $sn | cut -f 4 -d":")
tile_number=$(echo $sn | cut -f 5 -d":")
read_number=$(echo $sn | cut -f 2 -d"/")
ID="$flowcell_Id"_"$lane_number"_"${sample_name}"
PL="ILLUMINA"
LB="${sample_name}"_"$lane_number"
echo "Cores = $(( $(lscpu | awk '/^Socket\(s\)/{ print $2 }') * $(lscpu | awk '/^Core\(s\) per socket/{ print $4 }') ))"
echo $Cores
# Mapping using BWA-mem
bwa mem -t $Cores -M -R '@RG\tID:'$ID'\tSM:'${sample_name}'\tPL:ILLUMINA\tLB:'$LB Homo_sapiens.GRCh38.dna.primary_assembly.fa ${sample_name}_R1_001.fastq.gz ${sample_name}_R2_001.fastq.gz > ${sample_name}.sam
# Because BWA can sometimes leave unusual FLAG information on SAM records, it is helpful when working with many tools to first clean up read pairing information and flags
samtools fixmate ${sample_name}.sam ${sample_name}.fixmate.bam
# Keep only uniquely mapped read
# -F 780: include reads with none of the following tags , i.e. sam flag 780, reads unmapped, mate unmapped, not a primary alignemnet read fails platform/vendor quality checks
samtools view -b -q 0 -f 2 -F 780 -o ${sample_name}.fixmate.unique.bam ${sample_name}.fixmate.bam
# Keep only uniquely mapped reads and fliter the ones with quality score < 10 (-q 10)
samtools view -b -q 10 -f 2 -F 780 -o ${sample_name}.fixmate.unique.filtered.bam ${sample_name}.fixmate.bam
# sort mapped reads
samtool sort ${sample_name}.fixmate.bam -o ${sample_name}.fixmate.sorted.bam
samtool sort ${sample_name}.fixmate.unique.bam -o ${sample_name}.fixmate.sorted.unique.bam
samtool sort ${sample_name}.fixmate.unique.filtered.bam -o ${sample_name}.fixmate.sorted.unique.filtered.bam
# index mapped reads
samtools index ${sample_name}.fixmate.sorted.bam
samtools index ${sample_name}.fixmate.sorted.unique.bam
samtools index ${sample_name}.fixmate.sorted.unique.filtered.bam
# Remove duplicates
picard MarkDuplicates METRICS_FILE=${sample_name}.fixmate.sorted.unique.filtered.dedup.metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=STRICT INPUT=${sample_name}.fixmate.sorted.unique.filtered.bam OUTPUT= ${sample_name}.fixmate.sorted.unique.filtered.dedup.bam
picard SortSam.jar I=${sample_name}.fixmate.sorted.unique.filtered.dedup.bam O=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam SO=coordinate
picard BuildBamIndex INPUT=${sample_name}.fixmate.sorted.unique.filtered.dedup.sorted.bam
What I want to know is this
- have I missed any step
- have I done some unnecessary
- is the order of the steps and parameters correct?
I also want to clarify that after this, I am planning to use ichorCNA
- GC-content bias correction (using HMMcopy) a. Computing read coverage from ULP-WGS b. Data correction and normalization
- CNA prediction and estimation of tumor fraction of cfDNA