Question

Read group info

3.8 years ago

priya.bmg ▴ 70

Hello

I need help getting read group info for an alignment with BWA-MEM2. I read a previous post (bwa mem: Passing a variable to read group) in which a shell script extracts the read group info from the FASTQ file. Could someone explain what details need to be given in the shell script? It would be a great help.

Thanks

Priya

BWA bwa-mem2 • 3.1k views

ADD COMMENT • link updated 3.8 years ago by Ishak ▴ 20 • written 3.8 years ago by priya.bmg ▴ 70

Making a vague reference to a previous post does not help you or us. Please provide a link for that post.

ADD REPLY • link 3.8 years ago by GenoMax 151k

Sorry, I have given the link above.

ADD REPLY • link 3.8 years ago by priya.bmg ▴ 70

Since you want to run bwa-mem2, you will need to edit the script and replace the bwa mem command, but otherwise you can use the answer in "bwa mem: Passing a variable to read group" and run the script as shown: bwa-mapper.sh read_1.fq.gz read_2.fq.gz. Your read headers will need to follow the standard Illumina format.
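To make the "standard Illumina format" requirement concrete, here is a minimal sketch of turning one read header into a read group string. It is not the script from the linked post. The function name rg_from_header, the sample name argument, and the choice of flowcell.lane for ID and PU are assumptions made for illustration, and the header layout assumed is the Casava 1.8+ one noted in the comments.

```shell
#!/usr/bin/env bash
# Sketch: derive an @RG string from a single read header.
# Assumed header layout (Illumina, Casava 1.8+):
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>

rg_from_header() {
  # $1 = first header line of the FASTQ file, $2 = sample name (hypothetical argument)
  local header="$1" sample="$2"
  local name flowcell lane
  name="${header%% *}"   # keep the part before the first space
  name="${name#@}"       # drop the leading "@"
  flowcell=$(printf '%s\n' "$name" | cut -d: -f3)
  lane=$(printf '%s\n' "$name" | cut -d: -f4)
  # Inside double quotes "\t" stays a literal backslash-t, which is the form
  # passed to the -R option of bwa mem / bwa-mem2 mem.
  printf '%s\n' "@RG\tID:${flowcell}.${lane}\tSM:${sample}\tPL:ILLUMINA\tPU:${flowcell}.${lane}"
}

# Hypothetical usage (file names and reference are placeholders):
#   header=$(zcat read_1.fq.gz | head -n 1)
#   rg=$(rg_from_header "$header" sample1)
#   bwa-mem2 mem -R "$rg" ref.fa read_1.fq.gz read_2.fq.gz > sample1.sam
```

If your headers do not have this layout, the field numbers given to cut will need adjusting.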

ADD REPLY • link 3.8 years ago by GenoMax 151k

Thread continues: Read group info

ADD REPLY • link 3.8 years ago by Kevin Blighe 89k

score 1 · Answer 1 · 2021-08-11

3.8 years ago

Ishak ▴ 20

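shopt -s nullglob  # patterns that match no files expand to nothing
A=( "$1"/*1.fastq.gz "$1"/*1.fq.gz )  # collect all forward FASTQ files in directory $1

for i in "${!A[@]}"; do
  header=$(zcat "${A[i]}" | head -n 1)  # first read header of this file
  id=$(echo "$header" | cut -f 1 -d":" | sed 's/@//')  # first colon-separated field, without the "@"
  echo "@RG\tID:$id"  # one read group string per forward file
done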

I hope it helps

ADD COMMENT • link 3.8 years ago by Ishak ▴ 20

Hello

I have paired-end sequences for 6 subjects. Should read group information be added to the BAM file for each subject? Read group info differs from subject to subject, right? If so, why collect all the forward FASTQ files together, as in the code above? I am trying to understand the GATK pipeline for NGS analysis. Sorry for the silly question.

ADD REPLY • link 3.8 years ago by priya.bmg ▴ 70

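You will get one BAM file for each pair of FASTQ files. The read group should be the same for both FASTQ files of a pair. Usually you need to provide the following fields:

ID: read group identifier (one per sample in this example)
SM: sample name, e.g. ${A[i]//_1.@(fq|fastq).gz} (in bash the @(...) pattern needs shopt -s extglob)
PL: platform, e.g. illumina
PU: platform unit
CN: name of the sequencing center (company)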

The information is found in the read headers of the FASTQ file, as shown in the answer above.

As with ID, you can write code to extract other fields and add them to the read group alongside ID, for example "@RG\tID:$id\tCN:GENOKS".
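For several subjects, the point is that each pair of files gets its own alignment run and its own read group. Here is a rough sketch of that loop as a dry run that only prints the commands. The helper names (sample_name, mate_path, print_commands), the _1/_2 file naming, and the use of the sample name as the read group ID are assumptions for illustration. Using the sample name as ID is only reasonable when each subject was sequenced in a single lane, since read group IDs must be unique.

```shell
#!/usr/bin/env bash
# Sketch: one alignment command per subject, each with its own read group.
# Assumes mates are named <sample>_1.fq.gz / <sample>_2.fq.gz (or .fastq.gz).

sample_name() {
  # Strip the directory and the _1.fq.gz or _1.fastq.gz suffix.
  local base
  base=$(basename "$1")
  base="${base%_1.fq.gz}"
  base="${base%_1.fastq.gz}"
  printf '%s\n' "$base"
}

mate_path() {
  # Swap the trailing _1.<ext> for _2.<ext> to get the reverse-read file.
  case "$1" in
    *_1.fq.gz)    printf '%s\n' "${1%_1.fq.gz}_2.fq.gz" ;;
    *_1.fastq.gz) printf '%s\n' "${1%_1.fastq.gz}_2.fastq.gz" ;;
  esac
}

print_commands() {
  # Dry run: print the commands instead of running them.
  # $1 = directory with FASTQ files, $2 = reference FASTA (placeholder)
  local dir="$1" ref="$2" r1 sm
  for r1 in "$dir"/*_1.fq.gz "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue   # skip a pattern that matched nothing
    sm=$(sample_name "$r1")
    printf '%s\n' "bwa-mem2 mem -R '@RG\tID:${sm}\tSM:${sm}\tPL:ILLUMINA' ${ref} ${r1} $(mate_path "$r1") > ${sm}.sam"
  done
}
```

Running print_commands /path/to/fastq ref.fa on a directory with six pairs should print six commands, one per subject, which you can inspect before running them.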

ADD REPLY • link 3.8 years ago by Ishak ▴ 20