Question

Add read group with HISAT2

Asked 6.4 years ago by marongiu.luigi

Hello,

I want to add a read group to an alignment done with HISAT2. With BWA, I can add it, for instance, like this:

RG="@RG\tID:XX.L001\tSM:1\tPL:ILLUMINA\tLB:lib501\tPU:XX.1.NoIndex"
bwa mem -R "$RG" <ref> <R1.fq.gz> <R2.fq.gz> -o <aln.sam>

What would be the equivalent for HISAT2?

The manual says:

--rg-id <text> Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.

--rg <text> Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.
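To make the two options above concrete, here is a small bash function that only emulates, in plain string handling, the @RG header line the quoted manual text describes: --rg-id supplies the value after ID:, and each --rg adds one more tab-separated TAG:VAL field. It does not run HISAT2, and the name rg_header is hypothetical. It assumes HISAT2 copies the --rg-id text verbatim, as the manual's wording suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT part of HISAT2): builds the @RG header line that the
# manual says --rg-id <id> plus zero or more --rg TAG:VAL options produce,
# assuming the values are copied verbatim.
rg_header() {
  local id="$1"; shift
  local line="@RG"$'\t'"ID:${id}"   # --rg-id <text> becomes the value of ID:
  local field
  for field in "$@"; do             # each --rg TAG:VAL becomes one more field
    line+=$'\t'"${field}"
  done
  printf '%s\n' "$line"
}

# ID alone via --rg-id, the other fields via --rg:
rg_header XX.L001 SM:1 PL:ILLUMINA LB:lib501 PU:XX.1.NoIndex
# Whole BWA-style string passed to --rg-id (single quotes keep \t literal):
rg_header '@RG\tID:XX.L001\tSM:1\tPL:ILLUMINA'
```

Under that assumption, the second call shows the whole BWA-style string ending up inside the ID: value, with no separate PL field on the line. That would be consistent with a missing-platform complaint from a validator, although this sketch does not prove what HISAT2 actually wrote.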

So should I use

hisat2 --rg-id $RG -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>

or should I use --rg and add the fields individually, as in:

hisat2 --rg ID:XX.L001 --rg SM:1 --rg PL:ILLUMINA --rg LB:lib501 --rg PU:XX.1.NoIndex -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>
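If you already have a BWA-style read-group string, a minimal sketch like the one below can split it into HISAT2 options. It follows the form described in the manual text quoted above: the ID: field goes to --rg-id and every other TAG:VAL field goes to its own --rg. The function name bwa_rg_to_hisat2_args is hypothetical, and it assumes the string uses literal \t (or real tabs) as separators, as in the BWA example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HISAT2): converts a BWA-style read-group
# string into HISAT2 options, printed one per line.
# ID:<x> -> --rg-id <x>; any other TAG:VAL -> --rg TAG:VAL
bwa_rg_to_hisat2_args() {
  local rg="$1" tab=$'\t' field
  local -a fields=() args=()
  rg=${rg#@RG}                 # drop the leading @RG record type
  rg=${rg//'\t'/$tab}          # turn literal \t separators into real tabs
  IFS=$tab read -r -a fields <<< "$rg"   # split on tabs; empty fields are dropped
  for field in "${fields[@]}"; do
    if [[ $field == ID:* ]]; then
      args+=(--rg-id "${field#ID:}")
    else
      args+=(--rg "$field")
    fi
  done
  printf '%s\n' "${args[@]}"
}

RG='@RG\tID:XX.L001\tSM:1\tPL:ILLUMINA\tLB:lib501\tPU:XX.1.NoIndex'
bwa_rg_to_hisat2_args "$RG"
```

To use it, collect the output into an array with mapfile -t rg_args < <(bwa_rg_to_hisat2_args "$RG") and pass "${rg_args[@]}" to hisat2 before -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>. Keeping the options in an array avoids any word-splitting surprises with unquoted variables.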

Thank you

Tags: read group, alignment, HISAT2

Written 6.4 years ago by marongiu.luigi; updated 2.6 years ago by GenoMax.

Instead of asking what you should use, why not try the settings you noted above and see what happens? That is the best way to learn. People are always more inclined to help when they see the real-world problems you are running into.

Reply by GenoMax, 6.4 years ago

Fair point. Anyway, I just ran the first option:

hisat2 --rg-id "@RG\tID:C3MF6ACXX.L001\tSM:501\tPL:ILLUMINA\tLB:lib-501\
\tPU:C3MF6ACXX.1.NoIndex" -p 10 -q -x ./ref/GRCh38-21.fa -1 501N-1_1.fq.gz -2 501N-1_2.fq.gz -S 501N_hst.sam

and checked the output with Picard:

java -jar ~/src/picard/picard.jar ValidateSamFile INPUT=501N_hst.sam MODE=SUMMARY

which reported:

## HISTOGRAM    java.lang.String
Error Type  Count
ERROR:HEADER_RECORD_MISSING_REQUIRED_TAG    1
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 2566136
ERROR:MISSING_PLATFORM_VALUE    1

So the first method does not work. I then tried the second one:

$ hisat2 --rg ID:C3MF6ACXX.L001 --rg SM:501 --rg PL:ILLUMINA --rg LB:lib-501 --rg PU:C3MF6ACXX.1.NoIndex -p 10 -q -x ./ref/GRCh38.fa -1 501N-1_1.fq.gz -2 501N-1_2.fq.gz -S 501N_hst2.sam
$ java -jar ~/src/picard/picard.jar ValidateSamFile INPUT=501N_hst2.sam MODE=SUMMARY
[...]
## HISTOGRAM    java.lang.String
Error Type  Count
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 2566136

Since there are no read-group errors (the only remaining error is the mismatched mate-strand flag), I'd say the second way is the right one.

Case closed.
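Picard's summary lists the problems it found; to see directly that a read group actually made it into the output, you can also count the @RG header lines and the records carrying an RG:Z: tag. The sketch below does that with grep and awk only; check_read_groups is a hypothetical name, and it assumes an uncompressed, tab-delimited SAM file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: counts @RG header lines in a SAM file and how many
# alignment records carry an RG:Z: tag. Uses only grep and awk.
check_read_groups() {
  local sam="$1" n_header n_records n_tagged
  n_header=$(grep -c '^@RG' "$sam" || true)   # grep -c prints 0 even when it exits 1
  n_records=$(grep -vc '^@' "$sam" || true)   # non-header lines are alignment records
  # SAM optional fields start at column 12, after the 11 mandatory columns
  n_tagged=$(awk -F'\t' '!/^@/ {
      for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) { n++; break }
    } END { print n + 0 }' "$sam")
  printf '@RG header lines: %s\nrecords with RG:Z: %s of %s\n' \
    "$n_header" "$n_tagged" "$n_records"
}
```

For example, check_read_groups 501N_hst2.sam. For a BAM file, samtools view -H prints just the header, which is enough to see the @RG line.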

Reply by marongiu.luigi, 6.4 years ago

Hi, how can we work out the values you wrote ("--rg ID:XX.L001 --rg SM:1 --rg PL:ILLUMINA --rg LB:lib501 --rg PU:XX.1.NoIndex") in order to specify them for the read group?

Reply by Sara, 2.6 years ago

Read groups should be easy to figure out when you check this page from GATK. You may need to make up some values if none are available.
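Some of the values can often be read from the FASTQ itself. The poster's values (C3MF6ACXX.L001, C3MF6ACXX.1.NoIndex) follow a flowcell/lane pattern, and the GATK read-group article describes PU along the lines of {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}. The sketch below assumes Casava 1.8+ style read names (@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>); older or non-Illumina headers will not parse correctly. The function name and the ID layout (<flowcell>.L<lane as 3 digits>, copied from the poster's example) are assumptions, and SM and LB still have to come from your own sample sheet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds HISAT2 read-group options from one Illumina read
# name, ASSUMING the Casava 1.8+ layout
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
# SM and LB cannot be derived from the FASTQ, so they are passed in.
rg_args_from_illumina_header() {
  local header="$1" sample="$2" library="$3"
  local name comment flowcell lane index
  local -a f=()
  header=${header#@}            # drop the leading @
  name=${header%% *}            # read name, before the first space
  comment=${header#* }          # read:filtered:control:index, after the space
  IFS=: read -r -a f <<< "$name"
  flowcell=${f[2]}
  lane=${f[3]}
  index=${comment##*:}          # last colon-separated field (sample barcode)
  printf '%s\n' --rg-id "${flowcell}.L$(printf '%03d' "$lane")" \
    --rg "SM:${sample}" --rg PL:ILLUMINA --rg "LB:${library}" \
    --rg "PU:${flowcell}.${lane}.${index}"
}
```

A typical call would take the first line of the R1 file, e.g. rg_args_from_illumina_header "$(gzip -cd R1.fq.gz | head -n 1)" 501 lib-501, and collect the output into an array with mapfile -t before passing it to hisat2.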

Reply by GenoMax, 2.6 years ago