Hello,
I wanted to add a read group to an alignment done with HISAT2. With BWA, I can add it like this:
RG="@RG\tID:XX.L001\tSM:1\tPL:ILLUMINA\tLB:lib501\tPU:XX.1.NoIndex"
bwa mem -R "$RG" <ref> <R1.fq.gz> <R2.fq.gz> -o <aln.sam>
What would be the equivalent for HISAT2?
The manual says:
--rg-id <text> Set the read group ID to <text>. This causes the SAM @RG header line to be printed, with <text> as the value associated with the ID: tag. It also causes the RG:Z: extra field to be attached to each SAM output record, with value set to <text>.
--rg <text> Add <text> (usually of the form TAG:VAL, e.g. SM:Pool1) as a field on the @RG header line. Note: in order for the @RG line to appear, --rg-id must also be specified. This is because the ID tag is required by the SAM Spec. Specify --rg multiple times to set multiple fields. See the SAM Spec for details about what fields are legal.
So shall I use
hisat2 --rg-id $RG -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>
or shall I use --rg and add each field individually, as in:
hisat2 --rg ID:XX.L001 --rg SM:1 --rg PL:ILLUMINA --rg LB:lib501 --rg PU:XX.1.NoIndex -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>
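Going by the manual text quoted above, ID belongs to --rg-id (that is what makes the @RG line and the RG:Z: tags appear), and each remaining TAG:VAL field gets its own --rg. Below is a rough sketch that converts the BWA-style RG string into that form. The helper rg_to_hisat2_args and the array name HISAT2_RG_ARGS are hypothetical, and it assumes the fields are separated by the literal two characters \t, as in the RG variable above. It needs bash 4 or later for mapfile.

```shell
#!/usr/bin/env bash
# Sketch: translate a BWA-style read-group string into HISAT2 arguments.
# Assumes fields are separated by the literal two characters '\t',
# exactly as in the RG variable passed to `bwa mem -R`.
set -eu

# Hypothetical helper: prints one HISAT2 argument per line.
# ID goes to --rg-id; every other TAG:VAL field goes to its own --rg.
rg_to_hisat2_args() {
  local sep='\t'                   # literal backslash + t, not a tab
  local rest="${1#"@RG$sep"}"      # drop the leading "@RG\t"
  local field
  while [[ -n "$rest" ]]; do
    field="${rest%%"$sep"*}"       # text up to the first separator
    if [[ "$rest" == *"$sep"* ]]; then
      rest="${rest#*"$sep"}"       # drop that field and its separator
    else
      rest=""                      # last field consumed
    fi
    case "$field" in
      ID:*) printf '%s\n' --rg-id "${field#ID:}" ;;
      *)    printf '%s\n' --rg "$field" ;;
    esac
  done
}

RG="@RG\tID:XX.L001\tSM:1\tPL:ILLUMINA\tLB:lib501\tPU:XX.1.NoIndex"
mapfile -t HISAT2_RG_ARGS < <(rg_to_hisat2_args "$RG")
printf '%s ' "${HISAT2_RG_ARGS[@]}"; echo
```

This prints --rg-id XX.L001 --rg SM:1 --rg PL:ILLUMINA --rg LB:lib501 --rg PU:XX.1.NoIndex, and the array can be passed straight to HISAT2, e.g. hisat2 "${HISAT2_RG_ARGS[@]}" -q -x <ref> -1 <R1.fq.gz> -2 <R2.fq.gz> -S <aln.sam>. Keeping the arguments in an array avoids any word-splitting surprises if a field value ever contains spaces.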
Thank you
Instead of asking what you should use, why not try the settings you note above and see what happens? That is the best way to learn. People are always more inclined to help when they see the real-world problems you are running into.
Fair point. Anyway, I just ran the first option:
and checked the output with Picard:
obtaining:
So the first method does not work. I then tried the second way:
Since there is no read group error (apart from the mismatched flag), I'd say the second way is the right one.
Case closed.
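For a quick look at the read groups without running Picard, a grep/awk check is one option. This is only a rough sketch: it prints the @RG header line(s) and counts alignment records that lack an RG:Z: tag (the tag that, per the manual text above, --rg-id attaches to every record). The helper name check_rg is hypothetical, and it is not a substitute for Picard's validation.

```shell
#!/usr/bin/env bash
# Sketch: quick look at read-group information in a SAM file.
set -eu

# Hypothetical helper: prints the @RG header line(s), then the number of
# alignment records that carry no RG:Z: tag.
check_rg() {
  local sam="$1"
  # grep exits non-zero when nothing matches, so report that explicitly.
  grep '^@RG' "$sam" || echo "no @RG header line"
  # Optional fields start at column 12 in SAM; skip header lines ('@...').
  awk -F'\t' '
    !/^@/ {
      has_rg = 0
      for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) has_rg = 1
      if (!has_rg) missing++
    }
    END { print "records without RG:Z:", missing + 0 }
  ' "$sam"
}
```

Run it as check_rg <aln.sam>. With --rg-id set, the second line should read "records without RG:Z: 0".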
Hi, how do you find the values for these fields, i.e. the ones you wrote in "--rg ID:XX.L001 --rg SM:1 --rg PL:ILLUMINA --rg LB:lib501 --rg PU:XX.1.NoIndex"?
Read groups should be easy to figure out when you check this page from GATK. You may need to make up some values if there are none available.
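With Illumina data, ID and PU can often be derived from the read names themselves. Here is a rough sketch, assuming Casava 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index) and the flowcell.lane (ID) and flowcell.lane.barcode (PU) convention, as I read GATK's read-group documentation. SM and LB cannot be read from the FASTQ; they come from your own sample records. The function name suggest_rg_from_header and the example header are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: suggest ID and PU read-group values from one Illumina read header.
# Assumes a Casava 1.8+ header such as
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
set -eu

# Hypothetical helper: prints "ID:<flowcell>.<lane>" and
# "PU:<flowcell>.<lane>.<index>" on separate lines.
suggest_rg_from_header() {
  local header="${1#@}"            # drop the leading '@'
  local name="${header%% *}"       # read name: text before the first space
  local comment=""
  if [[ "$header" == *" "* ]]; then
    comment="${header#* }"         # comment: text after the first space
  fi
  local instrument run flowcell lane rest
  IFS=: read -r instrument run flowcell lane rest <<< "$name"
  # Last comment field is the index sequence (empty if there is no comment).
  local index="${comment##*:}"
  printf 'ID:%s.%s\n' "$flowcell" "$lane"
  printf 'PU:%s.%s.%s\n' "$flowcell" "$lane" "$index"
}

# Made-up header, for illustration only.
suggest_rg_from_header '@A00123:8:HXXXXXXXX:2:1101:1000:2000 1:N:0:ACGTACGT'
```

On a real file, pass it the first header line, e.g. suggest_rg_from_header "$(gzip -cd <R1.fq.gz> | head -n 1)". Reads from several lanes or flowcells in one file would need one read group per lane, so checking a single header is only a starting point.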