samtools merge respecting original read groups
1
0
Entering edit mode
8.6 years ago
abascalfederico ★ 1.2k

Hi,

I'm struggling to merge many bam files respecting the original read groups with samtools. Some of the input files have the same read group, other files have others. If I don't use "-c" each "duplicate" RGs are assigned a random suffix (which I don't want to, they are the same RG). If I use "-c", RGs are not altered, but then only one RG line is printed in the header.

Does any one have a clue on how to solve this?

Ideally, with "-c" samtools should print any RG header line that is different from the others in the merged bam.

Many thanks Federico

samtools read groups • 3.7k views
ADD COMMENT
0
Entering edit mode

Which version of samtools are you using? I recall that there were modifications to handle this relatively recently.

ADD REPLY
0
Entering edit mode

Thanks for the feedback, Devon. I am using Version: 1.2 (using htslib 1.2.1). I have looked at the latest version (1.3) but don't see anything about this.

I've read in other post that for this kind of situation picard-tools is the right way to go. Wish samtools change this.

ADD REPLY
1
Entering edit mode
8.6 years ago
abascalfederico ★ 1.2k

In case it is useful for someone, I've written a perl routine to solve this problem.

sub create_header {
my $p_input_bams = $_[0];
my $header_file  = $_[1];
my $RG_string = "";
my %rgs;
foreach my $b ( @{$p_input_bams} ) {
    open(J,"samtools view -H $b|") || die "cñakjdñakj $b\n";
    while(<J>) {
        #print STDERR "$_";
        last unless(/^\@/);
        if(/^\@RG/) {
            $_ =~ s/\#[0-9]+//;
            $rgs{$_} = 1;
        }
    }
    close(J);
}
foreach my $rg ( keys %rgs ) {
    $RG_string .= $rg;
}
print STDERR "RG_string=\n$RG_string\n";
my $main_bam = $p_input_bams->[0];
open(IB, "samtools view -h $main_bam |") || die "Cannot open main: $main_bam\n";
open(OB, ">$header_file") || die "cannot write to header: $header_file\n";
while(<IB>) {
    last unless(/^\@/);
    if(!/^\@RG/) {
        print OB;
    } else {
        print STDERR "Printing RG string...\n";
        print OB $RG_string;
    }

}
close(OB);
close(IB);

}

To use it:

system("samtools merge -cf $lib.merge-changeHeader.bam @input_bams");
&create_header(\@input_bams,$header_file);
system("samtools reheader $header_file $lib.merge-changeHeader.bam > $lib.merge.bam");
ADD COMMENT
0
Entering edit mode

I assume this part is lost in unicode translation cñakjdñakj :-)

ADD REPLY
1
Entering edit mode

hehehe my "die" error messages are usually random typings. It's fast and helps a lot debugging errors

ADD REPLY

Login before adding your answer.

Traffic: 3321 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6