Question

Can anyone explain "read-id sorted" on the samblaster page?

9.7 years ago

Ming Tommy Tang ★ 4.5k

Hi,

I am using samblaster to mark duplicates, and it requires the input reads to be "read-id sorted".

Can anyone explain what that means? I have read the SAM specification from here:

Dad's data:

@RG     ID:FLOWCELL1.LANE1      PL:ILLUMINA     LB:LIB-DAD-1 SM:DAD      PI:200
@RG     ID:FLOWCELL1.LANE2      PL:ILLUMINA     LB:LIB-DAD-1 SM:DAD      PI:200
@RG     ID:FLOWCELL1.LANE3      PL:ILLUMINA     LB:LIB-DAD-2 SM:DAD      PI:400
@RG     ID:FLOWCELL1.LANE4      PL:ILLUMINA     LB:LIB-DAD-2 SM:DAD      PI:400

Mom's data:

@RG     ID:FLOWCELL1.LANE5      PL:ILLUMINA     LB:LIB-MOM-1 SM:MOM      PI:200
@RG     ID:FLOWCELL1.LANE6      PL:ILLUMINA     LB:LIB-MOM-1 SM:MOM      PI:200
@RG     ID:FLOWCELL1.LANE7      PL:ILLUMINA     LB:LIB-MOM-2 SM:MOM      PI:400
@RG     ID:FLOWCELL1.LANE8      PL:ILLUMINA     LB:LIB-MOM-2 SM:MOM      PI:400

Kid's data:

@RG     ID:FLOWCELL2.LANE1      PL:ILLUMINA     LB:LIB-KID-1 SM:KID      PI:200
@RG     ID:FLOWCELL2.LANE2      PL:ILLUMINA     LB:LIB-KID-1 SM:KID      PI:200
@RG     ID:FLOWCELL2.LANE3      PL:ILLUMINA     LB:LIB-KID-2 SM:KID      PI:400
@RG     ID:FLOWCELL2.LANE4      PL:ILLUMINA     LB:LIB-KID-2 SM:KID      PI:400

The @RG ID identifies the reads from a specific lane, and SM is the sample name. So what is the read id? I am a bit confused. My BAM file contains only one SM and one ID.

Thank you!

Ming

bam sequencing • 3.7k views

updated 2.5 years ago by Ram 44k • written 9.7 years ago by Ming Tommy Tang ★ 4.5k

Posting as a comment because I'm not entirely sure it's correct...

From the context in the samblaster documentation, I suspect that "read-id" is what would normally be called "query name" or "read name" in the spec. In other words, use samtools sort -n. That would also make sense given that it explicitly mentions that "read-id" sorting is what aligners produce.
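To make the distinction concrete, here is a minimal Python sketch, assuming the interpretation above is right: the "read id" is the QNAME in the first column of each alignment record, not the @RG ID, which only appears in the header and in the optional RG:Z: tag. The function names, read names, and toy records are hypothetical and only serve to show what "all records of a read are adjacent" looks like.

```python
def read_ids(sam_lines):
    """Yield the QNAME (first tab-separated column) of each alignment record."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines (@HD, @SQ, @RG, ...) and blank lines
        yield line.split("\t", 1)[0]


def is_grouped_by_read_id(sam_lines):
    """Return True if all records sharing a QNAME are adjacent to each other."""
    seen = set()
    previous = None
    for qname in read_ids(sam_lines):
        if qname != previous:
            if qname in seen:
                return False  # this read name came back after another read
            seen.add(qname)
            previous = qname
    return True


def toy_record(qname, flag, pos, mate_pos):
    """Build a made-up 11-column SAM record plus an RG tag (illustration only)."""
    fields = [qname, str(flag), "chr1", str(pos), "60", "4M", "=",
              str(mate_pos), "0", "ACGT", "IIII", "RG:Z:FLOWCELL1.LANE1"]
    return "\t".join(fields)


header = "@RG\tID:FLOWCELL1.LANE1\tSM:DAD"

# Mates next to each other, as in name-sorted or raw aligner output.
grouped = [
    header,
    toy_record("readA", 99, 100, 300),
    toy_record("readA", 147, 300, 100),
    toy_record("readB", 99, 200, 400),
    toy_record("readB", 147, 400, 200),
]

# The same records ordered by position, which interleaves the mates.
coordinate_sorted = [grouped[0], grouped[1], grouped[3], grouped[2], grouped[4]]

print(is_grouped_by_read_id(grouped))
print(is_grouped_by_read_id(coordinate_sorted))
```

On real data the lines would come from the text form of the alignments, for example the output of samtools view -h; every record in the toy input carries the same RG tag, yet the two orderings differ only in where the QNAMEs fall.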

9.7 years ago by Devon Ryan 105k

Thanks for your reply. My BAM files were sorted by coordinate, so I might have to re-sort them by name. I know HTSeq requires BAM files to be sorted by name (-n), but I am not sure whether samblaster has the same requirement.
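One way to confirm how a file claims to be sorted is to read the SO field of the @HD header line, which the SAM specification defines with the values unknown, unsorted, queryname, and coordinate. The sketch below is a hypothetical helper that works on header text (for example, the output of samtools view -H); the header lines are made up. It only reports what the header declares, which may not match the actual record order.

```python
def declared_sort_order(header_lines):
    """Return the SO value from the @HD header line, or "unknown" if absent."""
    for line in header_lines:
        if line.startswith("@HD"):
            # Fields after the record type are TAG:VALUE pairs separated by tabs.
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


# Made-up headers for illustration.
coordinate_header = ["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"]
name_header = ["@HD\tVN:1.6\tSO:queryname", "@SQ\tSN:chr1\tLN:1000"]
no_hd_header = ["@SQ\tSN:chr1\tLN:1000"]

for header in (coordinate_header, name_header, no_hd_header):
    print(declared_sort_order(header))
```

A file that reports coordinate here would need to be re-sorted with samtools sort -n before its mates are adjacent again.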

updated 2.5 years ago by Ram 44k • written 9.7 years ago by Ming Tommy Tang ★ 4.5k