Question

How to remove reads mapped to both plasmid and chromosome from the .sam file?

0

Entering edit mode

3.3 years ago

Vasiliy Krestov ▴ 30

I have a .sam file with reads mapped to a plasmid and a bacterial chromosome. I used bowtie to map reads and I allowed multi-mapping.

How can I exclude reads which were aligned to both plasmid and chromosome from the .sam file? I want to do this in order to avoid biases caused by different plasmid copy numbers

SAM-file NGS small-RNA • 1.2k views

ADD COMMENT • link 2.2 years ago by Vasiliy Krestov ▴ 30

1

Entering edit mode

This would likely need a custom program. You will need to name sort your BAM file and then walk through it to find reads that aligned to both chromosome and plasmid (I assume it was a separate entry in the reference) and drop those reads/lines.

A more crude way would be to isolate columns 1 and 3 and the sort|uniq them. Then keep read entries that occur only once.

ADD REPLY • link 3.3 years ago by GenoMax 147k

0

Entering edit mode

For bowtie it is pretty simple as you can tune alignment parameters to not allow multimappers in the SAM file, see http://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-m

Alternatively, multimappers have low MAPQ scores, so filtering based on MAPQ (say 10 and above) should remove multimappers as well, e.g. samtools view -q 10 -o out.sam in.sam

ADD REPLY • link 3.3 years ago by ATpoint 85k

0

Entering edit mode

I think I can't use it because at the same time I want to retain those multi-mappers which occur only in chromosome/plasmid. I don't want to exclude all multi-mappers :)

ADD REPLY • link 3.3 years ago by Vasiliy Krestov ▴ 30

score 1 · Accepted Answer · 2022-09-04

Late update:)

This small skript did what I wanted!

#!/usr/bin/python3
import pysam
def remove_reads_mapped_to_two_seq(in_file, out_file):
    sam=pysam.AlignmentFile(in_file, "r")
    file_to_write=pysam.AlignmentFile(out_file, "wb", template=sam)
    current_name='current'
    reads_ws_name=[]
    count=0
    max_mapped_to=0
    countX=0
    for read in sam:
        count+=1
        if read.query_name!=current_name:
            ref_names=[]
            for same in reads_ws_name:
                ref_names.append(same.reference_name)
            if len(set(ref_names))==1:
                mapps=0
                for same in reads_ws_name:
                    file_to_write.write(same)
                    mapps+=1
                    countX+=1
                if mapps>max_mapped_to:
                    max_mapped_to=mapps
            reads_ws_name=[]
            current_name=read.query_name
        reads_ws_name.append(read)
remove_reads_mapped_to_two_seq('../temporary_files/multimappers_namesorted.sam', '../temporary_files/multimappers_namesorted_one_refseq.sam')