How to remove reads mapped to both plasmid and chromosome from the .sam file?
1
0
Entering edit mode
3.4 years ago

I have a .sam file with reads mapped to a plasmid and a bacterial chromosome. I used bowtie to map reads and I allowed multi-mapping.

How can I exclude reads which were aligned to both plasmid and chromosome from the .sam file? I want to do this in order to avoid biases caused by different plasmid copy numbers

SAM-file NGS small-RNA • 1.2k views
ADD COMMENT
1
Entering edit mode

This would likely need a custom program. You will need to name sort your BAM file and then walk through it to find reads that aligned to both chromosome and plasmid (I assume it was a separate entry in the reference) and drop those reads/lines.

A more crude way would be to isolate columns 1 and 3 and the sort|uniq them. Then keep read entries that occur only once.

ADD REPLY
0
Entering edit mode

For bowtie it is pretty simple as you can tune alignment parameters to not allow multimappers in the SAM file, see http://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-m

Alternatively, multimappers have low MAPQ scores, so filtering based on MAPQ (say 10 and above) should remove multimappers as well, e.g. samtools view -q 10 -o out.sam in.sam

ADD REPLY
0
Entering edit mode

I think I can't use it because at the same time I want to retain those multi-mappers which occur only in chromosome/plasmid. I don't want to exclude all multi-mappers :)

ADD REPLY
1
Entering edit mode
2.3 years ago

Late update:)

This small skript did what I wanted!

#!/usr/bin/python3
import pysam
def remove_reads_mapped_to_two_seq(in_file, out_file):
    sam=pysam.AlignmentFile(in_file, "r")
    file_to_write=pysam.AlignmentFile(out_file, "wb", template=sam)
    current_name='current'
    reads_ws_name=[]
    count=0
    max_mapped_to=0
    countX=0
    for read in sam:
        count+=1
        if read.query_name!=current_name:
            ref_names=[]
            for same in reads_ws_name:
                ref_names.append(same.reference_name)
            if len(set(ref_names))==1:
                mapps=0
                for same in reads_ws_name:
                    file_to_write.write(same)
                    mapps+=1
                    countX+=1
                if mapps>max_mapped_to:
                    max_mapped_to=mapps
            reads_ws_name=[]
            current_name=read.query_name
        reads_ws_name.append(read)
remove_reads_mapped_to_two_seq('../temporary_files/multimappers_namesorted.sam', '../temporary_files/multimappers_namesorted_one_refseq.sam')
ADD COMMENT

Login before adding your answer.

Traffic: 2066 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6