Question

How to find all mapped locations of a read from PySam?

0

Entering edit mode

4.7 years ago

O.rka ▴ 740

I have a mapping with an insane amount of multimapped reads. I'm trying to figure out all the locations these multimapped reads have mapped to but I'm having difficult accessing all of the locations from the bam file. I got my BAM file from STAR aligner.

Here's my STAR command:

conda activate star_env

STAR --genomeDir star_index_genome_ercc92_virus --readFilesCommand zcat --readFilesIn preprocessing/kneaddata_repaired_1.fastq.gz preprocessing/kneaddata_repaired_2.fastq.gz --outFileNamePrefix intermediate/star_output/star_ --runThreadN 2 --outReadsUnmapped Fastx

samtools view -h -b intermediate/star_output/star_Aligned.out.sam > intermediate/star_output/star_Aligned.out.bam

I'm iterating through my bam file but I can't find where all of the multimapping locations are?

for fp in sy.pv(sizes.index, "BAM Files"):
    bamfile = pysam.AlignmentFile(fp, "rb")
    for record in bamfile:
        number_of_loci = record.get_tag("NH")
        if number_of_loci > 1:
            break

Here's the results:

record.get_tags()
#[('NH', 6), ('HI', 1), ('AS', 299), ('nM', 0)]

record.reference_name, record.reference_start, record.reference_end
#('000181F_arrow', 3618, 3769)

Why isn't record.reference_start and record.reference_end a tuple of 6 positions?

alignment • 1.9k views

ADD COMMENT • link updated 4.7 years ago by GouthamAtla 12k • written 4.7 years ago by O.rka ▴ 740

0

Entering edit mode

why not using a simple samtools, awk ? Like samtools view in.sorted.bam | awk '{ if ($5<255) print }' | cut -f3,4

ADD REPLY • link 4.7 years ago by GouthamAtla 12k

0

Entering edit mode

I'm trying to do a complicated analysis based on read sizes and multimapping and I can't really do that in a bash shell. Doesn't the command above just output a single positoin

ADD REPLY • link 4.7 years ago by O.rka ▴ 740