Entering edit mode
4.4 years ago
O.rka
▴
740
I have a mapping with an insane amount of multimapped reads. I'm trying to figure out all the locations these multimapped reads have mapped to but I'm having difficult accessing all of the locations from the bam file. I got my BAM file from STAR aligner.
Here's my STAR command:
conda activate star_env
STAR --genomeDir star_index_genome_ercc92_virus --readFilesCommand zcat --readFilesIn preprocessing/kneaddata_repaired_1.fastq.gz preprocessing/kneaddata_repaired_2.fastq.gz --outFileNamePrefix intermediate/star_output/star_ --runThreadN 2 --outReadsUnmapped Fastx
samtools view -h -b intermediate/star_output/star_Aligned.out.sam > intermediate/star_output/star_Aligned.out.bam
I'm iterating through my bam file but I can't find where all of the multimapping locations are?
for fp in sy.pv(sizes.index, "BAM Files"):
bamfile = pysam.AlignmentFile(fp, "rb")
for record in bamfile:
number_of_loci = record.get_tag("NH")
if number_of_loci > 1:
break
Here's the results:
record.get_tags()
#[('NH', 6), ('HI', 1), ('AS', 299), ('nM', 0)]
record.reference_name, record.reference_start, record.reference_end
#('000181F_arrow', 3618, 3769)
Why isn't record.reference_start
and record.reference_end
a tuple of 6 positions?
why not using a simple samtools, awk ? Like
samtools view in.sorted.bam | awk '{ if ($5<255) print }' | cut -f3,4
I'm trying to do a complicated analysis based on read sizes and multimapping and I can't really do that in a bash shell. Doesn't the command above just output a single positoin