Entering edit mode
3.2 years ago
Laura
▴
50
Hi,
I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands, like:
samtools view -c file.bam
But I'm looking for a way to search for specific elements in a bed file. I have:
import argparse
import pysam
import pandas as pd
bamfile = "/mnt/d/Axiotl/ABC/Input/bam/ENCFF384ZZM.bam"
bed_file = "/mnt/d/R/abc_3a_scaffold_hg19.bed"
output = "/mnt/d/Axiotl/ABC/Output/bam"
genome_sizes = "/mnt/d/Axiotl/ABC/Working_Copy/ABC-Enhancer-Gene-Prediction-master/reference/chr_sizes"
def count_bam(bamfile, bed_file, output, genome_sizes, use_fast_count=True, verbose=True):
reads = pysam.AlignmentFile(bamfile)
read_chrs = set(reads.references)
bed_regions = pd.read_table(bed_file, header=None)
bed_regions = bed_regions[bed_regions.columns[0:3]]
bed_regions.columns = "chr", "start", "end"
counts = [(reads.count(row.chr, row.start, row.end) if (row.chr in read_chrs) else 0) for _, row in bed_regions.iterrows()]
bed_regions['count'] = counts
bed_regions.to_csv(output, header=None, index=None, sep="\t")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--bamfile', help="input bam file")
parser.add_argument('--bed_file', help="input bed file")
parser.add_argument('--genome_sizes', help="genome sizes")
parser.add_argument('--output', help="output directory")
args = parser.parse_args()
print(args)
print(bamfile)
print(bed_file)
print(output)
print(genome_sizes)
try:
count_bam(bamfile, bed_file, output, genome_sizes)
except KeyboardInterrupt:
exit
But I keep getting the error:
ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements
I've tried absolutely everything I can think of. Any help would be greatly appreciated!