5.1 years ago
zack.henning
Hey all,
I am trying to create a scatter plot and a histogram with matplotlib for an alignment I generated. I aligned my reads to a bacterial genome, then converted, sorted, and indexed the alignment and computed per-position depth using:
samtools view -b s_oneidensis_alignemnt_sensitive.sam > alignment.bam
samtools sort alignment.bam > alignment.sorted.bam
samtools index alignment.sorted.bam
samtools depth -a alignment.sorted.bam > pileup.tab
Now I'd like to generate a scatter plot with x-axis = position in genome and y-axis = depth of coverage, and then a histogram with x-axis = depth of coverage and y-axis = read count. I'm still new to Python and trying to figure out a method. Should I use the .tab file or the .bam file? Any help or nudges in the right direction would be greatly appreciated. Thanks!
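For what it's worth, here is a minimal sketch of one way to do this from the .tab file, assuming pileup.tab has the usual headerless three-column `samtools depth` layout (reference name, 1-based position, depth). The function names, the column names, and the output PNG file names are made up for illustration. Note that the histogram built this way counts genome positions at each depth, not reads.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display with plt.show()
import matplotlib.pyplot as plt
import pandas as pd


def load_depth(path):
    """Read samtools depth output (assumed: no header, tab-separated, 3 columns)."""
    # Column names are labels chosen here; the file itself has no header row.
    return pd.read_csv(
        path,
        sep="\t",
        header=None,
        names=["reference", "position", "depth"],
    )


def plot_depth(df, scatter_png="depth_scatter.png", hist_png="depth_hist.png"):
    """Write a position-vs-depth scatter plot and a depth histogram to PNG files."""
    # Scatter plot: x = position in genome, y = depth of coverage.
    fig, ax = plt.subplots()
    ax.scatter(df["position"], df["depth"], s=1)
    ax.set_xlabel("Position in genome")
    ax.set_ylabel("Depth of coverage")
    fig.savefig(scatter_png)
    plt.close(fig)

    # Histogram: x = depth of coverage, y = number of positions at that depth.
    fig, ax = plt.subplots()
    ax.hist(df["depth"], bins=50)
    ax.set_xlabel("Depth of coverage")
    ax.set_ylabel("Number of positions")
    fig.savefig(hist_png)
    plt.close(fig)
```

Calling `plot_depth(load_depth("pileup.tab"))` should write the two PNG files to the working directory. Selecting a column by name, as in `df["depth"]`, returns the whole column, which avoids indexing individual rows by hand.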
A similar topic has been discussed in How to plot coverage and depth statistics of a bam file. A tab file (.tab) is just a text file whose columns are tab-separated; read the pileup.tab file using pandas and plot using pyplot.
So the issue I'm having with this is extracting the columns of the alignment .tab file into a list. My current code is indexing the first characters of the string in each row (the NCBI accession ID for the genome), such as A and E, rather than the column, with this code: