Hi,
I'm looking for a nice way to calculate the percentage of reads on specific chromosomes (MT, 1-23/X/Y, and unplaced scaffolds).
I know I can get the read counts per chromosome with samtools idxstats; however, I have a lot of BAM files and would like to automate the calculation. The problem is that I'm struggling with basic batch text manipulation, so I would appreciate any help you can give me (specific or general).
Edit: I forgot to mention that I have multiple BAM files, which are indexed and consist only of uniquely mapped reads.
Do you have some familiarity with R or python?
Yes, I always work with R
So, can you update your question with where, specifically, you are getting stuck with processing many files with samtools idxstats? Ideally, post any code you have tried.
I will try to do it with R now. I just thought there might be a short command in bash :)
You can certainly loop over your files in bash with something like (untested):
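A minimal sketch of such a loop, offered as an illustration rather than tested code from this thread. It assumes samtools is on the PATH, the indexed BAM files sit in the current directory, and the chromosomes are named 1, 2, ..., X, Y, MT (optionally with a "chr" prefix, in which case chrM is treated as MT). The function name idxstats_percent and the three group labels are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: percentage of mapped reads per chromosome group for each BAM file.
# Assumptions: chromosome names are 1..N, X, Y, MT (a "chr" prefix is allowed);
# every other reference name is counted as an unplaced scaffold.

# Read `samtools idxstats` output (name, length, mapped, unmapped) on stdin.
# Print: sample, group, mapped reads, percent of all mapped reads.
idxstats_percent() {
    awk -F'\t' -v sample="$1" '
        $1 == "*" { next }                  # final line holds unmapped reads only
        {
            name = $1
            sub(/^chr/, "", name)           # tolerate a "chr" prefix
            if (name == "MT" || name == "M")       group = "MT"
            else if (name ~ /^([0-9]+|X|Y)$/)      group = "nuclear"
            else                                   group = "scaffold"
            count[group] += $3              # column 3 = mapped reads
            total += $3
        }
        END {
            n = split("MT nuclear scaffold", groups, " ")
            for (i = 1; i <= n; i++) {
                g = groups[i]
                pct = (total > 0) ? 100 * count[g] / total : 0
                printf "%s\t%s\t%d\t%.2f\n", sample, g, count[g], pct
            }
        }'
}

for bam in *.bam; do
    [ -e "$bam" ] || continue               # nothing matched the glob
    samtools idxstats "$bam" | idxstats_percent "$bam"
done
```

Redirecting the loop's output to a file gives one tab-separated table for all samples, which can then be read into R with read.delim() for further analysis.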
I suspect you will still want to "analyze" the data, so R will likely be involved at some point.