Calculating amount of reads on certain chromosomes in bam file
2
1
Entering edit mode
6.7 years ago
caggtaagtat ★ 1.9k

Hi,

I'm looking for a nice way to calculate the percentage of reads on specifique chromosomes (MT und (1-23,X,Y) und unplaced scaffold).

I know I can get the read counts per chromosome with samtools idxstats, however I have a lot of bam files and I would like to automate the calculation. The problem is, I'm struggling with basic batch text maniplulation and would therefore appreciate any help you can give me (specific or general).

Edit: I forgot to mention, that I have multiple bam files, which are indexed and only consist of uniquely mapped reads.

RNA-Seq bam samtools • 4.5k views
ADD COMMENT
0
Entering edit mode

Do you have some familiarity with R or python?

ADD REPLY
0
Entering edit mode

Yes, I always work with R

ADD REPLY
1
Entering edit mode

So, can you update your question with where, specifically, you are getting stuck with processing many files with samtools idxstats? Ideally, post any code you have tried.

ADD REPLY
0
Entering edit mode

I will try to do it with R now. I just thought, there might be a short command in bash :)

ADD REPLY
1
Entering edit mode

You can certainly loop over your files in bash with something like (untested):

for i in `ls *bai`; do j=`echo $i | sed s/\.bai//`; samtools idxstats $i > $j.idxstats; done

I suspect you will still want to "analyze" the data, so R will likely be involved at some point.

ADD REPLY
2
Entering edit mode
6.7 years ago

General workflow you could try given that you use R.

  1. Look at using dir() to get a list of index files.
  2. Look at using sprintf() to create a command string to run samtools idxstats given a filename (which you have from #1), piping the results to a unique file for each sample.
  3. Look at using system() to run the command string from 2.
  4. Look at using a for loop loop to run #3 over your list of index files from #1.
  5. Define a function to read a single idxstats output file.
  6. Loop using sapply or lapply over all the idxstat output files
  7. Combine output of 6 into a single matrix, with samples in columns and chromosomes in rows.
ADD COMMENT
0
Entering edit mode

Thank you, I will do that

ADD REPLY
3
Entering edit mode
6.7 years ago

I'd suggest to also having a look at mosdepth and indexcov.

ADD COMMENT
1
Entering edit mode

You have the same link for both programs mentioned.

ADD REPLY
0
Entering edit mode

Good catch, fixed it. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1724 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6