Per-sample mean coverage and standard deviation
4.5 years ago

Hello everyone

I have 18,000 text files containing the depth at each position in a sample. Each text file corresponds to one sample (18,000 samples in total). I want to get the mean coverage, standard deviation, and total count of positions per sample in a single output file. Is there an easy way to do this? Depths were calculated using samtools depth input.bam. All the text files look like this:

sample_name   chromosome   position     depth

So the desired output is:

sample1  mean_depth   standard_deviation   total_number_of_positions
sample2  mean_depth   standard_deviation   total_number_of_positions
sample3  mean_depth   standard_deviation   total_number_of_positions
bash awk • 1.5k views

Are the solutions in your last question ("average depth across samples") not suitable?


This is a different question, so I thought I'd ask it in a separate post. There I wanted the average depth per position across all the samples. Here I need the per-sample average depth, SD, and total number of positions.


You can probably calculate that using some variation of the datamash solution that @cpad0112 posted in your last question. Tinkering with things is a great way to learn.

You should also accept the answers to your past questions if they helped you solve the issue (green check mark beside each answer). You can accept more than one if they all work.


OK, thank you so much for the information.

4.5 years ago
husensofteng ▴ 410

If I understand correctly, you want summary statistics across all positions for each sample. In that case, assuming the depth values are in the fourth column and all the .txt files are in the current directory:

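for sample_file in *.txt; do
   # Keep the sample name from each line ($1 is not reliably set in END); SD here is the population SD
   awk 'BEGIN{OFS="\t"}{name=$1; x+=$4; y+=$4^2}END{if(NR) print name,x/NR,sqrt(y/NR-(x/NR)^2),NR}' "$sample_file"
done > output.tsv   # written once, and not named *.txt, so a rerun does not pick it up as a sample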
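The one-liner above gets the standard deviation from the mean of squares minus the squared mean. That is fine for most depth data, but the subtraction can lose precision when the mean is large relative to the spread. As a possible variation, here is a minimal sketch that uses Welford's running update instead. It assumes the same four-column layout (sample, chromosome, position, depth) and still reports the population SD. The function name depth_stats and the toy input are made up for illustration.

```shell
# Hypothetical helper: summarize one depth file
# (assumed columns: sample, chromosome, position, depth).
depth_stats() {
    awk 'BEGIN { OFS = "\t" }
         {
             name = $1
             n++
             # Welford update: running mean and running sum of squared deviations
             delta = $4 - mean
             mean += delta / n
             m2 += delta * ($4 - mean)
         }
         # Population SD (divide by n); use m2 / (n - 1) for the sample SD
         END { if (n) print name, mean, sqrt(m2 / n), n }' "$1"
}

# Toy example: three positions with depths 2, 4 and 6
demo_file=$(mktemp)
printf 's1\tchr1\t1\t2\ns1\tchr1\t2\t4\ns1\tchr1\t3\t6\n' > "$demo_file"
depth_stats "$demo_file"
rm -f "$demo_file"
```

To run it over every sample, call the function inside the same loop, for example: for f in *.txt; do depth_stats "$f"; done > output.tsv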