8.5 years ago
biotech
I would like to normalize the 'blockCount' field in the BED file 'coverage_1_2.txt', which was produced by:
bedtools coverage -b MZ123.aligned.q30.bam -a 7a1.bed -counts > coverage_1_2.txt
As a provisional normalization, I divided the number of mapped reads (blockCount) by the size of the interval (protein size).
Note that this is not RNA-seq data, and I'm not trying to compare gene expression between samples. I mapped DNA against DNA, and I want to know whether some sequences are overrepresented in my library.
Here are the first lines of each file:
7a1.bed
track name=A. citrulli 7a1 genome description="A. citrulli 7a1 genes" itemRgb=On
Ga0114182_11 1 865 Ga0114182_111 1000 + 1 865 65,105,225
Ga0114182_11 841 2053 Ga0114182_112 1000 - 841 2053 65,105,225
Ga0114182_11 2174 2759 Ga0114182_113 1000 - 2174 2759 65,105,225
Ga0114182_11 2755 3886 Ga0114182_114 1000 - 2755 3886 65,105,225
Ga0114182_11 4008 4587 Ga0114182_115 1000 - 4008 4587 65,105,225
Ga0114182_11 5047 5419 Ga0114182_116 1000 + 5047 5419 65,105,225
Ga0114182_11 5681 6212 Ga0114182_117 1000 + 5681 6212 65,105,225
...
coverage_1_2.txt
Ga0114182_11 1 865 Ga0114182_111 1000 + 1 865 65,105,225 150
Ga0114182_11 841 2053 Ga0114182_112 1000 - 841 2053 65,105,225 290
Ga0114182_11 2174 2759 Ga0114182_113 1000 - 2174 2759 65,105,225 127
Ga0114182_11 2755 3886 Ga0114182_114 1000 - 2755 3886 65,105,225 244
Ga0114182_11 4008 4587 Ga0114182_115 1000 - 4008 4587 65,105,225 173
Ga0114182_11 5047 5419 Ga0114182_116 1000 + 5047 5419 65,105,225 136
Ga0114182_11 5681 6212 Ga0114182_117 1000 + 5681 6212 65,105,225 132
Ga0114182_11 6324 7119 Ga0114182_118 1000 - 6324 7119 65,105,225 216
Ga0114182_11 7153 7771 Ga0114182_119 1000 - 7153 7771 65,105,225 152
...
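To make the provisional normalization concrete, here is a minimal Python sketch of what I mean, using the first three lines of coverage_1_2.txt. It assumes the read count is the last column, that the interval size is end minus start (BED coordinates being 0-based, half-open), and that scaling to reads per kilobase is acceptable; the function name `normalize_counts` and the per-kb scaling are my own choices for illustration, not something bedtools provides.

```python
import io

# First lines of coverage_1_2.txt as pasted above (whitespace-separated here;
# the real bedtools output is tab-separated).
SAMPLE = """\
Ga0114182_11 1 865 Ga0114182_111 1000 + 1 865 65,105,225 150
Ga0114182_11 841 2053 Ga0114182_112 1000 - 841 2053 65,105,225 290
Ga0114182_11 2174 2759 Ga0114182_113 1000 - 2174 2759 65,105,225 127
"""


def normalize_counts(handle):
    """Yield (name, length, count, reads_per_kb) for each BED interval.

    Hypothetical helper: assumes 9 BED columns followed by the read count.
    """
    for line in handle:
        fields = line.split()
        # Skip blank lines, track/browser headers and the "..." placeholder.
        if len(fields) < 10 or fields[0] in ("track", "browser"):
            continue
        start, end = int(fields[1]), int(fields[2])
        count = int(fields[-1])
        length = end - start  # assumption: 0-based, half-open coordinates
        yield fields[3], length, count, 1000.0 * count / length


rows = list(normalize_counts(io.StringIO(SAMPLE)))
for name, length, count, per_kb in rows:
    print(f"{name}\t{length}\t{count}\t{per_kb:.1f}")
```

To run it on the real file, the `io.StringIO(SAMPLE)` argument could be replaced with `open("coverage_1_2.txt")`. This only corrects for interval length; it does not account for total library size.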