In GRanges there's a function, mcols, to access the metadata columns, i.e. the columns describing the quantities under observation, as opposed to the 'canonical' position columns (like "chrom", "start", "end", plus an optional "strand", etc.). I'd like to access these columns in pyranges as well. For concreteness, say I have the following pyranges object:
>>> test_pr
+--------------+-----------+-----------+-------------+-------------+-------------+
| Chromosome   |     Start |       End |    sample_1 |    sample_2 |    sample_3 |
| (category)   |   (int32) |   (int32) |   (float64) |   (float64) |   (float64) |
|--------------+-----------+-----------+-------------+-------------+-------------|
| chr1         |     10468 |     10470 |         nan |      0.1234 |         nan |
| chr1         |     10470 |     10472 |       0.714 |         0.8 |        0.12 |
| chr1         |     10483 |     10485 |         nan |         0.6 |        0.13 |
| chr1         |     10488 |     10490 |       0.941 |         0.8 |        0.15 |
+--------------+-----------+-----------+-------------+-------------+-------------+
And my task is to take averages at each position across samples. Hence, I'd like to access just the last 3 columns and obtain the following:
>>> output_pr
Chromosome  Start  End    Average
chr1        10468  10470  0.1234
chr1        10470  10472  0.544667
chr1        10483  10485  0.365
chr1        10488  10490  0.630333
So I want something like test_pr.mean(skipna=True), but if I try to access the last three columns with, say, iloc, I get this:
>>> test_pr.iloc[:,[3,4,5]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../pyranges.py", line 270, in __getattr__
    return _getattr(self, name)
  File ".../pyranges/methods/attr.py", line 66, in _getattr
    raise AttributeError("PyRanges object has no attribute", name)
AttributeError: ('PyRanges object has no attribute', 'iloc')
So I understand that pyranges objects are not standard pandas DataFrames and that I can't use iloc, but I'm not sure how else to manipulate the metadata columns, collect statistics, etc. Solving the example above would, I think, make it clear how to work with this type of data structure in general.
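For reference, here is a minimal sketch of one way the example could be handled by doing the arithmetic in plain pandas. It assumes that the data can be pulled out of the pyranges object as an ordinary DataFrame (in the pyranges version shown in the traceback this should be the `.df` attribute) and wrapped back up with `pr.PyRanges(...)`. To keep the snippet runnable without pyranges installed, the DataFrame is built by hand as a stand-in for `test_pr.df`; the names `position_cols`, `meta_cols` and `output_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for `test_pr.df` (assumption: `.df` returns the whole PyRanges
# object as a single ordinary pandas DataFrame).
df = pd.DataFrame({
    "Chromosome": ["chr1"] * 4,
    "Start": [10468, 10470, 10483, 10488],
    "End": [10470, 10472, 10485, 10490],
    "sample_1": [np.nan, 0.714, np.nan, 0.941],
    "sample_2": [0.1234, 0.8, 0.6, 0.8],
    "sample_3": [np.nan, 0.12, 0.13, 0.15],
})

# Treat every column that is not a canonical position column as metadata,
# which is roughly the role mcols() plays for GRanges.
position_cols = ["Chromosome", "Start", "End", "Strand"]
meta_cols = [c for c in df.columns if c not in position_cols]

# Row-wise mean across the sample columns, ignoring NaN values.
output_df = df[["Chromosome", "Start", "End"]].copy()
output_df["Average"] = df[meta_cols].mean(axis=1, skipna=True)
print(output_df)

# With pyranges installed, the round trip would presumably look like:
#   import pyranges as pr
#   df = test_pr.df
#   ... same pandas code as above ...
#   output_pr = pr.PyRanges(output_df)
```

Selecting the metadata columns by name rather than by position (as `iloc[:, [3, 4, 5]]` would) keeps the code working if a Strand column is present or the column order changes.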
No worries. Thanks for sharing the solution (and thanks to Endre). That's a really clean approach for taking advantage of pandas's built-in methods; it will come in handy for me too!