Hi all,
I have a pandas dataframe built from a distance matrix of the 1000 Genomes samples. The rows and the columns are the sample IDs in the same order, and each value is the distance between the two samples.
A sample of the dataframe is below:
> 10174 10187 10205 10215 10227 10231 10249 10347 10411 10490
> 0 0.000000 0.069211 0.067786 0.068593 0.068817 0.068341 0.067894 0.069827 0.068312 0.067571
> 1 0.069211 0.000000 0.069832 0.070054 0.070337 0.068410 0.069597 0.071458 0.069664 0.069361
> 2 0.067786 0.069832 0.000000 0.069795 0.070234 0.069094 0.068961 0.070510 0.069114 0.069188
> 3 0.068593 0.070054 0.069795 0.000000 0.069213 0.069364 0.068045 0.069976 0.068899 0.068610
> 4 0.068817 0.070337 0.070234 0.069213 0.000000 0.069265 0.066743 0.069880 0.068370 0.068147
The actual file has over 2000 samples, with all the sample IDs in the header as column names.
Now my question is: how do I select a group of values for each sample based on conditions? For example, for sample 10174, which samples have values above 0.068 and below 0.07? In the example above it would be 10187, 10215, and 10227. It should not select 10174 itself, as that value is 0.
As I need to do this for each of the 2000 samples, I will not be able to do it manually.
I am assuming this will have to be a loop over the column values, but I am not sure how to write the code for a dataframe so that it returns the sample names whose values fulfil the condition.
Thank you so much for the help in advance.
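A minimal sketch of one way to approach this, assuming the matrix is already loaded as a pandas DataFrame and that the column names are strings. The toy data are the first five rows and columns of the sample above; the names `low`, `high`, `mask`, and `matches` are only illustrative, not from the original post.

```python
import pandas as pd

# Toy 5 x 5 slice of the distance matrix (values copied from the sample above).
ids = ["10174", "10187", "10205", "10215", "10227"]
values = [
    [0.000000, 0.069211, 0.067786, 0.068593, 0.068817],
    [0.069211, 0.000000, 0.069832, 0.070054, 0.070337],
    [0.067786, 0.069832, 0.000000, 0.069795, 0.070234],
    [0.068593, 0.070054, 0.069795, 0.000000, 0.069213],
    [0.068817, 0.070337, 0.070234, 0.069213, 0.000000],
]
df = pd.DataFrame(values, columns=ids)

# Rows are in the same order as the columns, so label them with the sample IDs.
df.index = df.columns

low, high = 0.068, 0.07  # thresholds from the question

# Boolean mask over the whole matrix: True where low < distance < high.
mask = (df > low) & (df < high)

# For each sample (column), collect the row labels where the mask is True.
matches = {
    sample: df.index[mask[sample].to_numpy()].tolist()
    for sample in df.columns
}

print(matches["10174"])  # ['10187', '10215', '10227']
```

The zero on the diagonal drops out on its own because 0 is not above the lower threshold. On the real file only the `df.index = df.columns` line and everything after it would be needed.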
Thank you so much. I am playing around with it, and it should work. But can you please tell me how to get just the text values for all the columns fitting the condition, without having to put them in a Python list?
Can you please specify what exactly you mean by that? Do you want the names of the columns (samples) or the whole columns?
I want the text of the column names. For example, given the results
I want 10174, 10187, 10205, 10215 as the output. These are the samples that the condition applies to. Not in a Python list, but in a string.
I know I could use the to_string function, but in the above case I would have to use it for each list, and I don't know how to loop through them.
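One possible way to get a string per sample, sketched here with plain `str.join` rather than `to_string`, and under the same assumptions as before (string column names, rows in the same order as the columns). The toy data and the name `as_text` are illustrative only.

```python
import pandas as pd

# Toy 5 x 5 slice of the distance matrix, labelled on both axes with sample IDs.
ids = ["10174", "10187", "10205", "10215", "10227"]
values = [
    [0.000000, 0.069211, 0.067786, 0.068593, 0.068817],
    [0.069211, 0.000000, 0.069832, 0.070054, 0.070337],
    [0.067786, 0.069832, 0.000000, 0.069795, 0.070234],
    [0.068593, 0.070054, 0.069795, 0.000000, 0.069213],
    [0.068817, 0.070337, 0.070234, 0.069213, 0.000000],
]
df = pd.DataFrame(values, index=ids, columns=ids)

# True where the distance is strictly between the two thresholds.
mask = (df > 0.068) & (df < 0.07)

# Loop over the columns once and join the matching sample IDs into one string each.
as_text = {}
for sample in df.columns:
    hits = df.index[mask[sample].to_numpy()]
    as_text[sample] = ", ".join(map(str, hits))

for sample, text in as_text.items():
    print(f"{sample}: {text}")
```

The `map(str, ...)` call is there in case the IDs turn out to be integers in the real file, since `join` only accepts strings.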