Hi All,
I have an excel table with 3 tab-delimited columns and n rows like this:
**header marker cm**
101 4 0-7.195
103 8 38.582-49.653
103 5 43.096-46.534
103 1 49.653-49.653
103 1 51.676-51.676
104 2 22.454-37.061
104 4 23.351-37.061
105 2 83.619-84.178
106 1 36.307-36.307
106 1 40.62-40.62
I need an output like this:
**header marker cm**
101 4 0-7.195
103 15 38.582-51.676
104 6 22.454-37.061
105 2 83.619-84.178
106 2 36.307-40.62
So, I need to collapse the duplicates in the header column, keeping the sum of the markers obtained from every row of that duplicate, and extract from the cm column the lowest and the highest value among all the rows of that duplicate.
For example, for the duplicate 103 the sum is 8 + 5 + 1 + 1, the lowest cm value is 38.582 in the 2nd row and the highest cm value is 51.676 in the 5th row. In the table there are also not duplicate headers (like the 101) that have to be keep directly as they are.
Any idea for an automated solution?
Thank you
Hello,
This post looks off-topic for the site. Please substantiate how this question is related to bioinformatics.
Yes I'm going to explain it better. The "header" column represents my sequences, the "marker" column represents the number of molecular markers that I found on the sequences of the "header" column, the "cm" column represents the range position in centimorgan of these markers on my sequences.
Please edit your question and add this information in there.