I am trying to create a matrix of plant traits and plant species. There are 2,912,746 rows in the data and 3 columns. There are different numbers of traits for each species, and not every species has every trait. The data format is tab delimited.
Current format--
Species Trait Value
Species_1 SLA 4
Species_1 Photopath C3
Species_1 Mycorrhiza AMF
Species_2 SLA 3
Species_2 Growth 10
Desired format--
           SLA  Photopath  Mycorrhiza  Growth
Species_1  4    C3         AMF         -
Species_2  3    -          -           10
I need a Perl or Python script. Thank you!
Python 2.7.12 using pandas:
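A minimal sketch of one way to do this with pandas, not necessarily the code originally posted here. It is written for Python 3 and assumes the file has a header row with the column names Species, Trait and Value, as in the question. The function name traits_to_matrix and the choice to keep the first value when a species/trait pair is repeated are my own assumptions for illustration.

```python
import io

import pandas as pd


def traits_to_matrix(source, filler="-"):
    """Pivot a long Species/Trait/Value table into a species-by-trait matrix.

    `source` is a path or file-like object holding tab-delimited text with
    a header row (assumed column names: Species, Trait, Value).
    """
    # Read everything as text so values such as "C3" and "4" are kept as-is.
    long_df = pd.read_csv(source, sep="\t", dtype=str)

    # Remember first-appearance order so the output follows the input order.
    species_order = long_df["Species"].drop_duplicates().tolist()
    trait_order = long_df["Trait"].drop_duplicates().tolist()

    # Assumption: if a (Species, Trait) pair occurs more than once,
    # keep the first value seen.
    wide = long_df.groupby(["Species", "Trait"])["Value"].first().unstack("Trait")
    wide = wide.reindex(index=species_order, columns=trait_order)

    # Missing species/trait combinations become the filler character.
    return wide.fillna(filler)


# Small example built from the rows shown in the question.
sample = (
    "Species\tTrait\tValue\n"
    "Species_1\tSLA\t4\n"
    "Species_1\tPhotopath\tC3\n"
    "Species_1\tMycorrhiza\tAMF\n"
    "Species_2\tSLA\t3\n"
    "Species_2\tGrowth\t10\n"
)
matrix = traits_to_matrix(io.StringIO(sample))
print(matrix.to_csv(sep="\t"))
```

For the real data, pass the path of the input file instead of the StringIO object and write the result with matrix.to_csv(output_path, sep="\t"), where output_path is whatever file name you choose.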
On GNU/Linux, using datamash in a bash shell:
This inserts - wherever data is not available.
datamash is available in most Linux distribution repositories. Please cross-check the output.
Thank you, I shall try it.