I'm trying to extract just the number that comes right after om4:rightcontent=" in each element. I'd like to do it in Perl, but I can't quite wrap my head around it since I'm pretty much a noob when it comes to Perl. As you can see, the data is all on a single line with no breaks, and there are hundreds of pages of it. If anyone can figure it out, that would be awesome!
Here is what the data looks like:
<area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.43468||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-7066" coords="614,248.75,617,300.2643"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.35186||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-7038" coords="617,248.75,620,294.25985000000003"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.28862||Invasive Lobular Breast Carcinoma||Estrogen Receptor Negative||MB-7270" coords="620,248.75,623,289.67495"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.24524||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-4758" coords="623,248.75,626,286.5299"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.21535||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-4548" coords="626,248.75,629,284.362875"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.20532||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-5231" coords="629,248.75,632,283.6357"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.19883||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-5441" coords="632,248.75,635,283.165175"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.18788||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-4945" coords="635,248.75,638,282.3713"><area class="pPop" shape="rect" 
om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.15737||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-7155" coords="638,248.75,641,280.159325">
OK, that seems to work for me, but your text looks like XML, so using an XML parser to extract the bits of data you want might be a more effective approach.
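For anyone landing here later, a minimal regex-based sketch in Perl follows. It is not necessarily the code referred to above. It assumes the value always sits directly after om4:rightcontent=" and is a plain decimal with an optional minus sign (no exponent notation). The sub name extract_values is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every number that directly follows
# om4:rightcontent=" in the given text.
sub extract_values {
    my ($text) = @_;
    # Optional minus sign, digits, optional fractional part.
    # With /g in list context, the match returns all captured values.
    return $text =~ /om4:rightcontent="(-?\d+(?:\.\d+)?)/g;
}

# Two elements copied from the sample data in the question.
my $sample = '<area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.43468||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-7066" coords="614,248.75,617,300.2643"><area class="pPop" shape="rect" om4:leftcontent="Expression value:||Cancer Type:||Legend Value:||Sample Name:" om4:rightcontent="-0.35186||Invasive Ductal Breast Carcinoma||Estrogen Receptor Negative||MB-7038" coords="617,248.75,620,294.25985000000003">';

# Print one value per line.
print "$_\n" for extract_values($sample);
```

Run as is, this prints -0.43468 and -0.35186, one per line. For the saved pages themselves, the same pattern should work as a one-liner, where page*.html is a placeholder for your own file names: perl -nle 'print $1 while /om4:rightcontent="(-?\d+(?:\.\d+)?)/g' page*.html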