Hello everybody, I have a table like the one below with chromosome, position and occurrence columns, sorted by position.
chr01 8755 2
chr01 8848 15
chr01 8908 1
chr01 8912 2
chr01 8920 1
chr01 9198 19
chr01 9268 19
chr01 11299 1
chr01 11598 1
chr01 11605 1
chr01 11610 1
chr01 11656 1
chr01 11680 1
chr01 11692 1
chr01 11727 1
chr01 11750 1
chr01 11761 9
chr01 11776 4
I would like to automatically get the most frequent lowest and highest values: 8848 and 11761. Does anyone have an idea how I can do that using Perl or bash on a Linux platform?
Thanks in advance for your help!
You should clarify what you mean by most frequent lowest and highest values. It looks like 1 should be the lowest occurrence value and 19 should be your highest?
I don't understand what the rules are for getting 8848 and 11761. Could you explain why it's not 8755 or 11776? If this is because you take occurrence into account, why isn't it 9198 and 9268? Thanks for clarifying.
Sorry, maybe I was not very clear about that. I would like to select 8848 because, even though it's less frequent than 9198 and 9268, it's smaller than those and more frequent than 8755 (which is the minimum value). The same logic applies to 11761. Please let me know if it's still not clear. Thanks, guys.
I still don't know what the criteria are for choosing these positions. Why is 8755 the minimum value? Do you mean the minimum base position?
Thanks, Damian, for taking the time on this! The selection must satisfy two important criteria: the first is the position, and the second is the occurrence of that position. In this case the minimum value (column 2) is 8755, but it's not the most frequent! So I would like to select 8848 because it's the smallest AND most frequent value in my data. The same goes for the biggest values. I hope that's clearer.
You should rethink these criteria. Your criteria of smallest by base position and most frequent by occurrence do not allow you to distinguish between 8848 and 9198 unless you weigh one criterion over the other somehow. And if you do choose to weigh position over occurrence, you need to justify why you are doing that.
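One way to add that weighting, offered only as a sketch: read "most frequent" as "occurrence at or above a cutoff", then take the smallest and largest positions that pass. The cutoff `MIN_COUNT = 5` and the function name `extreme_frequent_positions` are hypothetical, chosen so the sample data reproduces 8848 and 11761. The question asked for Perl or bash, but the logic carries over directly. The script groups by chromosome and does not rely on the input being sorted.

```python
# Sketch: lowest and highest positions per chromosome whose occurrence
# passes a cutoff. ASSUMPTION: "most frequent" means occurrence >= MIN_COUNT;
# the value 5 is hypothetical and only chosen to match the example.

MIN_COUNT = 5  # hypothetical threshold; tune and justify for your data

DATA = """\
chr01 8755 2
chr01 8848 15
chr01 8908 1
chr01 8912 2
chr01 8920 1
chr01 9198 19
chr01 9268 19
chr01 11299 1
chr01 11598 1
chr01 11605 1
chr01 11610 1
chr01 11656 1
chr01 11680 1
chr01 11692 1
chr01 11727 1
chr01 11750 1
chr01 11761 9
chr01 11776 4
"""


def extreme_frequent_positions(lines, min_count=MIN_COUNT):
    """Return {chrom: (lowest, highest)} among positions with count >= min_count."""
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        chrom, pos, count = fields[0], int(fields[1]), int(fields[2])
        if count < min_count:
            continue  # not frequent enough under the assumed cutoff
        if chrom in result:
            low, high = result[chrom]
            result[chrom] = (min(low, pos), max(high, pos))
        else:
            result[chrom] = (pos, pos)
    return result


if __name__ == "__main__":
    for chrom, (low, high) in extreme_frequent_positions(DATA.splitlines()).items():
        print(chrom, low, high)
```

Run as is, it prints `chr01 8848 11761`. With a cutoff of 1 it falls back to the plain minimum and maximum (8755 and 11776), so the whole question comes down to choosing and justifying the cutoff.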
Like Damian said, you should give the criteria some weight. It seems your acceptable range for a high or low value is within 1000 bp; is that right? And like manu said, why isn't it 8755 or 11776? Why isn't it 9198 and 9268? What is that black magic? BTW, do you use R? R may be a better way to do these things than scripting languages.
I have a CSV file containing three fields:
I want to filter the scaffolds with high frequency in their respective chromosome into a new, filtered file.
Any suggestions?
I think you should post this as a separate question and show some data so people can understand what you are asking for.