chr17 10558721 10558801 ssc-mir-486-1 + 0.475 chr17 10558763 10558830 ssc-mir-486 +
chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406491 33406569 ssc-mir-103a-2 -
chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406499 33406561 ssc-mir-103b-2 +
chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405237 40405359 ssc-mir-499a -
chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405262 40405335 ssc-mir-499b +
chr17 61587157 61587234 ssc-mir-296 - 0.987012987012987 chr17 61587158 61587236 ssc-mir-296 -
chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683717 ssc-mir-374c -
chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683719 ssc-mir-374b +
The result should be:
chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406499 33406561 ssc-mir-103b-2 +
chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405262 40405335 ssc-mir-499b +
chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683717 ssc-mir-374c -
My scripts:
awk 'BEGIN{OFS="\t"}(!($5==$11)&&($6>1))' intersect.txt
Or
awk 'BEGIN{OFS="\t"}($5!=$11 && $6>1)' intersect.txt
But the actual output is:
chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406499 33406561 ssc-mir-103b-2 +
chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405262 40405335 ssc-mir-499b +
chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683717 ssc-mir-374c -
chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683719 ssc-mir-374b +
So why don't the scripts produce the expected result? And how can I do this with Unix or Python scripts?
What are you trying to produce? Right now the script checks whether the strands in columns 5 and 11 differ and whether the 6th column is greater than 1. Why are you expecting only chr17 to be produced?
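Since a Python version was asked for, here is a minimal sketch of the same filter. The function name `filter_opposite_strand` and the `min_ratio` parameter are made up for illustration. The thread does not say what the actual error was, so this is not a diagnosis. Splitting on any whitespace is only a defensive choice that ignores stray trailing spaces or carriage returns, which is one possible way a strand comparison could go wrong.

```python
def filter_opposite_strand(lines, min_ratio=1.0):
    """Keep rows whose strand columns (5 and 11) differ and whose column 6 exceeds min_ratio."""
    kept = []
    for line in lines:
        # split() with no argument splits on any whitespace run,
        # so trailing spaces and "\r" never end up inside a field.
        fields = line.split()
        if len(fields) < 11:
            continue  # skip blank or malformed rows
        strand_a = fields[4]       # awk $5
        ratio = float(fields[5])   # awk $6
        strand_b = fields[10]      # awk $11
        if strand_a != strand_b and ratio > min_ratio:
            kept.append("\t".join(fields))
    return kept


# The eight rows from the question, used here as sample input.
rows = [
    "chr17 10558721 10558801 ssc-mir-486-1 + 0.475 chr17 10558763 10558830 ssc-mir-486 +",
    "chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406491 33406569 ssc-mir-103a-2 -",
    "chr17 33406488 33406568 ssc-mir-103-2 - 1.7375 chr17 33406499 33406561 ssc-mir-103b-2 +",
    "chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405237 40405359 ssc-mir-499a -",
    "chr17 40405261 40405341 ssc-mir-499 - 1.9125 chr17 40405262 40405335 ssc-mir-499b +",
    "chr17 61587157 61587234 ssc-mir-296 - 0.987012987012987 chr17 61587158 61587236 ssc-mir-296 -",
    "chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683717 ssc-mir-374c -",
    "chrX 58683649 58683729 ssc-mir-374b + 1.725 chrX 58683647 58683719 ssc-mir-374b +",
]

result = filter_opposite_strand(rows)
for row in result:
    print(row)
```

To run it on the real file, pass an open file handle instead of the list, for example `filter_opposite_strand(open("intersect.txt"))`.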
Thank you! I have corrected the error.