Filtering microRNA Data With Awk
13.0 years ago
Chuangye ▴ 80
chr17    10558721    10558801    ssc-mir-486-1    +    0.475    chr17    10558763    10558830    ssc-mir-486    +

chr17    33406488    33406568    ssc-mir-103-2    -    1.7375    chr17    33406491    33406569    ssc-mir-103a-2    -

chr17    33406488    33406568    ssc-mir-103-2    -    1.7375    chr17    33406499    33406561    ssc-mir-103b-2    +

chr17    40405261    40405341    ssc-mir-499    -    1.9125    chr17    40405237    40405359    ssc-mir-499a    -

chr17    40405261    40405341    ssc-mir-499    -    1.9125    chr17    40405262    40405335    ssc-mir-499b    +

chr17    61587157    61587234    ssc-mir-296    -    0.987012987012987    chr17    61587158    61587236    ssc-mir-296    -

chrX    58683649    58683729    ssc-mir-374b    +    1.725    chrX    58683647    58683717    ssc-mir-374c    -

chrX    58683649    58683729    ssc-mir-374b    +    1.725    chrX    58683647    58683719    ssc-mir-374b    +

The result should be:

chr17    33406488    33406568    ssc-mir-103-2    -    1.7375    chr17    33406499    33406561    ssc-mir-103b-2    +
chr17    40405261    40405341    ssc-mir-499    -    1.9125    chr17    40405262    40405335    ssc-mir-499b    +
chrX    58683649    58683729    ssc-mir-374b    +   1.725   chrX    58683647    58683717    ssc-mir-374c    -

My scripts:

awk 'BEGIN{OFS="\t"}(!($5==$11)&&($6>1))' intersect.txt

Or

awk 'BEGIN{OFS="\t"}($5!=$11 && $6>1)' intersect.txt

And the answer is:

chr17    33406488    33406568    ssc-mir-103-2    -    1.7375    chr17    33406499    33406561    ssc-mir-103b-2    +

chr17    40405261    40405341    ssc-mir-499    -    1.9125    chr17    40405262    40405335    ssc-mir-499b    +

chrX    58683649    58683729    ssc-mir-374b    +    1.725    chrX    58683647    58683717    ssc-mir-374c    -

chrX    58683649    58683729    ssc-mir-374b    +    1.725    chrX    58683647    58683719    ssc-mir-374b    +

So why don't the scripts give the right answer? And how can I handle this with Unix tools or a Python script?

What are you trying to produce? Right now the script checks whether the strands differ and whether the 6th column is greater than 1. Why are you expecting only chr17 to be produced?

Thank you! I have corrected the error.

13.0 years ago

If you want to get the lines where the strands are not the same and where the value in the 6th column is greater than 1.0:

 $ awk -F '\t' '($5!=$11 && $6>1.0)' input.txt
chr17   33406488    33406568    ssc-mir-103-2   -   1.7375  chr17   33406499    33406561    ssc-mir-103b-2  +
chr17   40405261    40405341    ssc-mir-499 -   1.9125  chr17   40405262    40405335    ssc-mir-499b    +
chrX    58683649    58683729    ssc-mir-374b    +   1.725   chrX    58683647    58683717    ssc-mir-374c    -

I don't know why your script doesn't work.
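
One way to check the filter in isolation is to run it on a small sample that is known to be tab-separated. The following is only a sketch: the temporary file and the choice of three rows copied from the question are assumptions for illustration, and `-F '\t'` assumes the real file is tab-delimited.

```shell
# Build a small tab-separated sample (three rows copied from the question).
tmp=$(mktemp)
printf 'chr17\t10558721\t10558801\tssc-mir-486-1\t+\t0.475\tchr17\t10558763\t10558830\tssc-mir-486\t+\n' > "$tmp"
printf 'chr17\t33406488\t33406568\tssc-mir-103-2\t-\t1.7375\tchr17\t33406499\t33406561\tssc-mir-103b-2\t+\n' >> "$tmp"
printf 'chrX\t58683649\t58683729\tssc-mir-374b\t+\t1.725\tchrX\t58683647\t58683719\tssc-mir-374b\t+\n' >> "$tmp"

# Keep rows whose strands (columns 5 and 11) differ and whose
# value in column 6 is greater than 1.
result=$(awk -F '\t' '$5 != $11 && $6 > 1' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Only the ssc-mir-103b-2 row should be printed here: the first row fails both conditions, and the last row has the same strand on both sides. If the same command behaves differently on the real file, the file's separators or column layout are the likely suspects rather than the condition itself.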

Hi Pierre, you are right. Thank you very much!

Scripts such as awk 'BEGIN{OFS="\t"}($6!=$14 && $8>1)'

or

awk -F ' ' '($6!=$14 && $8>1.0)' still fail to select the lines whose strands are not the same and in which the value in the 6th column is greater than 1.0, for example on the data file "intersect", which is temporarily deposited at http://www.rayfile.com/zh-cn/files/0a846566-1954-11e1-95c1-0015c55db73d/cbbffc68/.

I don't know where the problem is.

13.0 years ago
W Langdon ▴ 90

It appears there may be a problem with assigning OFS inside BEGIN. http://lists.gnu.org/archive/html/bug-gnu-utils/2011-03/msg00006.html

However, why do you want to set OFS to tab? By default gawk will split input lines on white space (which includes tabs).

I have had problems with tabs before. It is often safer either to work with the defaults or to find some other way to parse the input (e.g. comma-separated data).
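
As a small illustration of the difference between the output separator (OFS) and the input separator (FS), here is a sketch on a made-up line; the sample string and variable names are assumptions for demonstration only. OFS only matters when awk rebuilds or prints fields separately, while FS decides how the input is split.

```shell
# A made-up line: three tab-separated fields, the middle one containing a space.
line=$(printf 'a\tb c\td')

# Printing the record unchanged: OFS is never applied.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN{OFS=","} {print}')

# Assigning to a field makes awk rebuild $0 with OFS.
# With the default FS, "b c" is split into two fields.
rebuilt_default=$(printf '%s\n' "$line" | awk 'BEGIN{OFS=","} {$1=$1; print}')

# With FS set to a tab, "b c" stays a single field.
rebuilt_tab=$(printf '%s\n' "$line" | awk 'BEGIN{FS="\t"; OFS=","} {$1=$1; print}')

printf '%s\n' "$unchanged" "$rebuilt_default" "$rebuilt_tab"
```

The second and third results should be `a,b,c,d` and `a,b c,d` respectively, which shows that setting OFS alone does not change how the input columns are counted.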

Bill

Nice catch on the tabs; I am very surprised by it as well. Letting awk split on any whitespace can lead to very surprising outcomes. In general, the default split in most programming languages (Perl, Python) collapses consecutive whitespace and treats it as a single separator, so empty tab-separated columns shift all subsequent columns. Once you have been bitten by one of these devious column-shifting defaults, you never rely on them again.
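
The column shift can be seen on a made-up row with an empty second column (the row and variable names below are assumptions for illustration):

```shell
# Four tab-separated columns; the second one is empty.
row=$(printf 'chr1\t\t100\t+')

# Default splitting collapses the two adjacent tabs into one separator.
default_nf=$(printf '%s\n' "$row" | awk '{print NF}')
default_col3=$(printf '%s\n' "$row" | awk '{print $3}')

# Splitting on a single tab keeps the empty column in place.
tab_nf=$(printf '%s\n' "$row" | awk -F '\t' '{print NF}')
tab_col3=$(printf '%s\n' "$row" | awk -F '\t' '{print $3}')

echo "default: NF=$default_nf, \$3=$default_col3"   # default: NF=3, $3=+
echo "tab:     NF=$tab_nf, \$3=$tab_col3"           # tab:     NF=4, $3=100
```

With the default separator the strand ends up in column 3, so any condition written against fixed column numbers would silently compare the wrong fields.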
