How to remove lines with unmatched columns
7.5 years ago
BehMah ▴ 50

Hi all, I have a BED file (annotation) with a bug throughout: in some rows the last two columns do not contain the same number of blocks (for example, $5 has 4 blocks but $6 has 3). How can I remove the lines with these unmatched columns? Thank you.

  input:

  chr2   1627   4677   +     1,4,92,30    0,19,11
  chr2   2643   6698   +     10,42,9      0,14
  chr3   1327   4377   +     12,32        0,11
  chr4   4143   6698   +     64,43,23     0,24,51

  desired:

   chr3  1327   4377   +     12,32        0,11
   chr4  4143   6698   +     64,43,23     0,24,51
sequence RNA-Seq • 2.2k views

Dear BehMah, could you please share a few more lines of the original file and another snippet with the desired result after the data is fixed (make it manually), so we can understand exactly what you are looking for? Thank you.


More explanation:

I want to extract the sequences for these coordinates, but because the number of exon sizes ($5) differs from the number of exon offsets ($6) in some rows, bedtools doesn't give me all the sequences.


Thank you all, 5heikki, Petr, and jmzeng1314, for your awesome code.

7.5 years ago
jmzeng1314 ▴ 140
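perl -alne 'print if ($F[4] =~ tr/,//) == ($F[5] =~ tr/,//)' your.input > output

tr/,// returns the number of commas in a field without modifying it, and -a splits each line into @F, so $F[4] and $F[5] are columns 5 and 6. Comparing the two counts directly is safer than testing whether the total number of commas on the line is even, which would also keep a row whose counts differ by two.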
7.5 years ago
awk '{if(gsub(",","",$5)==gsub(",","",$6)){print $0}}' input.txt

gsub returns the number of substitutions it made.
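
As a small illustration of that return value, here is a sketch that counts the commas in a single string. The function name count_commas is made up for this example, and the comma is replaced with itself so the text is left unchanged.

```shell
# Hypothetical helper: print how many commas a string contains.
# gsub() returns the number of replacements; replacing "," with ","
# leaves the text as it was.
count_commas() {
  printf '%s\n' "$1" | awk '{ print gsub(/,/, ",") }'
}

# A field with four blocks contains three commas.
count_commas "1,4,92,30"
```

The number of blocks is the comma count plus one, so comparing comma counts between $5 and $6 is equivalent to comparing block counts.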

7.5 years ago
5heikki 11k

gsub returns the number of substitutions, so:

awk 'BEGIN{OFS=FS="\t"}{if(gsub(",",",",$5) == gsub(",",",",$6)){print $0}}' inputFile

Edit: Petr Ponomarenko suggested the same approach; however, at least with my gawk, his solution deletes the commas from $5 and $6.
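
Another way to avoid touching the fields at all is to count blocks with split(), which returns the number of pieces and never modifies $5 or $6. This is only a sketch, not one of the answers above: the function name filter_matched is hypothetical, the columns are assumed to be whitespace-separated, and a trailing comma on only one of the two fields would be counted as an extra (empty) block.

```shell
# Hypothetical helper: keep rows whose 5th and 6th columns
# contain the same number of comma-separated blocks.
# split() returns the number of pieces and leaves the fields untouched,
# so matching lines are printed exactly as they were read.
filter_matched() {
  awk 'split($5, sizes, ",") == split($6, starts, ",")' "$@"
}

# Sample rows from the question, written to a temporary file.
tmp=$(mktemp)
printf 'chr2\t1627\t4677\t+\t1,4,92,30\t0,19,11\n' >  "$tmp"
printf 'chr2\t2643\t6698\t+\t10,42,9\t0,14\n'      >> "$tmp"
printf 'chr3\t1327\t4377\t+\t12,32\t0,11\n'        >> "$tmp"
printf 'chr4\t4143\t6698\t+\t64,43,23\t0,24,51\n'  >> "$tmp"

# Only the chr3 and chr4 rows have matching block counts.
filter_matched "$tmp"
rm -f "$tmp"
```

Because the awk program is a bare condition with no action, matching lines are printed unchanged, commas and original separators included.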
