Question

Grepping from a specific column with pattern list

0

7.0 years ago

waqasnayab ▴ 250

Hi,

I have a pattern file with IDs:

I want to grep lines from a big file using this pattern file. In the big file, the IDs are in column two. A dummy version of the big file's format is as follows:

FID IID some_more_columns
fam1 1
fam2 10
fam3 1098
fam4 256
fam5 1099

The desired output should be:

FID IID some_more_columns
fam1 1
fam2 10
fam3 1098
fam5 1099

I tried with this solution: http://www.linuxforums.org/forum/programming-scripting/130889-grepping-something-out-specific-column-file-using-pattern-another-file.html

but had no luck. Any advice is appreciated.

Thanks,

Waqas.

sequence next-gen • 5.0k views

ADD COMMENT • link updated 7.0 years ago by cpad0112 21k • written 7.0 years ago by waqasnayab ▴ 250

3

7.0 years ago

michael.ante ★ 3.9k

Hi Waqas,

You can use the join command to get all lines that share a common field. Let t1.txt be your ID list and t2.txt your big file (I just added some dummy values in column 3):

join -1 1 -2 2 -o 2.1,2.2,2.3  <(sort t1.txt) <(sort -k2,2 t2.txt )
fam1 1 A
fam2 10 A
fam3 1098 A
fam5 1099 G
FID IID some_more_columns

Since join needs both inputs sorted on the join field, you can sort them on the fly with process substitution, <(sort ...). The -1 option selects the join field in file 1, and -2 the join field in file 2. With -o you control the output: for each field x of file 2 that you want in your result, add 2.x to the list.

Afterwards, you can re-sort the results.
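As a rough sketch of that re-sorting step, the header can be kept on top by joining only the data rows and sorting the matches by FID afterwards. The sample contents of t1.txt and t2.txt are made up for illustration, and the sorted copies are written to ordinary files (ids.sorted and big.sorted, hypothetical names) instead of <(sort ...), so the sketch also runs in a plain sh:

```shell
set -eu
LC_ALL=C; export LC_ALL          # make sort and join agree on collation order

workdir=$(mktemp -d)             # scratch directory for the illustrative files
cd "$workdir"

# Illustrative inputs: an ID list with a header, and a small "big file".
printf '%s\n' IID 1 10 1098 1099 11 > t1.txt
printf '%s\n' 'FID IID some_more_columns' 'fam1 1 A' 'fam2 10 A' \
              'fam3 1098 A' 'fam4 256 A' 'fam5 1099 G' > t2.txt

# Drop the header line from each input, then sort on the join field.
tail -n +2 t1.txt | sort > ids.sorted
tail -n +2 t2.txt | sort -k2,2 > big.sorted

{
  head -n 1 t2.txt                                   # header, unchanged
  join -1 1 -2 2 -o 2.1,2.2,2.3 ids.sorted big.sorted |
    sort -k1,1                                       # re-sort matches by FID
} > joined.txt

cat joined.txt
```

Setting LC_ALL=C is a precaution here: join expects its inputs in the same collation order that sort produced.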

Cheers,

Michael

ADD COMMENT • link 7.0 years ago by michael.ante ★ 3.9k

0

Thanks, Michael, for that. I will keep this solution on my list.

ADD REPLY • link 7.0 years ago by waqasnayab ▴ 250

1

7.0 years ago

cpad0112 21k

Output:

$  awk 'FNR==NR{a[$1]++;next}a[$2]' test1.txt test.txt 

FID IID
fam1    1
fam2    10
fam3    1098
fam5    1099

Input:

$ cat test1.txt 
IID
1
10
1098
1099
11
1130
12
121
127

$ cat test.txt 
FID IID
fam1    1
fam2    10
fam3    1098
fam4    256
fam5    1099
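
For readers unfamiliar with the FNR==NR idiom, here is an expanded sketch of the same two-file lookup. The file names test1.txt and test.txt follow the listing above; the array name wanted and the output file filtered.txt are hypothetical, and membership is tested with "in" rather than by the stored count:

```shell
set -eu
workdir=$(mktemp -d)             # scratch directory for the illustrative files
cd "$workdir"

# Illustrative copies of the two inputs (tab-separated big file).
printf '%s\n' IID 1 10 1098 1099 11 > test1.txt
printf 'FID\tIID\nfam1\t1\nfam2\t10\nfam3\t1098\nfam4\t256\nfam5\t1099\n' > test.txt

awk '
  # FNR == NR holds only while the first file is being read:
  # remember every ID from its first column, then skip to the next line.
  FNR == NR { wanted[$1] = 1; next }

  # Second file: print the line if its second column is a remembered ID.
  $2 in wanted
' test1.txt test.txt > filtered.txt

cat filtered.txt
```

The header line of test.txt is printed only because test1.txt also contains the string IID; without a header in the ID list, it would have to be printed separately.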

ADD COMMENT • link 7.0 years ago by cpad0112 21k

score 5 · Accepted Answer · 2017-11-16

5

7.0 years ago

Kevin Blighe 88k

cat lookup.list 
IID
1
10
1098
1099
11
1130
12
121
127

cat BigFile.list 
FID   IID   moredata  masdados
fam1  1     moredata  masdados
fam2  10    moredata  masdados
fam3  1098  moredata  masdados
fam4  256   moredata  masdados
fam5  1099  moredata  masdados


awk 'BEGIN {FS=" "} FNR==NR {key=$1; arrayLookup[key]=$1; next} {key=$2; if (key in arrayLookup) print $0}' lookup.list BigFile.list 
FID   IID   moredata  masdados
fam1  1     moredata  masdados
fam2  10    moredata  masdados
fam3  1098  moredata  masdados
fam5  1099  moredata  masdados

ADD COMMENT • link 7.0 years ago by Kevin Blighe 88k

0

Hi Kevin,

I tried it, but nothing happened. Then I tried changing {FS=" "} to {FS="\t"}, but got the same result.

It worked well on the test file, though.

ADD REPLY • link 7.0 years ago by waqasnayab ▴ 250

1

Ensure that your files are delimited properly. You can collapse each run of multiple spaces into a single space by running sed 's/ \+/ /g' on each file before using awk. Note that \+ is a GNU sed extension and that this pattern does not touch tabs.
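
A minimal sketch of that clean-up, assuming the file may mix spaces and tabs: the POSIX class [[:blank:]] covers both, and the \{1,\} interval avoids the GNU-only \+. The file names BigFile.raw and BigFile.clean are hypothetical. awk's default field splitting already treats runs of blanks as one separator, so this step matters mostly for stricter tools such as cut or join.

```shell
set -eu
workdir=$(mktemp -d)             # scratch directory for the illustrative file
cd "$workdir"

# Illustrative input with uneven spacing and a tab between columns.
printf 'FID   IID\tmoredata\nfam1  1     x\n' > BigFile.raw

# Replace every run of spaces and/or tabs with a single space.
sed 's/[[:blank:]]\{1,\}/ /g' BigFile.raw > BigFile.clean

cat BigFile.clean
```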

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k

0

Yes, my file contained the special character ^M (a carriage return), a classic problem. After I ran dos2unix, your command worked for me. Now it's all OK!

Perfect,

Thanks,

Waqas

ADD REPLY • link 7.0 years ago by waqasnayab ▴ 250

0

Yes, I have encountered that problem before with ^M. dos2unix and unix2dos are useful tools to have.
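
If dos2unix is not installed, a rough equivalent is to check for carriage returns with grep and strip them with tr. In this sketch the file names lookup_dos.list and lookup.list are hypothetical and the sample content is made up. With DOS line endings, the single-column ID "1" is read by awk as "1" followed by a carriage return, so it never equals the ID in the big file and nothing is printed.

```shell
set -eu
workdir=$(mktemp -d)             # scratch directory for the illustrative file
cd "$workdir"

# Illustrative ID list with DOS (CRLF) line endings.
printf 'IID\r\n1\r\n10\r\n' > lookup_dos.list

cr=$(printf '\r')                # a literal carriage return character

# Count the lines that contain a carriage return (3 for this sample).
grep -c "$cr" lookup_dos.list

# Delete every carriage return, writing a Unix-style copy.
tr -d '\r' < lookup_dos.list > lookup.list

# Count again; grep exits non-zero when nothing matches, hence "|| true".
grep -c "$cr" lookup.list || true
```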

Good luck

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k

0

Which OS are you using?

ADD REPLY • link 7.0 years ago by Kevin Blighe 88k