How can I extract the rows of a big file that are listed in another file?
6.6 years ago
hosin • 0

I have a big file like this (6 columns):

Name                          Chr  Position   GType  LRR          BAF
250506CS3900140500001_312.1   23   26298017   BB     0.004256991  -0.0254199  1
250506CS3900176800001_906.1   7    81648528   BB     0.05812091   0.996112
250506CS3900211600001_1041.1  16   41355381   BB     -0.1070691   0.9926475
250506CS3900218700001_1294.1  2    148802744  BB     -0.06002647  0.9837347
250506CS3900283200001_442.1   1    62646307   AB     0.0280207    0.4966125
250506CS3900371000001_1255.1  11   35339124   BB     0.05070077   1
250506CS3900386000001_696.1   16   62646307   AB     0.0280207    0.4966125
250506CS3900487100001_1521.1  14   1110363    AB     0.0893564    0.5164082
250506CS3901300500001_1084.1  7    89431547   BB     0.008588651  1
OAR3_7444330.1                3    26298017   BB     0.004256991  -0.0254199
OAR3_74471615.1               3    41355381   BB     -0.1070691   0.9926475
OAR3_74485418_X.1             5    1110363    AB     0.0893564    0.5164082
OAR3_74546684.1               3    89431547   BB     0.008588651  1
OAR3_74587791.1               3    26298017   BB     0.004256991  -0.0254199
OAR3_74604120.1               3    62646307   AB     0.0280207    0.4966125
OAR3_74642696.1               3    62646307   AB     0.0280207    0.4966125
OAR3_74703774.1               3    148802744  BB     -0.06002647  0.9837347
OAR3_74732440.1               3    81648528   BB     0.05812091   0.996112

I also have a list file like this (one column):

250506CS3900283200001_442.1
250506CS3900386000001_696.1
250506CS3900371000001_1255.1
250506CS3900487100001_1521.1
OAR3_74546684.1
OAR3_74604120.1 
OAR3_74703774.1

How can I extract the rows of the big file whose names appear in the list file above? Please help me. I'd be really grateful for commands or ...

genome • 1.9k views
6.6 years ago

This is basic Linux, see join: https://linux.die.net/man/1/join

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt) <(sort -t $'\t' -k1,1 file2.txt)

Yes, I tried many times, like this: join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 440s.txt) <(sort -t $'\t' -k1,1 440.txt) > Output. But the output file is empty.


Either you're doing something wrong, or you're not using bash, or the delimiter is not a tab.
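One way to check the delimiter possibility is to count how many lines actually contain a tab. The sketch below uses two made-up demo files (`tabs.txt` and `spaces.txt` are hypothetical names, not the poster's files) and portable shell only, so it does not depend on bash's `$'\t'` syntax.

```shell
# Hypothetical demo files standing in for the real data.
tmp=$(mktemp -d)
printf 'snpA\t1\t100\nsnpB\t2\t200\n' > "$tmp/tabs.txt"
printf 'snpA   1   100\nsnpB   2   200\n' > "$tmp/spaces.txt"

tab=$(printf '\t')   # a literal tab, without relying on bash's $'\t'

# grep -c prints the number of lines that contain a tab.
# It exits non-zero when that number is 0, hence "|| true".
tabs_count=$(grep -c "$tab" "$tmp/tabs.txt" || true)
spaces_count=$(grep -c "$tab" "$tmp/spaces.txt" || true)

echo "tabs.txt:   $tabs_count line(s) contain a tab"     # 2 for this demo file
echo "spaces.txt: $spaces_count line(s) contain a tab"   # 0 for this demo file
```

If the count for the real big file is 0, the columns are separated by spaces, and a join that is told to split on tabs with `-t` will not find the key in column 1.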

6.6 years ago
CS ▴ 10

You can try sorting your files on the first column and then run:

grep -f smallFile BigFile2 > output.txt

Why would you need to sort? What would happen if the key is present in another column?


Sorting can speed up things:

time grep -f sorted_T1 ../BWA/ERR1094807.sam > Output

real 1m12.685s
user 0m6.970s
sys 0m7.441s

time grep -f unsorted_T1 ../BWA/ERR1094807.sam > Output

real 1m16.928s
user 0m7.176s
sys 0m7.914s

And yes, you are right: if the key is present in any other column, grep -f would pick up that line. I assumed that was not the case in this example.
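If matches in other columns are a concern, one alternative (not proposed in this thread, so treat it as a sketch) is an awk lookup that compares only the first field. The file names `big.txt` and `list.txt` and their contents are made up for the demo. The trailing space after `snpC` in the list is deliberate, since awk's default field splitting ignores it.

```shell
# Hypothetical demo files; the last row of big.txt has the key in column 4 only.
tmp=$(mktemp -d)
printf 'snpA 1 100 BB\nsnpB 2 200 AB\nsnpC 3 300 BB\nother 4 400 snpA\n' > "$tmp/big.txt"
printf 'snpA\nsnpC \n' > "$tmp/list.txt"

# First file (NR == FNR): remember every non-empty key from column 1.
# Second file: print a row only when its first column is a remembered key.
awk 'NR == FNR { if (NF) keys[$1] = 1; next } ($1 in keys)' \
    "$tmp/list.txt" "$tmp/big.txt" > "$tmp/output.txt"

cat "$tmp/output.txt"
```

The row starting with `other` is not printed, although `snpA` appears in its last column. No sorting is needed, and the header line of the big file is dropped unless `Name` is added to the list. The `NR == FNR` idiom assumes the list file is not empty.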


Thank you very much for your attention. I'm working in the shell. Actually, my system does not respond with this method, because each file has too many rows (about 600K), and I have 500 files like the first file (with 6 columns and 600 rows). These commands take a lot of time, so do you have another suggestion?
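For many input files, one option is a plain loop that filters each file against the same list and writes one output per input. This is only a sketch: the `in/` and `out/` directories and the sample file names are hypothetical, and the filter is an awk lookup on the first column, which looks each row up in an in-memory table rather than testing it against every pattern.

```shell
# Hypothetical layout: inputs in in/, filtered copies written to out/.
tmp=$(mktemp -d)
mkdir "$tmp/in" "$tmp/out"
printf 'snpA 1 100\nsnpB 2 200\n' > "$tmp/in/sample1.txt"
printf 'snpB 2 200\nsnpC 3 300\n' > "$tmp/in/sample2.txt"
printf 'snpA\nsnpC\n' > "$tmp/list.txt"

for f in "$tmp"/in/*.txt; do
    # Keep rows whose first column is one of the names in list.txt.
    awk 'NR == FNR { if (NF) keys[$1] = 1; next } ($1 in keys)' \
        "$tmp/list.txt" "$f" > "$tmp/out/$(basename "$f")"
done

ls "$tmp/out"
```

Each output file keeps the name of its input. Whether this is fast enough for 500 files of about 600K rows would need to be timed on the real data.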


Did you try the join method?


Yes, I did. After that I get an empty file:

  join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 440s.txt) <(sort -t $'\t' -k1,1 440.txt) > Output

440s.txt is file 1 (the big file with 6 columns).

440.txt is file 2 (the small file with 1 column).

So the output is empty.
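If the big file turns out to be separated by spaces rather than tabs, a variant worth trying is join without `-t`, which splits fields on runs of blanks by default. The sketch below uses made-up demo files and temporary sorted copies instead of bash process substitution. `LC_ALL=C` is set so that sort and join agree on the ordering.

```shell
# Hypothetical demo files: space-separated big file, one-column list.
tmp=$(mktemp -d)
printf 'snpB   2   200\nsnpA   1   100\nsnpC   3   300\n' > "$tmp/big.txt"
printf 'snpC\nsnpA\n' > "$tmp/list.txt"

# join needs both inputs sorted on the join field, in the same collation.
LC_ALL=C sort -k1,1 "$tmp/big.txt"  > "$tmp/big.sorted"
LC_ALL=C sort -k1,1 "$tmp/list.txt" > "$tmp/list.sorted"

# No -t option: fields are separated by blanks, output uses single spaces.
LC_ALL=C join "$tmp/big.sorted" "$tmp/list.sorted" > "$tmp/output.txt"

cat "$tmp/output.txt"
```

The output rows come out in sorted key order, not in the order of the original file.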

If you are concerned about speed, you should add -F.

-F
--fixed-strings
Interpret the pattern as a list of fixed strings (instead of regular expressions), separated by newlines, any of which is to be matched.
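Besides speed, -F also changes what matches. The names in the list contain dots, and without -F a dot is a regular-expression wildcard. A small sketch with made-up rows (`snp1x1` is a hypothetical decoy name):

```shell
# Hypothetical demo files; snp1x1 differs from snp1.1 only at the dot.
tmp=$(mktemp -d)
printf 'snp1.1 1 100\nsnp1x1 2 200\nsnp2.1 3 300\n' > "$tmp/big.txt"
printf 'snp1.1\n' > "$tmp/list.txt"

# Without -F the dot matches any character, so snp1x1 is also counted.
regex_hits=$(grep -c -f "$tmp/list.txt" "$tmp/big.txt")

# With -F the pattern is a literal string, so only snp1.1 is counted.
fixed_hits=$(grep -c -F -f "$tmp/list.txt" "$tmp/big.txt")

echo "regex: $regex_hits, fixed: $fixed_hits"   # regex: 2, fixed: 1
```

With -F every character of a list line is matched literally, so a trailing space on a list line becomes part of the pattern. The match can still occur in any column, as noted earlier in the thread.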