Question

Script for extracting atomic position of nucleotide base

0

9.7 years ago

vahapel ▴ 210

Hi everyone.

I have a tab-delimited file (excerpt below) containing information about the atomic positions of nucleotide bases. How can I extract the first 10 lines of every 20,000 lines from a file that has 10^7 lines? Is there a script for this purpose?

BaseAtomNumber    atomic distances    NumberofNeighbour    IndexofAtom
1                 1.94895             655                  153
1                 2.34545             566                  543
..
..

Many thanks in advance for your help!

next-gen Assembly genome • 1.8k views

Updated 2.5 years ago by Ram 44k • written 9.7 years ago by vahapel ▴ 210

Ram · Accepted Answer · 2015-03-31

2

9.7 years ago

george.ry ★ 1.2k

Assuming your file has a single-line header that needs stripping first, as shown, something like this should work:

tail -n+2 <yourfile> \
| split -l 20000 - <yourprefix> \
&& find <yourprefix>* -exec bash -c 'head -n10 {}' \; \
> <youroutfile> \
&& rm <yourprefix>*

This strips the header, splits the remaining lines into separate files of 20,000 lines each, writes the first 10 lines of each file to the output file, and then deletes the intermediate files (make sure nothing else matches <yourprefix>*, or it will be deleted too).
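For comparison, the same selection can probably be done in a single pass with awk, without writing intermediate files. This is only a sketch, not the method used in the answer: the function name `first_of_each_block` and its arguments are hypothetical, and it assumes a one-line header and that "every 20000 lines" counts data lines only.

```shell
# Hypothetical helper (name and defaults are illustrative, not from the thread):
# print the first `keep` data lines of every `block`-line chunk of `file`,
# skipping a one-line header.
first_of_each_block() {
    local file=$1 block=${2:-20000} keep=${3:-10}
    # NR is the current line number; NR == 1 is the header.
    # (NR - 2) is the zero-based index of a data line, so the modulo
    # gives its position inside the current block.
    awk -v block="$block" -v keep="$keep" \
        'NR > 1 && (NR - 2) % block < keep' "$file"
}

# Small demonstration: 1 header + 25 data lines, blocks of 10, keep 2.
demo=$(mktemp)
{ echo "header"; seq 1 25; } > "$demo"
first_of_each_block "$demo" 10 2
rm -f "$demo"
```

On the real file this would be called as `first_of_each_block <yourfile> > <youroutfile>`, relying on the defaults of 20000 and 10. Since nothing is written to disk besides the output, there is no prefix to clean up afterwards.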

Updated 2.5 years ago by Ram 44k • written 9.7 years ago by george.ry ★ 1.2k

0

We tried this and it works well, thanks for your help.

9.7 years ago by vahapel ▴ 210