This is a textbook example of one of the reasons Perl was created. If your file is completely regular, you could write a few lines of Perl to loop through the file and do something whenever the loop encounters a line starting with a number. For instance:
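#!/usr/bin/perl
print "Pos\tA\tC\tG\tT\n";            # header line
while(<>){                            # read the next line into $_
    chop;                             # drop the trailing newline
    if(/^\d+/){                       # line starts with one or more digits
        $number = $_;
        @values = ($number);          # start a new row with the position
        <>;                           # read and discard the empty line
        for($i=0; $i<4; ++$i){        # the next four lines hold A, C, G, T
            $_ = <>;
            chop;
            ($base,$value) = split(); # split on whitespace
            push(@values,$value);
        }
        print join("\t", @values), "\n";
    }
}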
The code above parses your snippet and formats it the way you've shown, but it assumes your file is structured in a completely regular way. If the code were in a file called parse.pl and your data in a file called foo.txt, you would call it like so:
./parse.pl foo.txt
and to dump the results to a new file:
./parse.pl foo.txt > newfile.txt
If you're unfamiliar with Perl, here's what's happening. The script prints a header line like the one you have above, then loops through the file one line at a time. In the while condition, <> reads the next line of the file into a variable called $_. The chop function cuts off the last character of the line (the newline); chomp is the safer choice, since it removes a trailing newline only if one is present. The if statement tests whether the line begins with one or more digits. (Many operations, such as chop, split, and pattern matching, operate on $_ implicitly unless another variable is handed to them explicitly.) If the line begins with digits, the script remembers that line as the position number and starts a list of values. It reads the next line, which should be empty, without saving it to anything (thus discarding it). Then it sets up a loop to process the next four lines: remove the end character, split the line on whitespace, and push the value onto the list created previously. After four lines, it prints the contents of the list, joined by tab characters and followed by a newline. This repeats until there are no more lines in the file!
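For comparison, here is a slightly more defensive sketch of the same loop. It is only an illustration, not a replacement for the script above: the reshape helper and the sample data are made up for this example, and the input layout (a number line, one separator line, then four "base value" lines) is assumed from the description. It uses chomp instead of chop, so a final line without a newline is not damaged, and it keeps only the leading digits of the position line rather than the whole line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads records from a filehandle and returns
# one tab-joined row per record.
sub reshape {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /^(\d+)/;    # a record starts at a line beginning with digits
        my @values = ($1);                # keep only the leading digits
        my $separator = <$fh>;            # read and ignore the separator line
        for my $i (1 .. 4) {              # the next four lines: A, C, G, T
            my $entry = <$fh>;
            last unless defined $entry;   # stop early on a truncated record
            chomp $entry;
            my (undef, $value) = split ' ', $entry;
            push @values, $value;
        }
        push @rows, join("\t", @values);
    }
    return @rows;
}

# Made-up sample data; note that the last line has no trailing newline.
my $sample = "1\n\nA 0.1\nC 0.2\nG 0.3\nT 0.4\n2\n\nA 0.5\nC 0.6\nG 0.7\nT 0.8";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
print "Pos\tA\tC\tG\tT\n";
print "$_\n" for reshape($fh);
close($fh);
```

With chop, the last value in this sample would come out as "0." because there is no newline to remove; chomp leaves it intact.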
There are a variety of ways to solve your problem. An awk solution would also be easy to code. But with a few principles from Perl that could be learned in an afternoon or two, you can reshape your file. (Some gurus might find the code above cringeworthy, but it gets the job done.)
Whether a solution is overkill or readable means little to someone who knows no other solution or language. I think we can safely assume mphillips6789 knows neither awk nor Perl (I did mention awk as a possibility in my response). For the edification of those who know neither, the notion of readability is interesting, and they shouldn't miss the common elements: the idea of using // to specify patterns to match by line, {} to hold blocks of code, and putting things in variables starting with $.
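As a small illustration of those common elements, here is a minimal Perl sketch. The sample lines are invented for the example, and the awk one-liner appears only as a comment for comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the original data.
my @lines = ("12", "", "A 0.25", "C 0.75");

# Roughly the same idea in awk, shown only for comparison:
#   awk '/^[0-9]+/ { print "line starting with digits: " $0 }'

my @matched;
for (@lines) {              # each element is aliased to $_ in turn
    if (/^\d+/) {           # // holds the pattern, tested against $_
        push @matched, $_;  # {} holds the block that runs on a match
    }
}
print "line starting with digits: $_\n" for @matched;
```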
Thank you, problem solved. Looking at both solutions was educational in and of itself.