I have a gene list in a txt file and need to convert it to a tab-delimited file.
The list has six columns: chr, startPos, endPos, width, strand, name.
The code is awk '{printf("%s\t%s\t%s\t%s\t%s\t%s\t\n",$1,$2,$3,$4,$5,$6)};' input.txt > output.txt
Then I ran some code on the output, and the result said: 'Perhaps you have non-integer starts or ends at line 1?'
I used 'cat -e file | more' to check the lines; each line has an additional (tab) before the $ at the end.
e.g. chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53<tab>$
Please let me know what is wrong. Thanks in advance.
I'm not sure what the error message means, but in your example, there are a few potential problems:
The end coordinate is smaller than the start coordinate.
The name ("ID") generally goes into the fourth column.
The score or some numerical value generally goes into the fifth column.
The strand generally goes into the sixth column.
Most of these issues can be fixed with a few tweaks, like reordering the field variables $2 through $6. If you can post the actual output of running the above on your true input.txt file, that may help figure out what the real issue is.
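Here is a minimal sketch of that reordering, assuming the six input columns are chr, startPos, endPos, width, strand, name as described in the question and that a BED-style layout (chr, start, end, name, score, strand) is what the downstream tool wants. Putting the width into the fifth column is only my assumption to fill the numeric slot, and the sample rows are made up:

```shell
# Hypothetical sample input (whitespace-separated): chr start end width strand name
printf 'chr12 1234 4560 3326 + TP53\nchr1 100 250 150 - GENE2\n' > input.txt

# Set the output field separator to a tab and print the fields in
# chr, start, end, name, width, strand order.
# print joins fields with OFS only, so no trailing tab is written.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $6, $4, $5 }' input.txt > output.txt

cat output.txt
```

Using OFS with print instead of a hand-written printf format also avoids accidentally leaving a stray '\t' before the newline.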
I removed the trailing '\t' from the printf, but the '$' is still at the end of each line. The result now looks like 'chr12<tab>1234<tab>456<tab>789<tab>+<tab>TP53$'. I will check each column as you suggested and see if it works. Thank you very much for your suggestion.
The $ is just a symbol to show the newline character, which is correct. I would recommend checking the order of $2 through $6.
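If it helps, here is a rough sketch for checking a file before passing it on. It assumes the file is tab-delimited with the start in column 2 and the end in column 3; the file names and rows are made up, and I can't say for sure that either condition is what triggers your error message:

```shell
# Hypothetical tab-delimited file; the first row has start > end and a trailing tab.
printf 'chr12\t1234\t456\t789\t+\tTP53\t\nchr1\t100\t250\t150\t-\tGENE2\n' > check.txt

# Report rows whose field count is not 6 (a trailing tab adds an empty 7th field)
# or whose start is not smaller than its end.
awk -F'\t' 'NF != 6 || $2 + 0 >= $3 + 0 {
    print "line " NR ": " NF " fields, start=" $2 ", end=" $3
}' check.txt > report.txt

cat report.txt
```

An empty report.txt would mean every row has exactly six fields and start < end.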