Following a previous discussion (https://www.biostars.org/p/171557/), I would like to prepare a file for converting chr:pos to rs IDs. For this, I have downloaded a list of all SNPs from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables). It looks like this:
#chrom chromEnd name
chr1 100663297 rs1235665665
chr1 1048577 rs1346354302
chr1 62914560 rs538775156
I would like it to look like this:
1:100663297 rs1235665665
1:1048577 rs1346354302
1:62914560 rs538775156
I could do it in R; however, the file is so large that it crashes my computer before it even loads. I was told that Linux commands such as grep and awk are amazing for such things and can handle very large files efficiently. Unfortunately, I do not have the slightest idea of how to even begin writing the code in Linux to achieve my goal.
Could you please help me with this?
Thank you very much.
PS I am unsure whether the title of my question describes my problem effectively. Please let me know if it does not and I will edit it.
This is a really basic question. You should start by searching the web for how awk works. Have a look at the examples on this tutorial site. Of course I could give you the solution, but then you would not learn that much :)
Edit: Following finswimmer's comment, here are some other links:
Hello OAJn8634,
Please take a look at the man pages of awk, sed, grep, and cut to see how to use them. Give us your best try and we will take a look at how to modify it to fit your intent.
Awk in Bioinformatics
https://stackoverflow.com/questions/29275971/need-to-remove-the-string-chr-and-the-sign-from-the-file
Hello Bastien Hervé and finswimmer, thank you very much for the very useful links to get me started, and for offering to help. I really appreciate it.
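For anyone landing here later, the conversion asked about in the question could be sketched with awk roughly as follows. This is only a minimal sketch, not a tested solution for the full UCSC download: the file names snps_sample.txt and chrpos_to_rs.txt are made up for illustration, and it assumes the three-column layout shown in the question (whitespace- or tab-separated, with a header line starting with #).

```shell
# Build a tiny sample in the same layout as the export shown in the question.
# (snps_sample.txt is a hypothetical name; use your real download instead.)
printf '%s\n' \
  '#chrom chromEnd name' \
  'chr1 100663297 rs1235665665' \
  'chr1 1048577 rs1346354302' \
  'chr1 62914560 rs538775156' > snps_sample.txt

# For every line that does not start with "#":
#   - remove the leading "chr" from the first column
#   - print "chrom:position", a space, then the rs ID
# awk processes the file one line at a time, so it never loads it all into memory.
awk '!/^#/ { sub(/^chr/, "", $1); print $1 ":" $2, $3 }' snps_sample.txt > chrpos_to_rs.txt

cat chrpos_to_rs.txt
```

On the real file, only the awk line is needed, with the input name replaced by the downloaded table. Note that the sketch strips only the leading "chr", so any non-standard contig names in the download would keep the rest of their name.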