Hi,
I need some advice. I have an output file (a count file) generated from a VCF file. It looks like this:
Chr 10
protein_coding 447164
pseudogene 87457
Chr 11
protein_coding 368825
pseudogene 78131
Chr 12
protein_coding 357596
pseudogene 68176
and there are more chromosomes. I have two other files with different column names (the set of fields can differ by one or more between chromosomes). How can I convert this file to CSV or another tabular format? I want to create a file like this:
Chr,protein_coding,pseudogene
10,447164,87457
11,368825,78131
12,357596,68176
If a chromosome lacks a field, for example pseudogene, the script should leave that field empty, e.g. for chromosome 15:
15,132598,
Thank you in advance
Thank you for explaining the problem at hand so well.
What have you tried by yourself to solve this problem? How far did you get and what specific challenges are you facing?
I have no idea how to do that...
You can use awk with RS=Chr. That will use Chr to create records, so each record would contain all data between consecutive Chrs. You can then replace each new line by a space and use $1, $2 etc. to get to your result.
Also, please see the following oddities:
protein_coding becomes protein_codin pseudogene. Why?
Chr=16 appears in one place, although elsewhere a space seems to separate Chr (the key) from the chromosome number (the value). Is this true?
These will make a difference in the final script you develop.
Sorry, my bad. I have corrected everything.
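For reference, here is a minimal sketch of one way to do the conversion with awk, assuming the corrected input always has a "Chr N" line followed by "field count" lines separated by whitespace. It does not use the RS=Chr idea from the comment above: a multi-character RS is not guaranteed by POSIX awk, and the CSV header needs every field name from every chromosome, so this version reads line by line and prints everything in the END block. The function name counts_to_csv is made up for illustration.

```shell
#!/bin/sh
# counts_to_csv: hypothetical helper; reads the count file on stdin, writes CSV to stdout.
counts_to_csv() {
    awk '
    $1 == "Chr" {                 # a "Chr N" line starts a new chromosome
        chr = $2
        if (!(chr in seen)) { seen[chr] = 1; chrs[++nchr] = chr }
        next
    }
    NF >= 2 && chr != "" {        # a "field count" line for the current chromosome
        if (!($1 in known)) { known[$1] = 1; cols[++ncol] = $1 }
        val[chr, $1] = $2
    }
    END {
        # Header: Chr plus every field name, in order of first appearance.
        line = "Chr"
        for (j = 1; j <= ncol; j++) line = line "," cols[j]
        print line
        # One row per chromosome; a missing field is an unset array element,
        # which awk treats as an empty string, so the CSV field stays empty.
        for (i = 1; i <= nchr; i++) {
            line = chrs[i]
            for (j = 1; j <= ncol; j++) line = line "," val[chrs[i], cols[j]]
            print line
        }
    }'
}
```

It could be called as counts_to_csv < counts.txt > counts.csv, where counts.txt is a placeholder file name. Columns come out in the order the field names first appear in the input, so files with different field sets get different headers without any change to the script.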