Hi everyone.
I want to simplify the sample names in a VCF file by removing the suffix that was added during sequencing, but I have not been successful. Right now my samples are named like this:
123_4:A7TT6AKXX:1:1000000001 or ABC100:A7TT6AKXX:1:1000000001
In the first case I want to keep only the number before the underscore and in the second everything before the first colon (:). How can I do it?
Here's how the header is organized in my vcf file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 102_3:C6RL4ANXX:1:250529518 206E:C6RL4ANXX:2:250529583 AN07:C6RL4ANXX:2:250529585 B103:C6RL4ANXX:2:250529647 CA:C6RL4A
Thanks in advance.
It seems that
bcftools reheader --samples <your file> -o <output> <your input file>
would be helpful.
I've moved this to a comment, as this has already been added as an answer.
What have you tried? People will be more willing to help if you show the effort you put into trying to solve your problem. We would rather point out mistakes than give a full answer (which in the long run doesn't teach you much).
Could you please post an example VCF line containing your sample names?
Thank you all for replying!
It seems that I failed to explain myself in the original post. The goal is to change the names of my samples like this:
123_4:A7TT6AKXX:1:1000000001 > 123
ABC100:A7TT6AKXX:1:1000000001 > ABC100
Eventually I was able to change them using bcftools reheader, as Pierre suggested, but I had to extract the sample names from the original VCF and then change them manually.
Is there no way to change them directly in the VCF, maybe with grep or sed commands?
Thank you.
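One way to avoid editing the names by hand is to generate the renaming file from the VCF itself and pass it to bcftools reheader. The following is a minimal sketch, not a tested pipeline: the file names input.vcf.gz, renamed.vcf.gz and rename_map.txt and the helper make_rename_map are hypothetical, and it assumes every name should be cut at its first "_" or ":".

```shell
#!/bin/sh
# Read current sample names (one per line) on stdin and print
# "old_name<TAB>new_name" pairs, cutting each name at the first "_" or ":".
make_rename_map() {
    awk '{ new = $0; sub(/[_:].*/, "", new); print $0 "\t" new }'
}

# Hypothetical file names; adjust to your data.
in_vcf="input.vcf.gz"
out_vcf="renamed.vcf.gz"

# Only run the bcftools steps when the tool and the input file are present.
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    # List the samples, build the old/new pairs, then rewrite the header.
    bcftools query -l "$in_vcf" | make_rename_map > rename_map.txt
    bcftools reheader --samples rename_map.txt -o "$out_vcf" "$in_vcf"
fi
```

It is worth checking rename_map.txt before running reheader: names such as 123_4 and 123_5 would both shorten to 123, and duplicate sample names in one VCF are likely to be rejected.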
123_4:A7TT6AKXX:1:1000000001 > 123 ABC100:A7TT6AKXX:1:1000000001 > ABC100
I asked you to post your VCF header (with samples). With the information provided in the OP:
input (two columns with case1 and case2):
output:
However, for the code to work on the OP's VCF, the VCF header with the sample names is necessary (with a few example sample names).
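For the two example names, a single substitution that drops everything from the first "_" or ":" onward appears to cover both cases. This is a minimal sketch and not necessarily the code used in this thread; shorten_name is a hypothetical helper name.

```shell
#!/bin/sh
# Cut each name at the first "_" or ":" and keep what comes before it.
shorten_name() {
    sed 's/[_:].*//'
}

# The two example names from the question, one per line.
printf '%s\n' \
    '123_4:A7TT6AKXX:1:1000000001' \
    'ABC100:A7TT6AKXX:1:1000000001' | shorten_name
# prints:
# 123
# ABC100
```

The pattern assumes that the part you want to keep never contains an underscore or a colon itself.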
I added the header, but I'm not sure it is in the right format.
I'll try your solution and post if it worked. Thank you!
Input:
code:
output:
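A minimal sketch of one way to rename the samples directly on the header line, not necessarily the code used in this thread. It assumes an uncompressed, tab-delimited VCF in which the sample names start at column 10 of the #CHROM line; rename_header_samples and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Rewrite only the #CHROM line: columns 1-9 are the fixed VCF fields,
# columns 10 and up are sample names. Every other line passes through
# unchanged, so colons in the genotype columns are not touched.
rename_header_samples() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#CHROM/ { for (i = 10; i <= NF; i++) sub(/[_:].*/, "", $i) }
         { print }'
}

# Usage (hypothetical file names):
#   rename_header_samples < input.vcf > renamed.vcf
```

Restricting the substitution to the #CHROM line is the reason for using awk here rather than a blanket sed over the whole file, since FORMAT and genotype fields in the data lines also contain colons.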
It works, cpad0112! Thank you for your precious help.
Thanks. Next time, please post example input and expected output. This helps the forum answer your queries faster.