Change sample names in vcf
7.2 years ago
leeandroid ▴ 130

Hi everyone.

I want to simplify the sample names in a VCF file by removing the suffix that was added during sequencing, but I have not been successful. Right now my samples are named like this:

123_4:A7TT6AKXX:1:1000000001 or ABC100:A7TT6AKXX:1:1000000001

In the first case I want to keep only the number before the underscore and in the second everything before the first colon (:). How can I do it?

Here's how the header is organized in my vcf file:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518     206E:C6RL4ANXX:2:250529583      AN07:C6RL4ANXX:2:250529585     B103:C6RL4ANXX:2:250529647      CA:C6RL4A

Thanks in advance.

vcf sample name • 38k views

It seems that bcftools reheader --samples <your file> -o <output> <your input file> would be helpful.


I've moved this to a comment as this has been added as an answer already.


but i am not being successful.

What have you tried? People will be more willing to help if you show the effort you put into solving your question. We would rather point out mistakes than give a full answer (which, in the long run, doesn't teach you much).


Could you please post an example VCF line containing your sample names?


Thank you all for replying!

It seems that I failed to explain myself in the original post. The goal is to change the names of my samples like this:

123_4:A7TT6AKXX:1:1000000001 > 123

ABC100:A7TT6AKXX:1:1000000001 > ABC100

Eventually I was able to change them using bcftools reheader* - like Pierre suggested - but I had to extract the sample names from the original VCF and then change them manually.

Is there no other way to change them directly in the VCF, maybe with grep or sed commands?

* bcftools reheader -s new_names.txt old.vcf > new.vcf
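
The manual step can probably be scripted. A minimal sketch, assuming an uncompressed, tab-separated VCF and that every sample name should be cut at the first "_" or ":". The file names demo.vcf and new_names.txt and the two-sample content are made up for illustration:

```shell
# Hypothetical demo file standing in for old.vcf (tab-separated, two samples).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t123_4:A7TT6AKXX:1:1000000001\tABC100:A7TT6AKXX:1:1000000001\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> demo.vcf

# Take the #CHROM line, keep columns 10 onward (the samples), put one name
# per line, then cut each name at the first "_" or ":".
grep -m1 '^#CHROM' demo.vcf | cut -f10- | tr '\t' '\n' | sed 's/[_:].*//' > new_names.txt

cat new_names.txt

# The resulting file can then be passed to reheader as before:
# bcftools reheader -s new_names.txt demo.vcf > new.vcf
```

With bcftools installed, `bcftools query -l old.vcf` should print the same sample list and also works on compressed files, so it could replace the grep/cut/tr part.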

Thank you.


123_4:A7TT6AKXX:1:1000000001 > 123
ABC100:A7TT6AKXX:1:1000000001 > ABC100

$ echo 123_4:A7TT6AKXX:1:1000000001 |awk 'gsub ("_.*","")'
123
$ echo ABC100:A7TT6AKXX:1:1000000001 |awk 'sub (":.*","")'
ABC100

I asked you to post your VCF header (with samples). With the information provided in the OP:

input (two columns with case1 and case2):

$ cat test.txt 
123_4:A7TT6AKXX:1:1000000001    ABC100:A7TT6AKXX:1:1000000001

output:

$ awk '{gsub("_.*\t","\t"); gsub(":.*","\t"); print}' test.txt 
123 ABC100

However, for the code to work on the OP's VCF, the VCF header with sample names is necessary (a few example sample names are enough).


I added the header, but I'm not sure it is in the right format.

I'll try your solution and post if it worked. Thank you!


Input:

CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  102_3:C6RL4ANXX:1:250529518 206E:C6RL4ANXX:2:250529583  AN07:C6RL4ANXX:2:250529585  B103:C6RL4ANXX:2:250529647  CA:C6RL4A

code:

$ awk -F'[:_]' 'NR==1{$1=$1}1' OFS=_ test.txt  | awk '{for(N=1; N<=NF; N++) sub(/_.*/, "", $N)}1'

output:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 102 206E AN07 B103 CA
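
The pipeline above was run on a file holding only the header line, and it turns tabs into spaces. For a complete VCF, a variant that touches only the #CHROM line and keeps tabs might look like the sketch below. It assumes an uncompressed, tab-separated VCF whose samples start at column 10; header_demo.vcf and renamed.vcf are made-up names:

```shell
# Hypothetical demo file (tab-separated); replace with your own uncompressed VCF.
printf '##fileformat=VCFv4.2\n' > header_demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t102_3:C6RL4ANXX:1:250529518\t206E:C6RL4ANXX:2:250529583\n' >> header_demo.vcf
printf '1\t100\trs1_a\tA\tG\t50\tPASS\tDP=10\tGT:DP\t0/1:5\t1/1:5\n' >> header_demo.vcf

# On the #CHROM line only, cut every sample column (10 onward) at the first
# "_" or ":". All other lines are printed unchanged.
awk 'BEGIN { FS = OFS = "\t" }
     /^#CHROM/ { for (i = 10; i <= NF; i++) sub(/[_:].*/, "", $i) }
     { print }' header_demo.vcf > renamed.vcf

grep '^#CHROM' renamed.vcf
```

Restricting the substitution to the #CHROM line matters because data lines legitimately contain ":" (for example in FORMAT and genotype fields). Note that trimming can produce duplicate sample names, which is worth checking before using the output.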

It works, cpad0112! Thank you for your precious help.


Thanks. Next time, please post example input and expected output. This helps the forum answer your queries faster.

7.2 years ago

bcftools reheader https://samtools.github.io/bcftools/bcftools.html

-s, --samples FILE

new sample names, one name per line, in the same order as they appear in the VCF file. Alternatively, only samples which need to be renamed can be listed as "old_name new_name\n" pairs separated by whitespaces, each on a separate line. If a sample name contains spaces, the spaces can be escaped using the backslash character, for example "Not\ a\ good\ sample\ name".
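
As an illustration of the "old_name new_name" pair format described above, here is a minimal sketch that builds such a file for the names in the question. The file names old_names.txt, rename_map.txt and duplicates.txt are made up, and the rule "cut at the first _ or :" is an assumption taken from the question:

```shell
# Hypothetical sample list, as `bcftools query -l old.vcf` would print it.
printf '%s\n' '123_4:A7TT6AKXX:1:1000000001' 'ABC100:A7TT6AKXX:1:1000000001' > old_names.txt

# Write "old_name new_name" pairs separated by a space; the new name is the
# old one cut at the first "_" or ":".
awk '{ new = $0; sub(/[_:].*/, "", new); print $0, new }' old_names.txt > rename_map.txt

# Trimming can make two samples collide, so list any new name that occurs
# more than once (an empty file means no collisions).
cut -d' ' -f2 rename_map.txt | sort | uniq -d > duplicates.txt

cat rename_map.txt

# If duplicates.txt is empty, the map can be used with reheader:
# bcftools reheader -s rename_map.txt -o new.vcf old.vcf
```

The pair format has the advantage of not depending on the order of samples in the VCF.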
