VCF file sorting
3.3 years ago
Lukas • 0

I got a VCF file from my instructor. It is VEP-annotated, with over 50 options separated by ||. I noticed that the VCF is not arranged into the appropriate columns, so I decided to sort it.

I used this code to sort my VCF file according to position:

$ grep "^#" input.vcf > output.vcf
$ grep -v "^#" input.vcf | sort -k1,1V -k2,2g >> output.vcf

However, after running it I expected the data in output.vcf to be arranged into columns. Instead, the data for each variant is still shifted.

Am I doing something wrong? Is there a different way to arrange a VCF into columns?

Vcf • 2.7k views
Guys, I am so sorry if this question is inappropriate, but I only started using Linux half a year ago.

3.3 years ago
Ram 44k

Rather than reinvent the wheel, use existing tools - bcftools sort, for example. Also, see this thread: https://bioinformatics.stackexchange.com/questions/6826/sort-vcf-by-contig-and-position-within-contig

Maybe -k2,2n would work better than -k2,2g in your solution.
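
A minimal sketch of the -k2,2n variant on a small made-up VCF (the file names and records are hypothetical). It copies the header first, then sorts the records with a version sort on the contig (-V is a GNU sort extension) and a numeric sort on the position, with the tab set explicitly as the field separator. If bcftools is installed, `bcftools sort input.vcf -o output.vcf` is the existing-tool route.

```shell
# Toy VCF with records deliberately out of order (hypothetical data).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n' > "$tmpdir/input.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmpdir/input.vcf"
printf 'chr10\t500\t.\tA\tG\t.\tPASS\t.\n' >> "$tmpdir/input.vcf"
printf 'chr2\t1000\t.\tC\tT\t.\tPASS\t.\n' >> "$tmpdir/input.vcf"
printf 'chr2\t90\t.\tG\tA\t.\tPASS\t.\n' >> "$tmpdir/input.vcf"

# Header lines go first, unchanged.
grep '^#' "$tmpdir/input.vcf" > "$tmpdir/output.vcf"
# Records: version sort on contig (field 1), numeric sort on position (field 2).
grep -v '^#' "$tmpdir/input.vcf" \
  | sort -t "$(printf '\t')" -k1,1V -k2,2n >> "$tmpdir/output.vcf"

# Print contig:position of the sorted records on one line for a quick check.
sorted_keys=$(grep -v '^#' "$tmpdir/output.vcf" | cut -f1,2 | tr '\t' ':' | paste -sd ' ' -)
echo "$sorted_keys"
```

Note that sorting only changes the order of the lines; it does not change how the fields inside each line are laid out.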

I really don't know why, but my VCF only looks sorted when I open it with nano. If I use less, more, or bcftools view, it is still shifted.

Can you show us the exact commands you're using? Are you sure that the file sort order is weird in the less/more case and it's not happening because of display issues where the tabs don't always line up?
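
One way to tell a display issue from a genuinely malformed file is to count the tab-separated fields on each line: if every record has the same count as the #CHROM line, the columns are intact and only the on-screen alignment is off. A rough sketch with made-up records (the file name and data are hypothetical):

```shell
tmpdir=$(mktemp -d)
vcf="$tmpdir/example.vcf"
# Two records whose fields have very different widths, so they will not
# line up visually even though both have the same number of columns.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr1\t12345\trs1\tA\tG\t50\tPASS\tCSQ=G|missense_variant||GENE1\n' >> "$vcf"
printf 'chr1\t7\t.\tACGTACGTACGT\tA\t.\tPASS\t.\n' >> "$vcf"

# Count tab-separated fields on every non-## line and list the distinct counts.
field_counts=$(grep -v '^##' "$vcf" | awk -F'\t' '{ print NF }' | sort -u)
echo "distinct field counts: $field_counts"
```

A single number in the output means every line has the same number of columns. For viewing, `less -S` stops long lines from wrapping, which often makes the file look much less shifted.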

It didn't even cross my mind. I guess that may be the case. However, I thought that sort would order the information to my liking and arrange it into columns when I redirected the output into a new VCF file. My reasoning was that if the VCF were sorted properly, I would even get the samples' GT values in separate columns. But when I query it with bcftools query, the samples' GT values still come out as one chunk of information. I would really appreciate any hints, because I am stuck on my main goal of separating the samples' GT values into columns.

bcftools query is your friend when you want a table of comma/tab delimited values from a VCF file. You may also want to look into adding a column command to your pipe so it's easier to eyeball. See this post: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists where I describe how to use column to make things easier to look at.
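
For the goal of getting each sample's GT into its own column, the kind of bcftools query command meant here is shown as a comment below; it is not run in this example and input.vcf is a placeholder. The awk lines underneath do a rough equivalent on a made-up two-sample VCF, assuming GT is the first FORMAT key (which the VCF specification requires when GT is present).

```shell
# With bcftools installed, something along these lines should work
# (the [...] brackets loop over the samples):
#   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' input.vcf | column -t

tmpdir=$(mktemp -d)
vcf="$tmpdir/samples.vcf"
# Hypothetical two-sample VCF (samples S1 and S2).
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$vcf"
printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t1/1:30\n' >> "$vcf"
printf 'chr2\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t0/0:8\t0/1:15\n' >> "$vcf"

# Plain awk fallback: print CHROM, POS, REF, ALT, then the GT of every sample.
gt_table=$(awk -F'\t' -v OFS='\t' '
  /^#/ { next }                       # skip header lines
  {
    line = $1 OFS $2 OFS $4 OFS $5
    for (i = 10; i <= NF; i++) {      # sample columns start at field 10
      split($i, parts, ":")           # GT assumed to be the first subfield
      line = line OFS parts[1]
    }
    print line
  }' "$vcf")
printf '%s\n' "$gt_table"
```

Each output line is tab-delimited with one GT column per sample, so it can be piped into column -t for easier reading or opened in a spreadsheet.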

@Ram thank you for your helpful comments. I really appreciate them. They really helped me.
