problem with bcftools syntax
13 months ago
Barista

Hi all!

I am having difficulty putting together a bcftools command. I have a .vcf.gz file downloaded from the 1000G site and a csv file with the columns chrom/pos/id/ref/alt. I would like to filter the downloaded VCF so that it keeps only the SNPs listed in my csv file.

To do so, I first created a txt file like this:

awk -F';' 'NR > 1 {print $1, $2}' chr.csv > regions.txt

Then I tried to run bcftools like this:

bcftools view -T regions.txt -O z -o filtered_chr.vcf.gz ALL.chr.vcf.gz

Unfortunately, this gives me the following error:

[E::bcf_sr_regions_init] Could not parse 1-th line of file regions.txt, using the columns 1,2[,-1]
Failed to read the targets: regions.txt

What am I doing wrong? Is my bcftools command wrong, or should my regions.txt file be created differently?

I would really appreciate any help!

13 months ago

The default output field separator (OFS) of awk is a space, but bcftools expects a tab-delimited file. Set awk's OFS to a tab:

awk -F';' 'BEGIN {OFS="\t"} NR > 1 {print $1, $2}' chr.csv > regions.txt

If that still doesn't work, show us the first few lines of the file.
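A minimal sketch of the same conversion plus a quick sanity check, assuming chr.csv is semicolon-separated with a header row and CHROM/POS in the first two columns, as in the question. make_regions and check_regions are hypothetical helper names, not part of bcftools; the carriage-return stripping is an assumption in case the CSV was saved from Excel with Windows line endings.

```shell
# Sketch: turn the semicolon-separated CSV (header on line 1, CHROM in
# column 1, POS in column 2) into a tab-delimited CHROM<TAB>POS targets file.
# Reads the CSV on stdin and writes to stdout.
make_regions() {
    awk -F';' '
        BEGIN  { OFS = "\t" }       # join output fields with a tab
        { sub(/\r$/, "") }          # drop a trailing CR (assumed Windows/Excel export)
        NR > 1 { print $1, $2 }     # skip the header line
    '
}

# Print every line of a regions file that does not have exactly two
# tab-separated fields; no output means the file looks well-formed.
check_regions() {
    awk -F'\t' 'NF != 2 { printf "line %d: %s\n", NR, $0 }' "$1"
}
```

Usage, with the file names from the question:

make_regions < chr.csv > regions.txt
check_regions regions.txt
bcftools view -T regions.txt -O z -o filtered_chr.vcf.gz ALL.chr.vcf.gz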


Pierre Lindenbaum: Thank you so much, I think it worked! The new file has been created :)


Pierre Lindenbaum: I wanted to confirm that the created file is correct, so I unzipped it and opened it in Excel. It looks a bit odd to me: right after the header section of the VCF there is the line with the column names and nothing below it. Moreover, that line has the usual VCF column names (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT) followed by many more cells with HG.../NA... (see the attached screenshot). What am I doing wrong?


It looks like there was no overlap between the selected regions and the VCF. The most common reason is a difference in chromosome names, e.g., "chr1" vs. "1", so check whether the chromosome names are compatible between the two files. Also, opening a VCF in Excel is not recommended; use zcat file.vcf.gz | less, bcftools view file.vcf.gz | less, or simply less on the .vcf.gz file.
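One way to do that comparison from the command line, as a sketch: regions_chroms, vcf_chroms and shared_chroms are hypothetical helper names, and vcf_chroms assumes bcftools is on the PATH. It reads every record, so on a very large file it can be slow; if the header lists contigs, bcftools view -h ALL.chr.vcf.gz | grep '^##contig' is a quicker first look.

```shell
# Distinct chromosome names in column 1 of a tab-delimited regions file.
regions_chroms() {
    cut -f1 "$1" | sort -u
}

# Distinct chromosome names used by the records of a VCF/BCF (needs bcftools).
vcf_chroms() {
    bcftools query -f '%CHROM\n' "$1" | sort -u
}

# Names present in both files; empty output means no overlap, which would
# explain a filtered VCF that has a header but no records.
shared_chroms() {
    _tmp=$(mktemp)
    regions_chroms "$1" > "$_tmp"
    vcf_chroms "$2" | grep -Fx -f "$_tmp"
    rm -f "$_tmp"
}
```

Usage, followed by a look at the first records of the output without the long header:

shared_chroms regions.txt ALL.chr.vcf.gz
bcftools view -H filtered_chr.vcf.gz | less -S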


Michael: My txt file uses chromosome names like "chr1". I searched online and found that the 1000G vcf.gz files name chromosomes without the prefix, i.e., just "1" in this case. Thanks for pointing this out! I will try to correct it now :)
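A minimal sketch of that correction, assuming the regions file is tab-delimited with the chromosome in column 1 and that the VCF uses names without the "chr" prefix, as described above. strip_chr is a hypothetical helper name.

```shell
# Sketch: remove a leading "chr" from the chromosome column (column 1) of a
# tab-delimited regions file, so "chr1" becomes "1". Lines without the
# prefix pass through unchanged. Reads stdin, writes stdout.
strip_chr() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/^chr/, "", $1); print }'
}
```

Usage:

strip_chr < regions.txt > regions_nochr.txt
bcftools view -T regions_nochr.txt -O z -o filtered_chr.vcf.gz ALL.chr.vcf.gz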

EDIT: I can now see records in the data section. Thanks again, everyone, for the help! One more small question: the ID and QUAL columns contain dots. What does that mean?
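In the VCF format a single "." is the placeholder for a missing value: a "." in ID means the record has no identifier (for example, no rsID), and a "." in QUAL means no quality score was provided. To see how often that happens in the filtered file, here is a small sketch; count_missing_id_qual is a hypothetical helper that expects the two tab-separated columns printed by bcftools query.

```shell
# Count records whose ID and/or QUAL hold the VCF missing value ".".
# Expects two tab-separated columns (ID, QUAL) on stdin.
count_missing_id_qual() {
    awk -F'\t' '
        $1 == "." { id++ }      # no identifier for this record
        $2 == "." { qual++ }    # no quality score for this record
        END { printf "records=%d missing_ID=%d missing_QUAL=%d\n", NR, id, qual }
    '
}
```

Usage:

bcftools query -f '%ID\t%QUAL\n' filtered_chr.vcf.gz | count_missing_id_qual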
