Hi all,
I need to use tabix for a project, and this is the syntax I am using, e.g.:
tabix -fh ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz 12:2345295-2345295 > genotypes.vcf
My question: if I have a lot of regions on the same chromosome, is there a way to supply a file with the regions as input, and what would the syntax be? In other words, can I just replace the chromosome number and coordinates with the name of a text file that has one region per line?
Thanks
Ashwin
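For what it's worth, here is a minimal sketch of the regions-file idea, assuming a recent htslib-based tabix that supports the -R option (older builds may not have it). With -R the regions file is tab-delimited (CHROM, POS and optionally an end position, 1-based and inclusive) rather than in chr:start-end form, so the helper below converts one to the other. The names regions_to_tsv, fetch_regions, regions.txt and regions.tsv are placeholders made up for this example, not part of tabix.

```shell
#!/bin/sh
# Convert lines like "12:2345295-2345295" into tab-delimited CHROM, POS, POS_TO.
# Assumes simple contig names with no ':' or '-' in them.
regions_to_tsv() {
    awk -F'[-:]' 'NF == 3 { print $1 "\t" $2 "\t" $3 }' "$1"
}

# Hypothetical wrapper: fetch every listed region in a single tabix call.
# $1 = regions file (one chr:start-end per line)
# $2 = bgzipped, tabix-indexed VCF (local path or URL)
fetch_regions() {
    regions_to_tsv "$1" > regions.tsv
    tabix -h -R regions.tsv "$2"
}

# Example call (not run here):
#   fetch_regions regions.txt ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz > genotypes.vcf
```

Because everything goes through one tabix call, the header should only be printed once. I have not tried this against the FTP URL above, so treat it as a starting point.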
I just realized another issue: the VCF files I have are annotated with rs#, whereas the above approach searches by chromosomal coordinates. I wonder if there is a way to search by variant name using the above approach.
You can convert the rs# to coordinates using dbSNP.
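If the rs#-annotated VCF files are on the same genome build as the remote file, the coordinates can also be read straight out of them instead of being looked up one by one. A rough sketch, assuming a plain-text (uncompressed) VCF with exactly one rs ID in the third column; rs_to_regions, ids.txt and annotated.vcf are made-up names for illustration:

```shell
#!/bin/sh
# Print a chr:pos-pos region for every VCF record whose ID is in the list.
# $1 = file with one rs ID per line
# $2 = uncompressed VCF (CHROM, POS, ID in columns 1-3)
rs_to_regions() {
    awk -F'\t' '
        NR == FNR { want[$1]; next }                # first file: remember the wanted IDs
        /^#/      { next }                          # skip VCF header lines
        ($3 in want) { print $1 ":" $2 "-" $2 }     # emit a single-base region
    ' "$1" "$2"
}

# Example call (not run here):
#   rs_to_regions ids.txt annotated.vcf > Chr12.txt
```

The output has one region per line, in the same chr:start-end form as the tabix command above. Records with several IDs joined by ';' in the ID column are not matched by this sketch.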
Hi, I just tried to run this command in a terminal on Ubuntu 12 and I get this message:
[kftpconnectfile] 350 Restarting at 0. Send STORE or RETRIEVE to initiate transfer [main] fail to open the data file.
The command I typed was:
xargs -a Chr12.txt -I {} tabix -fh ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/ALL.chr12.phase1.projectConsensus.genotypes.vcf.gz {} >> genotypes.vcf
Any ideas?
Yes, it does. Remove those damn < > brackets (I have removed them from my answer).
Hi jc, all of your input has certainly helped, but I have another question. When I run the above command I get a header line for each variant that was found. Is it possible to have just one header line across the entire VCF file?
You can get rid of the header by dropping the -h flag, but that removes all the headers. To obtain just one header, first generate the output file with a null base: