Select a subset of variants from a larger VCF file
6.6 years ago
paraskevopou ▴ 20

Hi all! I have a large VCF file and I want to create a subset of it based on the #CHROM field, using a text file that lists the #CHROM IDs of interest. The output should keep the header lines and stay in VCF format. Any ideas on how to do that? Thanks a lot! :)

RNA-Seq snp • 7.0k views

Thanks a lot for the comment. My VCF file contains SNPs called from transcriptomes, so the #CHROM field holds around 26,000 different "genes". From these I want to extract around 5,000 by #CHROM name, which is why I asked whether this can be done by providing a text file that lists the desired #CHROM names. The names in my #CHROM field look like the example below. Note that the numbers in these names are not constant or sequential; they are effectively random.

TRINITY_DN6643_c0_g2

You should be able to use a regions file with bcftools (see gringer's answer in the last link above).


Thanks a lot. The bcftools filter command with the -R <file.txt> option worked perfectly.
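For anyone who cannot use bcftools, the same selection can be sketched in plain Python. This is not the bcftools approach that solved the thread, just a minimal illustration of the task as described: keep every header line and keep only records whose #CHROM value appears in the ID list. The function names and the second contig name (TRINITY_DN1_c0_g1) are made up for the example, and the input is assumed to be an uncompressed, tab-delimited VCF.

```python
import io


def read_id_list(handle):
    """Read one #CHROM ID per line, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}


def subset_vcf_by_chrom(vcf_lines, wanted_chroms):
    """Yield all header lines plus the records whose #CHROM is wanted."""
    wanted = set(wanted_chroms)
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information (##...) and the #CHROM column header are kept as-is.
            yield line
            continue
        # The first tab-separated column of a record is #CHROM.
        chrom = line.split("\t", 1)[0]
        if chrom in wanted:
            yield line


# Tiny in-memory demo; with real files, pass open file handles instead.
ids = read_id_list(io.StringIO("TRINITY_DN6643_c0_g2\n"))
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "TRINITY_DN6643_c0_g2\t101\t.\tA\tG\t50\tPASS\t.\n"
    "TRINITY_DN1_c0_g1\t7\t.\tC\tT\t50\tPASS\t.\n"  # hypothetical contig, not in the list
)
subset = list(subset_vcf_by_chrom(vcf, ids))
print("".join(subset), end="")
```

Note that this sketch leaves any ##contig header lines untouched, so the header may still list contigs that no longer have records. That is usually harmless, but it is something to be aware of if a downstream tool is strict about it.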
