How to extract all the IDs from a VCF file for GWAS analyses?
Hi,
How can I extract the genotyped IDs from the VCF file?
I need to select a subset of samples for GWAS analyses.
Thanks!
Take a look:
bcftools view -h Merge.UnFiltered.TumorOnly.vcf.gz | tail -1 ;
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1174-N 1174-T 1292_CD19-T 1292_noCD19-N 1299-N 1299-T 1347-N 1347-T 1378-N 1378-T 1584-N 1584-T 1696-N 1696-T 1700-N 1700-T 1823-N 1823-T 2111-N 2111-T 2217-N 2217-T 2297_CD19-T 2297_noCD19-N 2327_CD19-T 2327_noCD19-N 2343-N 2343-T 2474-N 2474-T 2514-N 2514-T 2530-N 2530-T 2584-N 2584-T 2596_CD19-T 2596_noCD19-N 2623-N 2623-T 2822-N 2822-T 2896-N 2896-T 2925-N 2925-T 2946_CD19-T 2946_noCD19-N 2983-N 2983-T 2994-N 2994-T 3034_CD19-T 3034_noCD19-N 3053-N 3053-T 3068-N 3068-T 3112-N 3112-T 3146_CD19-T 3146_noCD19-N 3239-N 3239-T 3252-N 3252-T 3396-N 3396-T 3453-N 3453-T 3498-N 3498-T 3505-N 3505-T 3520_CD19-T 3520_noCD19-N 3540-N 3540-T 3583-N 3583-T 3592-N 3592-T 3597-N 3597-T 3764-N 3764-T 3819-N 3819-T 3833-N 3833-T 3926-N 3926-T 3929-N 3929-T 3953-N 3953-T 4000_CD19-T 4000_noCD19-N 4054-N 4054-T 4074-N 4074-T 4090_CD19-T 4090_noCD19-N 4119-N 4119-T 4152-N 4152-T
...or:
bcftools view -h Merge.UnFiltered.TumorOnly.vcf.gz | tail -1 |\
awk -F "\t" '{for (i=10; i<=NF; i++) print $i}' ;
1174-N
1174-T
1292_CD19-T
1292_noCD19-N
1299-N
1299-T
1347-N
1347-T
1378-N
1378-T
1584-N
1584-T
1696-N
1696-T
1700-N
1700-T
1823-N
1823-T
2111-N
2111-T
...
...or:
bcftools view -h Merge.UnFiltered.TumorOnly.vcf.gz | tail -1 |\
tr '\t' '\n' | sed '1,9d' ;
1174-N
1174-T
1292_CD19-T
1292_noCD19-N
1299-N
1299-T
1347-N
1347-T
1378-N
1378-T
1584-N
1584-T
1696-N
1696-T
1700-N
1700-T
1823-N
1823-T
2111-N
2111-T
...
Kevin
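If only the list of sample IDs is needed, a minimal sketch follows. It builds a small made-up VCF (the sample names S1-N, S1-T and S2-N are placeholders, not taken from the file above) so that it runs without bcftools, reads the sample IDs from the #CHROM line, and keeps the names ending in -T as an example subset. The bcftools equivalents are given as comments, and the output file names in them are hypothetical.

```shell
#!/bin/sh
# Build a tiny, made-up VCF so the example runs anywhere.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1-N\tS1-T\tS2-N\n' >> "$vcf"
printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n' >> "$vcf"

# Sample IDs are fields 10 onward of the #CHROM header line.
samples=$(awk -F '\t' '/^#CHROM/ {for (i = 10; i <= NF; i++) print $i; exit}' "$vcf")
echo "$samples"

# Example subset: keep only the names ending in -T.
keep=$(printf '%s\n' "$samples" | grep -e '-T$')
echo "$keep"

# With bcftools installed, the sample list can be written directly:
#   bcftools query -l Merge.UnFiltered.TumorOnly.vcf.gz > all_samples.txt
# and a file with one sample name per line can later drive the subsetting:
#   bcftools view -S keep.txt Merge.UnFiltered.TumorOnly.vcf.gz -Oz -o subset.vcf.gz

rm -f "$vcf"
```

`bcftools query -l` prints one sample name per line, so it should give the same list as the pipelines above without parsing the header by hand.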
How to extract info of samples of interest from VCF file?
Hi Kevin,
The problem is slightly different. Here I only want the list of IDs, not a VCF file containing the information for a subset of IDs.
Is there a simple way to extract all the IDs? Thanks!
What do you mean by ID? ID is the third column of the VCF.
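To make the distinction concrete: sample IDs sit in the #CHROM header line, while the ID column holds variant identifiers such as rs numbers. Here is a minimal sketch, on a made-up VCF (rs1, rs2 and the sample names are placeholders), that pulls the third column from the data lines. The bcftools line in the comment is the equivalent for a real compressed file.

```shell
#!/bin/sh
# Build a tiny, made-up VCF with two variant records.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1-N\tS1-T\n' >> "$vcf"
printf '1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\n' >> "$vcf"
printf '1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0/1\t1/1\n' >> "$vcf"

# Variant IDs: third tab-separated field of every non-header line.
variant_ids=$(awk -F '\t' '!/^#/ {print $3}' "$vcf")
echo "$variant_ids"

# With bcftools installed, the same column can be printed with:
#   bcftools query -f '%ID\n' Merge.UnFiltered.TumorOnly.vcf.gz

rm -f "$vcf"
```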