Hi everyone,
I am trying to extract a single chromosome from my dataset so that I can run analyses using only this sex chromosome and compare them to the results of similar analyses on the entire genome. However, all of my chromosomes are labeled as "Pseudochromosomes", i.e., the chromosome I want to extract into its own VCF file is "Pseudochromosome_Z". I read that the following command should do the trick for regularly named chromosomes:
grep -w '^#\|^#CHROM\|^chr[Z]' selas.dp1maxmiss50maf05f_subset_.012.vcf > Z.vcf
However, how do I do this for pseudochromosomes? Thanks for bearing with my ignorance.
grep -v will exclude the lines that match the pattern. So try adding -v to your command above.
Thanks for the reply. Unfortunately, that is not working for me. When I add "-v", the new file is the same size as the original vcf. I want a new vcf file that will contain only the Z chromosome. When I try without "-v", the new file is only 10kb, tiny and containing no information besides a chromosome list.
Is your regular expression correct?
I'm not entirely sure. I don't have a vcf file whose chromosomes are labeled with the "chr" prefix, only the vcf file with Pseudochromosomes, so I can't really test whether the original line works. I got the line from here: How to extract specific chromosome from vcf file
I just tried this: grep -v -w '^#\|^#CHROM\|^chr[Z]' selas.dp1maxmiss50maf05f_subset_.012.vcf > Z.vcf
Any idea what I'm doing wrong? I'm guessing it probably has no idea what "Z" is since I believe it's looking for just a chromosome number and not a pseudochromosome. Thanks!
You have to use the actual names that are in your file in the expression. Show the header of your vcf file and a few data lines, and someone can correct the expression.
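To illustrate, here is a rough sketch, assuming the first column of your data lines is literally Pseudochromosome_Z (that name comes from your description; I have not seen your file). The pattern '^chr[Z]' only matches lines that begin with "chrZ", which is why your command kept nothing but the header. The file toy.vcf and its contents, including the extra Pseudochromosome_Z_random contig, are made up for the example; swap in your own file name.

```shell
# Toy VCF standing in for the real file (hypothetical contents, tab-separated).
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> toy.vcf
printf 'Pseudochromosome_1\t100\t.\tA\tG\t.\t.\t.\n' >> toy.vcf
printf 'Pseudochromosome_Z\t200\t.\tC\tT\t.\t.\t.\n' >> toy.vcf
printf 'Pseudochromosome_Z_random\t300\t.\tG\tA\t.\t.\t.\n' >> toy.vcf

# Step 1: list the chromosome names that actually occur in the data lines.
grep -v '^#' toy.vcf | cut -f1 | sort -u

# Step 2: keep every header line (starts with #) plus the records whose
# first column is exactly Pseudochromosome_Z. Requiring whitespace right
# after the name stops longer names with the same prefix from matching.
grep -E '^#|^Pseudochromosome_Z[[:space:]]' toy.vcf > Z.vcf

# Alternative: compare the first tab-separated field exactly with awk.
awk -F'\t' '/^#/ || $1 == "Pseudochromosome_Z"' toy.vcf > Z_awk.vcf
```

Both commands should write the same thing: the header plus only the Z records. If the VCF is bgzip-compressed and indexed, a dedicated tool such as bcftools would probably be the more robust choice, but plain grep or awk is enough for an uncompressed file.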
I've supplied what I think is the correct information below. Would you mind helping me correct my expression? I believe it is something close to the following:
Thanks!