I have a VCF file. When I run this command: grep -v -E '^#' variants.vcf | cut -f 1 | sort | uniq -c
I get 6835 lines, one per distinct contig. But my organism has no more than 31 chromosomes. Is there any way to identify the 31 chromosomes among these 6835 lines? The lines look like:
NC_005044.2
NC_030808.1
NC_030809.1
NC_030810.1
NC_030811.1
NC_030812.1
NC_030813.1
Thank you.
What exactly do you mean? Are you concerned that there are more than 31 entries that look like chromosomes?
Yes. How can I differentiate them? There are two prefixes, "NC" and "NW". What does "NW" mean?
NC_ entries are fully assembled chromosomes. NW_ entries are scaffolds/contigs that are still part of the genome assembly but not placed into a finished chromosome; they would have NNNN (runs of N) where there is missing sequence. You can find a full listing here.