Hello Community.
My problem is the following: I have some BED files whose genomic regions are annotated by chromosome name (chr__ start end ... ...), and I want to use the NCBI GFF3 to extract the annotation, but that file is annotated with accession.version numbers. Bedtools requires both files to use the same sequence naming, so I need to convert the accessions to chromosome names.
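A minimal Python sketch of the renaming step, assuming you already have a dict from accession.version to chr name (the two entries in the example come from the list in this question; building the full dict is a separate problem). `rename_gff_seqids` is a hypothetical helper: it rewrites only column 1 of each GFF3 data row and leaves rows whose accession is not in the dict unchanged. Comment lines, including `##sequence-region` directives, are passed through with their original accessions.

```python
import sys


def rename_gff_seqids(lines, acc_to_chr):
    """Yield GFF3 lines with column 1 (seqid) renamed through acc_to_chr.

    Comment/directive lines ('#...') and blank lines pass through untouched,
    as does any data row whose seqid is not a key of acc_to_chr.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\r\n").split("\t")
        fields[0] = acc_to_chr.get(fields[0], fields[0])
        yield "\t".join(fields) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Illustrative mapping built only from accessions listed in the question.
    mapping = {"NC_000001.11": "chr1", "NC_000023.11": "chrX"}
    with open(sys.argv[1]) as gff:
        sys.stdout.writelines(rename_gff_seqids(gff, mapping))
```

Run as e.g. `python rename_gff.py annotation.gff3 > annotation.chr.gff3` (file names hypothetical). Streaming line by line keeps memory flat even for the full RefSeq GFF3.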
So far I know that the number in the "NC_"-prefixed accession IDs specifies the chromosome (i.e. NC_000001.11: chr1, NC_000002.12: chr2, ..., NC_000023.11: chrX, NC_000024.10: chrY, NC_012920.1: chrM). However, how can I tell which chromosome the accessions prefixed with NW_ or NT_ belong to?
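The NC_ pattern described above can be written as a small Python function. This is a sketch that hard-codes the human numbering from the question (NC_000001 to NC_000022 as chr1 to chr22, NC_000023 as chrX, NC_000024 as chrY, NC_012920 as chrM). The name `nc_to_chr` is illustrative, the version suffix is not checked (so make sure the BED and GFF3 come from the same assembly), and NT_/NW_ accessions simply return `None`.

```python
import re

# NC_0000NN.v, where NN is the chromosome number (human-only assumption).
_NC_CHROM = re.compile(r"^NC_0000(\d{2})\.\d+$")
_SEX_CHROMS = {23: "chrX", 24: "chrY"}


def nc_to_chr(accession):
    """Return a UCSC-style chr name for a human NC_ accession, else None."""
    if accession.startswith("NC_012920."):  # mitochondrial genome
        return "chrM"
    match = _NC_CHROM.match(accession)
    if match is None:
        return None  # NT_/NW_ scaffolds and anything else
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    return _SEX_CHROMS.get(number)
```

For example, `nc_to_chr("NC_000023.11")` returns `"chrX"`, while any NT_ or NW_ accession returns `None`, which is exactly the gap this question is about.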
Some NT_/NW_ sequences are alternative assemblies of an NC_ region, and their content is "the same" as that NC_ (they appear further down the file, below it), but others are not, and they contain genes of interest that I could be losing when using bedtools, e.g. https://www.ncbi.nlm.nih.gov/gene/3806. Some of them have no known location, yet that gene is known to be on chromosome 19, and I cannot deduce this from its accession number.
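To see which genes would be dropped before running bedtools, one option is to list the gene rows that sit on seqids your mapping does not cover. A hypothetical Python sketch: it assumes gene rows have type `gene` in column 3 and a `Name=` attribute in column 9 (as NCBI's GFF3 typically does, but check your file). Other feature types, such as `pseudogene`, are ignored by this sketch.

```python
from collections import defaultdict


def genes_on_unmapped_seqids(gff_lines, mapped_seqids):
    """Return {seqid: [gene names]} for 'gene' rows whose seqid is unmapped.

    Falls back to the ID= attribute, then '?', when Name= is absent.
    """
    lost = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        seqid = fields[0]
        if seqid in mapped_seqids:
            continue
        # Column 9 is a ';'-separated list of key=value pairs.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        lost[seqid].append(attrs.get("Name", attrs.get("ID", "?")))
    return dict(lost)
```

Passing the keys of whatever accession-to-chr dict you use as `mapped_seqids` gives a per-scaffold list of genes that a plain rename-and-intersect would silently miss.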
Is there a way of getting the chromosome from the accession number? Or should I extract the info from another annotation file?
Thanks
Have you tried the potential way(s) of linking chromosomes to accession numbers mentioned in this post: How to get the chromosome numbers from RefSeq accession IDs?
I saw it, but none of the links provided there work, and the awk + sed answer only applies to NC_ accessions (which I already have under control). Thanks anyway.
You may want to give some example data and the expected output.
Well, that is already given in the question: gene 3806 (Entrez ID) is annotated on accession NT_113949, and I want to obtain its chromosome, which is 19. I could look for more examples, but the idea is basically that: from an accession number prefixed with NT_ or NW_, obtain its chromosome if it is known.
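For NT_/NW_ accessions, one place the chromosome assignment is recorded is NCBI's assembly report for the assembly the GFF3 was built on (a tab-separated file whose name ends in `_assembly_report.txt`). The Python sketch below assumes that layout: the last `#` line holds the column names, including `RefSeq-Accn` and `Assigned-Molecule`, sequences with no assigned chromosome have `na` there, and the mitochondrion is labelled `MT`. The function name is hypothetical. Note that this only tells you which chromosome a scaffold belongs to: coordinates on an NT_/NW_ scaffold are relative to that scaffold, so renaming its rows to e.g. chr19 and intersecting them with chr19 BED intervals would produce meaningless overlaps.

```python
import sys


def chrom_from_assembly_report(lines):
    """Map RefSeq accessions to the chromosome they are assigned to.

    Assumes an NCBI *_assembly_report.txt layout: tab-separated rows, with the
    column names on the last '#' comment line before the data. Rows whose
    RefSeq-Accn or Assigned-Molecule is 'na' are skipped.
    """
    header = None
    mapping = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            # Keep overwriting; the final comment line is the column header.
            header = line.lstrip("# ").split("\t")
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        acc = row.get("RefSeq-Accn", "na")
        chrom = row.get("Assigned-Molecule", "na")
        if acc == "na" or chrom == "na":
            continue
        # Assumed 'MT' label for the mitochondrion; UCSC-style names use chrM.
        mapping[acc] = "chrM" if chrom == "MT" else f"chr{chrom}"
    return mapping


if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as report:
        acc_to_chrom = chrom_from_assembly_report(report)
    # Print the assigned chromosome for each accession given on the command line.
    for acc in sys.argv[2:]:
        print(acc, acc_to_chrom.get(acc, "unknown"))
```

The keys include the version suffix (as in the GFF3 seqid column), so look up e.g. `NT_113949.<version>` rather than the bare accession.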
http://gtamazian.blogspot.com/2013/08/converting-chromosome-accession-numbers.html
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/Assembled_chromosomes/