Is there any way to download the knownCanonical set from the NCBI RefSeq track, as is possible for the UCSC track?
4.9 years ago

From the UCSC Genome Browser, is there any way to download the knownCanonical set from the NCBI RefSeq track, as is possible for the UCSC track (see screenshot below)?

ucsc

knownCanonical is not available in the table dropdown when we select NCBI RefSeq:

ncbi2

ucsc canonical refseq table browser hg19 • 3.2k views

knownCanonical is a UCSC term so it won't be available for RefSeq. Specifically, what is it that you are looking for from NCBI RefSeq? Are you only interested in the 'Known RefSeqs' (aka RefSeqs with the NM/NR prefix)?


I am trying to run DepthOfCoverage from GATK3 (an old version that is no longer supported), which requires a RefSeq file. However, that file contains all transcripts and not just the canonical transcripts. I was wondering how I can generate a file restricted to the canonical transcripts.


In that case, RefSeq Select is your best option. Note, RefSeq Select is only available for protein-coding loci; so of the ~54k unique GeneIDs annotated currently, 19k are protein-coding and have a RefSeq Select. Are you interested in getting these data in GFF3 format? If so, you can either filter the latest RefSeq GFF3 or download GFF3 for just the RefSeq Select transcripts from the NCBI Nucleotide portal. Go to NCBI Nucleotide and search for the term RefSeq_Select[Filter]; then use the 'Send To' link at the top right corner to download 'File' in 'GFF3' format. The latter approach returns a GFF3 file that does not include all of the information normally included in the GFF3 files on FTP but that may be sufficient for your needs.
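The first option above (filtering the latest RefSeq GFF3) could be sketched roughly as follows in Python. This is a minimal sketch, not an official NCBI recipe: it assumes that the Select transcripts carry a `tag=RefSeq Select` attribute in column 9 and that exon/CDS rows point to their transcript through `Parent=`. Check a few lines of your own file first, since the exact tag value may differ between releases (for example, it may read `MANE Select` in some GRCh38 files). The function names and the `tags` parameter are made up for illustration.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn 'ID=rna-NM_1;Parent=gene-A;tag=RefSeq Select' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 values may be percent-encoded
    return attrs


def filter_refseq_select(lines, tags=("RefSeq Select",)):
    """Keep header lines, tagged features, and their direct children.

    ASSUMPTION: the tag values in `tags` match what your GFF3 actually
    uses; adjust them after inspecting the file.
    """
    lines = list(lines)
    wanted = set(tags)

    # Pass 1: collect the IDs of features carrying one of the wanted tags.
    selected_ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if wanted & set(attrs.get("tag", "").split(",")) and "ID" in attrs:
            selected_ids.add(attrs["ID"])

    # Pass 2: keep headers, the tagged features, and rows whose Parent is tagged.
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parents = attrs.get("Parent", "").split(",")
        if attrs.get("ID") in selected_ids or any(p in selected_ids for p in parents):
            kept.append(line)
    return kept
```

To use it, read the decompressed GFF3 with `open(path)`, pass the file object to `filter_refseq_select`, and write the returned lines to a new file. Note that gene rows are dropped in this sketch; if your downstream tool needs them, the second pass would have to keep the parents of the selected transcripts as well.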


You can probably use MANE instead. There is also a RefSeq RNA FASTA file for GRCh38 available from UCSC.


MANE is still in progress, and several genes are not yet part of it. Only protein-coding genes are currently in scope for MANE, and not all protein-coding genes are in MANE yet. MANE also picks one representative transcript for every gene, so alternate splice variants that use a different promoter are not included in the MANE set. If splice variants are important for your downstream analyses, MANE may not be the best choice. However, if you are interested in just one representative transcript for each gene, RefSeq Select may be a better choice for you. Only protein-coding genes are in scope for RefSeq Select as well, but at least all of those genes have a RefSeq Select, and MANE is a subset of RefSeq Select.

4.9 years ago
jnavarr5 ▴ 10

Hello,

We are happy to let you know that the RefSeq Select dataset is now available on the development server, https://genome-preview.soe.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chrX&g=refSeqComposite, for both hg38 and hg19.

Please note the message about how data and tools on our genome-test server are under development, have not been reviewed for quality, and are subject to change at any time. Unfortunately, it is not clear when we will do a quality check and release the RefSeq Select track to the public site. If you would like email updates about the UCSC Genome Browser, please subscribe to our Announcements List:

  • Subscribe: Email genome-announce+subscribe@soe.ucsc.edu
  • Unsubscribe: Email genome-announce+unsubscribe@soe.ucsc.edu