Question

Why is the number of CDS smaller than the number of corresponding genes? [M. tb]

0

2.6 years ago

Alewa ▴ 170

This seems weird. Could someone please help me understand why, and how I might resolve it?

[sta55@cbsukim tb_genes_fasta]$ esearch -db nuccore -query 'Mycobacterium tuberculosis H37Rv[Organism] AND NC_000962.3[ACCN]' | efilter -feature gene | efetch -format gene_fasta | grep "^>" | wc -l
4008
[sta55@cbsukim tb_genes_fasta]$ esearch -db nuccore -query 'Mycobacterium tuberculosis H37Rv[Organism] AND NC_000962.3[ACCN]' | efilter -feature gene | efetch -format fasta_cds_aa | grep "^>" | wc -l
3906

Background

I'm extracting the nucleotide sequences of M. tb genes and their corresponding CDS (protein) sequences: https://www.ncbi.nlm.nih.gov/nuccore/NC_000962#locus_448814763

NCBI entrez bash genes

updated 2.6 years ago by Joe 21k • written 2.6 years ago by Alewa ▴ 170

3

At a guess, one explanation is that some features annotated as genes are RNAs etc., which don't code for proteins, so there are more genes than CDSs (i.e. more functionally annotated "things" than just proteins).

2.6 years ago by Joe 21k

0

Joe, thanks for chiming in. But in my case there were fewer CDS than genes. Or maybe I'm not doing the gene filtering right? :(

2.6 years ago by Alewa ▴ 170

1

That's what I said, no? You have fewer annotated CDSs than genes. Remember what "CDS" actually means: coding sequences.

This is usually taken to mean they give rise to a functional protein, but the definition of a gene is broader these days and can include non-coding RNAs.

Hence number of CDS + number of non-CDS functional elements = number of "genes".

Or more simply: gene != CDS.
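If you want to check that accounting rather than take it on trust, one rough way is to tally the feature types in the annotation itself. The sketch below assumes you have a GFF3 export of the record saved locally; the function name `count_feature_types` and the file name in the usage comment are made up for illustration and are not from the thread.

```shell
# Tally GFF3 feature types (column 3), skipping comment/directive lines.
# Reads GFF3 on stdin and prints "type<TAB>count", sorted by type.
# count_feature_types is a hypothetical helper name, not an EDirect command.
count_feature_types() {
  awk -F'\t' '
    # Keep only well-formed 9-column feature rows.
    !/^#/ && NF >= 9 { n[$3]++ }
    END { for (t in n) printf "%s\t%d\n", t, n[t] }
  ' | LC_ALL=C sort
}

# Hypothetical usage; the file name is an assumption:
# count_feature_types < NC_000962.3.gff3
```

If the count for `gene` is larger than the count for `CDS` by roughly the combined count of the RNA-type rows, that would be consistent with the explanation above.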

This is still a guess on my part as it could be due to any number of annotation artefacts etc, but I don't see an obvious problem here - the numbers you've retrieved make intuitive sense.
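To narrow down which kinds of genes lack a CDS, a similar sketch can break the `gene` rows down by biotype. This assumes the GFF3 follows the NCBI convention of a `gene_biotype=` attribute in column 9 (values such as `protein_coding`, `tRNA`, `rRNA`); rows without that attribute are counted as `unknown`. Again, the function and file names are hypothetical.

```shell
# Tally the gene_biotype attribute of "gene" rows in a GFF3 stream.
# Assumes NCBI-style attributes (key=value pairs separated by ";").
# count_gene_biotypes is a hypothetical helper name.
count_gene_biotypes() {
  awk -F'\t' '
    !/^#/ && $3 == "gene" {
      biotype = "unknown"
      n_attr = split($9, attrs, ";")
      for (i = 1; i <= n_attr; i++)
        # "gene_biotype=" is 13 characters, so the value starts at 14.
        if (attrs[i] ~ /^gene_biotype=/) biotype = substr(attrs[i], 14)
      counts[biotype]++
    }
    END { for (b in counts) printf "%s\t%d\n", b, counts[b] }
  ' | LC_ALL=C sort
}

# Hypothetical usage; the file name is an assumption:
# count_gene_biotypes < NC_000962.3.gff3
```

Anything other than `protein_coding` in that output is a gene you would not expect to find in the `fasta_cds_aa` download.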

2.6 years ago by Joe 21k