Question

list of gene or transcript IDs and their length

0

9.0 years ago

zizigolu ★ 4.3k

Hi,

How can I get a list of gene or transcript IDs and their lengths in Saccharomyces cerevisiae, please?

Thank you

gene sequence • 7.3k views

ADD COMMENT • link updated 2.5 years ago by Ram 44k • written 9.0 years ago by zizigolu ★ 4.3k

5

9.0 years ago

trausch ★ 1.9k

In R using Bioconductor this should work (for ENSEMBL genes):

library(GenomicFeatures)  # provides transcriptLengths()
library(TxDb.Scerevisiae.UCSC.sacCer3.ensGene)
txdb <- TxDb.Scerevisiae.UCSC.sacCer3.ensGene
# One row per transcript, including tx_name, gene_id, tx_len and the UTR lengths
txlen <- transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
head(txlen)

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by trausch ★ 1.9k

1

You can also use transcripts(txdb) to get all transcript coordinates.

ADD REPLY • link 9.0 years ago by Giovanni M Dall'Olio 28k

0

Thank you for your answer.

ADD REPLY • link 9.0 years ago by zizigolu ★ 4.3k

1

9.0 years ago

Prakki Rama ★ 2.7k

From ENSEMBL

ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/cds/

From NCBI - RefSeq

Paste "Saccharomyces cerevisiae"[porgn:__txid4932] into the NCBI search box. Then select mRNA and RefSeq from the left-hand menu to get the list of genes.

Below the search bar, click "Send to", choose destination "File", and set the format to "FASTA".

Sequence lengths: you can get the length of each sequence with an AWK one-liner, which prints every header line followed by the length of its sequence:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' file.fa

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by Prakki Rama ★ 2.7k

0

Thank you very much.

I need the transcript or gene IDs and their lengths; I don't need their FASTA sequences.

I need something like this:

genesID geneslength
R0010W  1272
R0020C  1122
R0030W  546
R0040C  891
YAL069W  315

But NCBI RefSeq only has 8 IDs.

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by zizigolu ★ 4.3k

1

Check this screenshot of what I see in the NCBI gene list for Saccharomyces cerevisiae.

Once you have downloaded the FASTA file, run the following to get each header and its length on one line:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' filename.fa | paste - -

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by Prakki Rama ★ 2.7k

0

Thank you for your attention.

I mapped my reads to the cDNA rather than to the genome FASTA with bowtie2, so I need something like a list of all coding sequence IDs and their lengths.

What I got with your kind tip looks like this:

>gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
5410
>gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
333
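
Output like the above can be reduced to the two-column "ID, length" table asked for earlier. The following is only a sketch under stated assumptions: the function name fasta_lengths is made up for illustration, and it assumes old-style NCBI headers (">gi|...|ref|ACCESSION| ...") in which the accession is the 4th "|"-separated field. For any other header style it falls back to the first word after ">".

```shell
# fasta_lengths FILE: print "ID<TAB>length" for every record in a FASTA file.
# Hypothetical helper, not part of any existing tool.
# Usage: fasta_lengths file.fa > lengths.tsv
fasta_lengths() {
    awk '
        function emit() { if (id != "") print id "\t" len }
        /^>/ {
            emit()                                   # finish the previous record
            n = split(substr($1, 2), part, "|")      # header word without ">"
            id = (n >= 4 && part[4] != "") ? part[4] : substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }                        # sequence may span many lines
        END { emit() }
    ' "$1"
}
```

Unlike the one-liner piped through "paste - -", this also prints records whose sequence is empty (length 0), so IDs and lengths cannot get out of step.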

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by zizigolu ★ 4.3k

Ram · Accepted Answer · 2015-12-29

2

9.0 years ago

EagleEye 7.6k

A: Converting gtf format to bed format

If you have a GTF file from GENCODE, the shell script in the answer linked above should work at both the gene level and the transcript level.
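
The script from the linked answer is not reproduced here. As a rough, separate sketch of one way to get transcript lengths straight from a GTF, the function below sums the exon lengths per transcript_id. The function name gtf_transcript_lengths is hypothetical, and it assumes a 9-column, tab-separated GTF whose attribute column carries gene_id "..." and transcript_id "..." on exon rows.

```shell
# gtf_transcript_lengths FILE: print "gene_id<TAB>transcript_id<TAB>length",
# where length is the summed length of the exon rows of that transcript.
# Hypothetical helper; assumes gene_id/transcript_id attributes on exon rows.
gtf_transcript_lengths() {
    awk -F '\t' '
        # Return the quoted value of attribute "key" from a GTF attribute string.
        function attr(s, key,    m) {
            if (match(s, "(^|[ ;])" key " \"[^\"]*\"")) {
                m = substr(s, RSTART, RLENGTH)
                sub(/^[^"]*"/, "", m)   # drop everything up to the opening quote
                sub(/"$/, "", m)        # drop the closing quote
                return m
            }
            return ""
        }
        $3 == "exon" {
            tx = attr($9, "transcript_id")
            if (tx == "") next
            gene[tx] = attr($9, "gene_id")
            len[tx] += $5 - $4 + 1      # GTF coordinates are 1-based, inclusive
        }
        END { for (tx in len) print gene[tx] "\t" tx "\t" len[tx] }
    ' "$1" | LC_ALL=C sort -k1,1 -k2,2
}
```

Summing per gene_id instead would count exons shared between transcripts more than once, so a gene-level length needs an explicit choice, for example the longest transcript per gene.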

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by EagleEye 7.6k

0

Yes, right, the GTF file contains both.

ADD REPLY • link 9.0 years ago by zizigolu ★ 4.3k

Ram · Accepted Answer · 2015-12-29

1

9.0 years ago

zizigolu ★ 4.3k

With featureCounts, using a BAM file and a GTF file, we can get all of the gene IDs and their lengths, but I still need to select the coding genes among them.
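
One possible way to do that selection, as a sketch only: collect the IDs of protein-coding genes from the GTF and keep just those rows of the featureCounts table. The function name coding_gene_lengths is made up for illustration. It assumes an Ensembl-style GTF in which "gene" rows carry a gene_biotype "protein_coding" attribute (GENCODE files use gene_type instead), and the usual featureCounts layout: a "#" comment line, then a header row in which columns 1 and 6 are Geneid and Length.

```shell
# coding_gene_lengths GTF FEATURECOUNTS_TABLE
# Print "Geneid<TAB>Length" for genes annotated as protein_coding in the GTF.
# Hypothetical helper; the gene_biotype attribute name is an assumption.
# Usage: coding_gene_lengths genes.gtf counts.txt > coding_lengths.tsv
coding_gene_lengths() {
    awk -F '\t' '
        # First file (GTF): remember the IDs of protein-coding genes.
        NR == FNR {
            if ($3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ &&
                match($9, /gene_id "[^"]*"/)) {
                id = substr($9, RSTART + 9, RLENGTH - 10)   # text between the quotes
                coding[id] = 1
            }
            next
        }
        # Second file (featureCounts): skip the comment line, keep the header.
        /^#/ { next }
        $1 == "Geneid" || ($1 in coding) { print $1 "\t" $6 }
    ' "$1" "$2"
}
```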

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.0 years ago by zizigolu ★ 4.3k