list of gene or transcript IDs and their length
8.9 years ago
zizigolu ★ 4.3k

Hi,

How can I get a list of gene or transcript IDs and their lengths in Saccharomyces cerevisiae, please?

Thank you

gene sequence • 7.2k views
ADD COMMENT
2
Entering edit mode
8.9 years ago
EagleEye 7.6k

A: Converting gtf format to bed format

If you have a GTF file from GENCODE, the shell script in the answer linked above should work at both the gene level and the transcript level.
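As a rough illustration of the transcript-level case, the sketch below sums exon lengths per transcript directly from a GTF file. It is not the script from the linked answer; the function name `gtf_transcript_lengths` is made up for this example, and it assumes a standard tab-separated GTF whose exon lines carry a `transcript_id "..."` attribute.

```shell
# Hypothetical helper: print "transcript_id<TAB>length" from a GTF file.
# GTF columns: 3 = feature type, 4 = start, 5 = end (1-based, inclusive).
gtf_transcript_lengths() {
  awk -F'\t' '
    $3 == "exon" {
      # pull the transcript_id value out of the attribute column
      if (match($9, /transcript_id "[^"]+"/)) {
        id = substr($9, RSTART + 15, RLENGTH - 16)
        len[id] += $5 - $4 + 1
      }
    }
    END { for (id in len) print id "\t" len[id] }
  ' "$1" | sort
}

# Usage (annotation.gtf is a placeholder file name):
#   gtf_transcript_lengths annotation.gtf > transcript_lengths.tsv
```

Summing exons gives the spliced transcript length; for a gene-level span you would instead use end - start + 1 on the `gene` lines.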

ADD COMMENT
0
Entering edit mode

Yes, right, the GTF file contains both.

ADD REPLY
1
Entering edit mode
8.9 years ago
zizigolu ★ 4.3k

With featureCounts, using a BAM file and a GTF file, we can get all of the gene IDs and their lengths, but I still need to select the coding genes among them.
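One possible way to do that selection is sketched below. It assumes an Ensembl-style GTF in which `gene` lines carry a `gene_biotype "protein_coding"` attribute (other sources may name this attribute differently), and it relies on the featureCounts output layout: a leading `#` comment line, a header row, then `Geneid` in column 1 and `Length` in column 6. The function name `coding_gene_lengths` is made up for this example.

```shell
# Hypothetical helper: print "gene_id<TAB>length" for protein-coding genes only.
#   $1 = GTF annotation (assumed Ensembl-style, with a gene_biotype attribute)
#   $2 = featureCounts output table
coding_gene_lengths() {
  awk -F'\t' '
    # First file (GTF): remember the IDs of protein-coding genes.
    NR == FNR {
      if ($3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ &&
          match($9, /gene_id "[^"]+"/))
        coding[substr($9, RSTART + 9, RLENGTH - 10)] = 1
      next
    }
    # Second file (featureCounts): skip the "#" comment line and the header row.
    /^#/ || $1 == "Geneid" { next }
    # Column 1 is Geneid and column 6 is Length in featureCounts output.
    $1 in coding { print $1 "\t" $6 }
  ' "$1" "$2"
}

# Usage (both file names are placeholders):
#   coding_gene_lengths annotation.gtf counts.txt > coding_gene_lengths.tsv
```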

ADD COMMENT
5
Entering edit mode
8.9 years ago
trausch ★ 1.9k

In R using Bioconductor this should work (for ENSEMBL genes):

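> library(GenomicFeatures)  # provides transcriptLengths()
> library(TxDb.Scerevisiae.UCSC.sacCer3.ensGene)
> txdb = TxDb.Scerevisiae.UCSC.sacCer3.ensGene
> # one row per transcript: tx_name, gene_id, nexon, tx_len and the UTR lengths
> txlen = transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
> head(txlen)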
ADD COMMENT
1
Entering edit mode

You can also use transcripts(txdb) to get all transcript coordinates.

ADD REPLY
0
Entering edit mode

thank you for your answer

ADD REPLY
1
Entering edit mode
8.9 years ago
Prakki Rama ★ 2.7k

From ENSEMBL

ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/cds/

From NCBI - RefSeq

Paste "Saccharomyces cerevisiae"[porgn:__txid4932] into the NCBI search box, then select mRNA and RefSeq from the left-side menu to get the list of genes.

Below the search bar, click Send to, Choose destination: File, Format as: Fasta

Sequence lengths: you can get the length of each sequence with an AWK one-liner:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' file.fa
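
If a two-column table is more convenient than alternating header and length lines, a variant along these lines may help. It is only a sketch: the function name `fasta_lengths` is made up, and it assumes the ID is the first whitespace-delimited word of each header line.

```shell
# Hypothetical helper: print "ID<TAB>length" for each record in a FASTA file.
fasta_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # flush the previous record
      id = substr($1, 2)                # first word of the header, without ">"
      len = 0
      next
    }
    { len += length($0) }               # sequence may span several lines
    END { if (id != "") print id "\t" len }
  ' "$1"
}

# Usage (file.fa is a placeholder file name):
#   fasta_lengths file.fa > id_lengths.tsv
```

Unlike the one-liner, this keeps only the ID rather than the full header and also reports records of length 0.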
ADD COMMENT
0
Entering edit mode

Thank you very much.

I need the transcript or gene IDs and their lengths; I don't need the FASTA sequences themselves.

I need something like this:

genesID geneslength
R0010W  1272
R0020C  1122
R0030W  546
R0040C  891
YAL069W  315

But NCBI RefSeq only has 8 IDs.

ADD REPLY
1
Entering edit mode

Check this screenshot of what I see in NCBI gene list for Saccharomyces cerevisiae -

Once you have downloaded the file, run:

awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' filename.fa | paste - -
ADD REPLY
0
Entering edit mode

Thank you for your attention.

I mapped my reads with bowtie2 to the cDNA FASTA instead of the genome, so I need something like a list of all coding sequence IDs and their lengths.

What I got with your kind tip looks like this:

>gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
5410
>gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
333
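
Since the reads were mapped to a cDNA reference, another option is to read the reference names and lengths from the alignment header: the SAM format records each reference sequence on an `@SQ` line with `SN:` (name) and `LN:` (length) tags. The sketch below parses such a header from standard input; the function name `sq_lengths` and the file name `aln.bam` are placeholders, and it assumes samtools is available to print the header.

```shell
# Hypothetical helper: read a SAM header on stdin and print "name<TAB>length"
# from the SN (sequence name) and LN (sequence length) tags of each @SQ line.
sq_lengths() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len  = substr($i, 4)
      }
      print name "\t" len
    }
  '
}

# Typical use (assumes samtools is installed and aln.bam is your alignment):
#   samtools view -H aln.bam | sq_lengths > id_lengths.tsv
```

The IDs reported this way are whatever names the cDNA FASTA used when the bowtie2 index was built, so they match the reference names in the alignments.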
ADD REPLY
