Hi,
How can I get a list of gene or transcript IDs and their lengths in Saccharomyces cerevisiae, please?
Thank you
A: Converting gtf format to bed format
If you have a GTF file from GENCODE, the shell script mentioned above should work at both the gene level and the transcript level.
With featureCounts, using a BAM file and a GTF file, we can get all of the gene IDs and their lengths, but I still need to select the coding genes among them.
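One possible way to narrow the featureCounts table down to coding genes, as a rough sketch only. It assumes an Ensembl-style GTF in which gene lines carry a gene_biotype "protein_coding" attribute (other annotation sources may name this differently), and the usual featureCounts output layout with a leading comment line followed by a header containing Geneid and Length columns. The function names are made up for illustration.

```python
import csv
import re


def coding_gene_ids(gtf_lines):
    """Collect gene_id values of GTF 'gene' features tagged protein_coding.

    Assumption: Ensembl-style attributes, e.g.
    gene_id "YAL069W"; gene_biotype "protein_coding";
    """
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Parse the key "value" pairs in the attribute column
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_biotype") == "protein_coding" and "gene_id" in attrs:
            ids.add(attrs["gene_id"])
    return ids


def gene_lengths(featurecounts_lines, keep_ids):
    """Map Geneid -> Length for the genes in keep_ids.

    Assumption: tab-separated featureCounts output whose comment lines
    start with '#', followed by a header row with Geneid and Length.
    """
    rows = (l for l in featurecounts_lines if not l.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return {r["Geneid"]: int(r["Length"]) for r in reader if r["Geneid"] in keep_ids}
```

To use it, open the GTF and the featureCounts output as text files and pass the file objects in, e.g. gene_lengths(open("counts.txt"), coding_gene_ids(open("genes.gtf"))), where both file names are placeholders.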
In R using Bioconductor this should work (for ENSEMBL genes):
> library(GenomicFeatures)  # transcriptLengths() is defined in GenomicFeatures
> library(TxDb.Scerevisiae.UCSC.sacCer3.ensGene)
> txdb = TxDb.Scerevisiae.UCSC.sacCer3.ensGene
> txlen = transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
> head(txlen)
From ENSEMBL
ftp://ftp.ensembl.org/pub/release-83/fasta/saccharomyces_cerevisiae/cds/
From NCBI - RefSeq
Paste "Saccharomyces cerevisiae"[porgn:__txid4932] into the NCBI search bar. In the left-side menu, select mRNA and RefSeq to get the list of genes.
Below the search bar, click Send to, choose destination: File, and format: FASTA.
Sequence lengths: you can get the length of each sequence using an AWK one-liner:
awk '/^>/ {if (sqlen){print sqlen}; print ;sqlen=0;next; } { sqlen = sqlen +length($0)}END{print sqlen}' file.fa
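For anyone who prefers Python, roughly the same calculation can be sketched as below. This is only an illustration of the same idea (sum the lengths of the sequence lines under each header), not a replacement for a proper FASTA parser; the function name fasta_lengths is made up here.

```python
def fasta_lengths(lines):
    """Yield (header, length) for each record in FASTA-formatted lines.

    The header is returned without the leading '>'; the length is the
    total number of characters on the sequence lines of that record.
    """
    header, length = None, 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, length  # emit the previous record
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length  # emit the last record
```

Called as fasta_lengths(open("file.fa")), it gives one (header, length) pair per sequence, including zero-length records, which the AWK version skips.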
Thank you for your attention.
I mapped my reads to the cDNA instead of the genome FASTA with bowtie2, so I need something like a list of all coding sequence IDs and their lengths.
What I got with your kind tip looks like this:
>gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
5410
>gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA
333
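If a plain ID/length table is what is needed, output like the above could be reshaped with a few lines of Python. This is a sketch that assumes the headers follow the gi|...|ref|ACCESSION| pattern shown here and that each header line is followed by one line holding the length; the function name id_length_table is hypothetical.

```python
def id_length_table(lines):
    """Turn alternating header/length lines into (accession, length) rows."""
    rows = []
    accession = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split("|")
            # Expected layout: gi|<gi number>|ref|<accession>| description
            # Otherwise fall back to the first word of the header.
            accession = parts[3] if len(parts) > 3 else line[1:].split()[0]
        elif line and accession is not None:
            rows.append((accession, int(line)))
            accession = None
    return rows


example = [
    ">gi|891176844|ref|NM_001305015.2| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA",
    "5410",
    ">gi|891176612|ref|NM_001310667.1| Saccharomyces cerevisiae S288c hypothetical protein partial mRNA",
    "333",
]
for acc, length in id_length_table(example):
    print(acc, length, sep="\t")
```

The accessions printed this way should match the reference names bowtie2 reports only if the index was built from the same FASTA and the names are trimmed the same way, so that is worth checking.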
Yes, right, the GTF file contains both.