5.7 years ago
umeshtanwar2
Hi all
I am working on RNA-seq data from Arabidopsis thaliana. I have done the differential gene expression analysis using DESeq2. Now I have a list of DE genes (gene IDs) in a CSV file. How can I add the gene names to this file? My file looks like this:
baseMean log2FoldChange lfcSE stat pvalue padj
gene:AT5G65080 122.875394083372 1.69474723920958 0.145231051625416 11.7004967984981 1.26709498309877E-31 5.85423224091292E-28
gene:AT1G64380 168.896650212747 1.30834505764162 0.132353055400632 9.78909978664265 1.25407410840269E-22 2.41419716485088E-19
gene:AT5G65070 82.6936482493549 1.26016444752325 0.14553940276928 8.66897373013599 4.36032539676582E-18 3.59742417823883E-15
Any guidance from you will be very helpful
Thank you
I presume you mean that you want to convert your current IDs (for example, AT5G65080) to gene symbols? From where did you get the data in the first place? If a GTF was used in the original count abundance step (prior to DESeq2), then the corresponding gene symbols may be in that file.
If you literally just want to read the file back into R, then use read.csv().
Thank you Kevin. I used STAR to align the reads to the reference, with annotations in GFF3 format. Then I used featureCounts to count reads prior to DESeq2. I converted the GFF3 annotation file to GTF for use in featureCounts. Did that make a difference?
From where did you obtain that gff3 file? If you look inside the file, you may see a field for gene symbol.
I obtained the gff3 file from:
Arabidopsis release 42
Please advise: is it correct to convert the GFF3 to GTF for featureCounts?
In the GFF3 file, the gene symbol is given with the name tag, where available.
Yes, why not? It is okay to convert GFF3 to GTF. If you want gene symbols, specify GTF.attrType='name' when using featureCounts in R, or -g name when using featureCounts in a Linux / cluster environment.
Thank you so much @Kevin. I will do this when using featureCounts.
Okay, but not all genes appear to have a gene name. I am not too familiar with A. thaliana annotations. Best of luck.
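Since some genes have no name, any lookup built from the GFF3 needs a fallback. Here is a minimal Python sketch of that idea, assuming gene records carry an ID=gene:... attribute and an optional Name=... attribute in the 9th column. The GFF3 lines below are made up for illustration (the Name value is a placeholder, not a real annotation), and the function names are hypothetical. Real files may also use other feature types for genes (for example non-coding genes), which this sketch ignores.

```python
import io

# Illustrative GFF3 lines only: the source and Name values are placeholders,
# not taken from the actual TAIR10 release file.
GFF3_TEXT = (
    "##gff-version 3\n"
    "5\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene:AT5G65080;Name=PLACEHOLDER1;biotype=protein_coding\n"
    "1\tsrc\tgene\t100\t900\t.\t-\t.\tID=gene:AT1G64380;biotype=protein_coding\n"
    "1\tsrc\tmRNA\t100\t900\t.\t-\t.\tID=transcript:AT1G64380.1;Parent=gene:AT1G64380\n"
)


def parse_attributes(column9):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def gene_names(handle):
    """Map each gene ID to its Name attribute, or to None if it has none."""
    names = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only look at 'gene' features
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            names[attrs["ID"]] = attrs.get("Name")
    return names


def label(gene_id, names):
    """Return the symbol when known, else the ID without its 'gene:' prefix."""
    return names.get(gene_id) or gene_id.split(":", 1)[-1]


names = gene_names(io.StringIO(GFF3_TEXT))
for gid in ["gene:AT5G65080", "gene:AT1G64380"]:
    print(gid, label(gid, names))
```

To use it on a real annotation, pass an open file handle instead of the io.StringIO object, then apply label() to the row names of the DESeq2 results.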
I am facing this problem:
//================================= Running ==================================\
|| ||
|| Load annotation file oldArabidopsis_thaliana.TAIR10.42.gtf ... ||
Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_name'
The attributes included in your GTF annotation are 'transcript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";'
Can you post a few lines from your GTF file? It sounds like there is no attribute called "gene_name" in the file, so you might need a different attribute.
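One quick way to see which attributes are actually available is to list the keys in the 9th column. Below is a minimal Python sketch that does this for the attribute string quoted in the warning above. The function name is hypothetical, and the regular expression assumes the usual GTF layout of key "value"; pairs.

```python
import re

# Attribute string copied from the featureCounts warning in this thread.
ATTRS = 'transcript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";'


def gtf_attribute_keys(column9):
    """Return the attribute keys of a GTF 9th column, in order of appearance."""
    return re.findall(r'(\w+)\s+"[^"]*"\s*;', column9)


keys = gtf_attribute_keys(ATTRS)
print(keys)                 # ['transcript_id', 'gene_id']
print("gene_name" in keys)  # False
```

For this record only transcript_id and gene_id are present, which is consistent with the warning: an attribute that is absent from the file cannot be used as the gene identifier, whereas gene_id can.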
My GTF file looks like this.
Which extra information do you want, exactly? Your gene IDs are already the official symbols for A. thaliana. Please help us by being as specific as you can be.
Please follow up on this thread rather than opening new ones. If you have trouble, explain what the problem is and put in some effort to show that you have indeed tried to work on the issue.