Differential gene expression DESeq2
5.7 years ago
umeshtanwar2 ▴ 30

Hi all

I am working on RNA-seq data from Arabidopsis thaliana. I have run a differential gene expression analysis with DESeq2 and now have a list of DE genes (gene IDs) in a CSV file. How can I add the gene names to this file? My file looks like this:

                 baseMean          log2FoldChange    lfcSE              stat              pvalue                 padj
gene:AT5G65080   122.875394083372  1.69474723920958  0.145231051625416  11.7004967984981  1.26709498309877E-31   5.85423224091292E-28
gene:AT1G64380   168.896650212747  1.30834505764162  0.132353055400632  9.78909978664265  1.25407410840269E-22   2.41419716485088E-19
gene:AT5G65070   82.6936482493549  1.26016444752325  0.14553940276928   8.66897373013599  4.36032539676582E-18   3.59742417823883E-15

Any guidance would be very helpful.

Thank you

RNA-Seq • 2.3k views

I presume you mean that you want to convert your current IDs (for example, AT5G65080) to gene symbols? From where did you get the data in the first place? If a GTF was used in the original count abundance step (prior to DESeq2), then the corresponding gene symbols may be in that file.

If you literally just want to read the file back into R, then use read.csv()
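If the GTF does carry gene symbols, one way to extract them is to parse the attribute column directly. The following is a minimal Python sketch rather than part of the original pipeline: the function name `gene_names_from_gtf` is made up for illustration, and it assumes a tab-separated GTF whose 9th column holds `gene_id` and `gene_name` attributes written as `key "value";`.

```python
import re

# Matches key "value" pairs in the 9th GTF column, e.g. gene_id "gene:AT1G01010";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_names_from_gtf(lines):
    """Return {gene_id: gene_name} for records that carry both attributes.

    Hypothetical helper: assumes tab-separated GTF lines.
    """
    mapping = {}
    for line in lines:
        # Skip header comments and empty lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping


# One exon record in the style of an Ensembl Plants GTF, joined with tabs.
sample = "\t".join([
    "1", "araport11", "exon", "3631", "3913", ".", "+", ".",
    'transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001";',
])

print(gene_names_from_gtf([sample]))
```

On a real file, the same function can be given an open file handle instead of a list, since it only iterates over lines.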


Thank you, Kevin. I used STAR to align the reads to the reference, with annotations in GFF3 format. I then ran featureCounts before DESeq2, after converting the GFF3 annotation file to GTF for use with featureCounts. Does that make a difference?


From where did you obtain that gff3 file? If you look inside the file, you may see a field for gene symbol.


I obtained the gff3 file from:

Arabidopsis release 42

Is it correct to convert the GFF3 to GTF for featureCounts?


In the GFF3 file, the gene symbol is stored under the name tag, where available.

Yes, it is fine to convert GFF3 to GTF. If you want gene symbols, specify GTF.attrType='name' when using featureCounts in R, or -g name when using featureCounts in a Linux / cluster environment. The value has to match an attribute name that actually appears in the 9th column of your GTF.


Thank you so much @Kevin. I will do this when using featureCounts.


Okay, but not all genes appear to have a gene name. I am not too familiar with A. thaliana annotations. Best of luck.
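Because some genes have no name, any lookup needs a fallback. A minimal Python sketch of one option follows: use the symbol when it exists and otherwise keep the locus ID. The helper `display_name` and the toy `names` dictionary are illustrative assumptions, not something produced by DESeq2 or featureCounts.

```python
def display_name(gene_id, names):
    """Return the gene symbol if one is known, otherwise the bare locus ID.

    Hypothetical helper: `names` maps IDs such as "gene:AT1G01010" to symbols.
    """
    # Drop an optional "gene:" prefix, e.g. "gene:AT5G65080" -> "AT5G65080".
    bare_id = gene_id.split(":", 1)[-1]
    # A missing or empty symbol both fall back to the locus ID.
    return names.get(gene_id) or bare_id


# Toy mapping with a single entry; the second ID is deliberately absent
# so that the fallback path is exercised.
names = {"gene:AT1G01010": "NAC001"}

for gene_id in ["gene:AT1G01010", "gene:AT5G65080"]:
    print(gene_id, display_name(gene_id, names))
```

Falling back to the ID keeps every row labelled, which avoids empty cells when the annotated table is written back out.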


I am facing this problem:

//================================= Running ==================================\
||                                                                            ||
|| Load annotation file oldArabidopsis_thaliana.TAIR10.42.gtf ...             ||

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_name'
The attributes included in your GTF annotation are 'transcript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";'


Can you post a few lines from your GTF file? It sounds like there is no attribute called "gene_name" in the file, so you might need a different attribute.
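A quick way to see which attributes a GTF offers is to count the attribute keys across its records. This is a minimal Python sketch under the assumption of a tab-separated GTF; `attribute_keys` is a hypothetical helper, and the first eight fields of the sample record are placeholders, since the warning only reports the attribute column.

```python
import re
from collections import Counter

# Captures the key of each key "value" pair in the 9th GTF column.
ATTR_KEY_RE = re.compile(r'(\S+) "[^"]*"')


def attribute_keys(lines):
    """Count how many records carry each attribute key (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # header comment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # Use a set so a key repeated within one record is counted once.
        counts.update(set(ATTR_KEY_RE.findall(fields[8])))
    return counts


# Attribute column copied from the warning; the other fields are placeholders.
record = "\t".join([
    "1", "placeholder", "exon", "1", "2", ".", "+", ".",
    'transcript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";',
])

keys = attribute_keys([record])
print(sorted(keys))
print("gene_name" in keys)
```

If `gene_name` is missing from the result for most records, `gene_id` is the safer attribute to count on.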

1   araport11   exon    3631    3913    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon1"; constitutive "1"; ensembl_end_phase "1"; ensembl_phase "-1"; rank "1"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    3996    4276    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon2"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "1"; rank "2"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    4706    5095    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon4"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "0"; rank "4"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    5174    5326    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon5"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "0"; rank "5"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";

My GTF file looks like this.


Which extra information do you want, exactly? Your gene IDs are already the official symbols for A. thaliana. Please help us by being as specific as you can be.


Please follow up on this thread rather than opening new ones. If you run into trouble, explain what the problem is and show what you have already tried.
