IsoformSwitchAnalyzeR importRdata warning
17 months ago
Priyanka ▴ 10

I am trying to run IsoformSwitchAnalyzeR in R version 4.3.0 using the following command:

aSwitchList2 <- importRdata(
    isoformCountMatrix   = salmonQuant$counts,
    isoformRepExpression = salmonQuant$abundance,
    designMatrix         = myDesign,
    isoformExonAnnoation = "gencode.v40.annotation.gtf.gz",
    isoformNtFasta       = "gencode.v40.transcripts.fa",
    addAnnotatedORFs     = TRUE,
    showProgress         = TRUE,
    ignoreAfterBar       = FALSE)

This is the output:

Step 1 of 10: Checking data...   
Step 2 of 10: Obtaining annotation...
       importing GTF (this may take a while)...
       91856 ( 37.62%) isoforms were removed since they were not expressed in any samples.   
Step 3 of 10: Fixing StringTie gene annoation problems...
       There were no need to rescue any annotation
       34197 genes_id were assigned their original gene_id instead of the StringTie gene_id.
           This was only done when it could be done unambiguous.   
Step 4 of 10: Calculating expression estimates from count data...
       Skipped as user supplied expression via the "isoformRepExpression" argument...
Step 5 of 10: Testing for unwanted effects...
       No unwanted effects added
Step 6 of 10: Batch correcting expression estimates...
       Skipped as no batch effects were found or annoated...
Step 7 of 10: Extracting data from each condition...     
|==================================================================================================| 100%   
Step 8 of 10: Making comparisons...     
|==================================================================================================| 100%  
Step 9 of 10: Making switchAnalyzeRlist object...  
Step 10 of  10: Guestimating differential usage...
       The GUESSTIMATED number of genes with differential isoform usage are:     
         comparison estimated_genes_with_dtu
       1 Low vs High                   0 - 0
Done

Warning messages:
1: In importRdata(isoformCountMatrix = salmonQuant$counts, isoformRepExpression = salmonQuant$abundance,  :
  The annotation and quantification (count/abundance matrix and isoform annotation) Seem to be slightly different.
  Specifically:
  800  isoforms were only found in the annotation
  Please make sure this is on purpouse since differences will cause inaccurate quantification and thereby skew all analysis.
  If you have quantified with Salmon this could be normal since it as default only keep one copy of identical sequnces (can be prevented using the --keepDuplicates option)
  We strongly encurage you to go back and figure out why this is the case.

2: In createSwitchAnalyzeRlist(isoformFeatures = isoAnnot, exons = isoformExonStructure,  :
  The gene_ids or isoform_ids were not unique we identified multiple instances of the same gene_id/isoform_id on different chromosomes. To solve this we removed 8 gene_id. Please note there might still be duplicated gene_id located on the same chromosome. Some of these could be due to fusion transcripts which IsoformSwitchAnalyzeR cannot handle.

The warning seems to suggest that only 800 isoforms are present in the annotation. However, I have checked the GTF and the FASTA file individually, and all the IDs from salmonQuant are present in both. I know similar issues have been discussed in various places, and I have tried most of the suggested fixes, but I still get this warning message.
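
For reference, the kind of ID check described above can be sketched in base R as follows. This is only an illustration with made-up IDs: `compare_isoform_ids`, `quant_ids`, and `annot_ids` are hypothetical names, not part of IsoformSwitchAnalyzeR. Note that the warning text ("isoforms were only found in the annotation") reads as IDs that are in the annotation but missing from the quantification, which corresponds to `only_in_annotation` below.

```r
# Compare the isoform IDs of a quantification with those of an annotation.
# All inputs are toy values; replace them with your own ID vectors.
compare_isoform_ids <- function(quant_ids, annot_ids) {
  list(
    # IDs present in the annotation but absent from the quantification
    only_in_annotation     = setdiff(annot_ids, quant_ids),
    # IDs present in the quantification but absent from the annotation
    only_in_quantification = setdiff(quant_ids, annot_ids),
    # IDs found in both
    shared                 = intersect(quant_ids, annot_ids)
  )
}

# Hypothetical IDs for illustration only
quant_ids <- c("ENST0001.1", "ENST0002.1", "ENST0003.1")
annot_ids <- c("ENST0001.1", "ENST0002.1", "ENST0003.1", "ENST0004.1")

id_check <- compare_isoform_ids(quant_ids, annot_ids)
print(lengths(id_check))
```

With real data, `quant_ids` would come from the quantification (for example the `isoform_id` column of `salmonQuant$counts`, assuming the object was created with default settings) and `annot_ids` from the transcript IDs in the GTF.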

I would like some help in understanding and handling this warning message. I am using the updated IsoformSwitchAnalyzeR_2.1.2.

R IsoformSwitchAnalyzeR • 1.3k views

Can you provide the first 5 lines of your annotation file?


Dear Trivas, these are the first 6 lines of the annotation file that I am using.

##description: evidence-based annotation of the human genome (GRCh38), version 40 (Ensembl 106)                             
##provider: GENCODE                             
##contact: gencode-help@ebi.ac.uk                               
##format: gtf                               
##date: 2022-01-20                              
chr1    HAVANA  gene    11869   14409   .   +   .   gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.2";
17 months ago

If you have quantified with Salmon, this could be normal, since by default Salmon keeps only one copy of identical sequences (this can be prevented using the --keepDuplicates option).
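
To see whether identical sequences could explain the missing isoforms, one option is to group transcript IDs by sequence. The sketch below uses base R on a toy named character vector; `find_identical_sequences` and the `tx*` names are hypothetical, and for a real transcript FASTA you would first need to read the sequences into R (for example with `Biostrings::readDNAStringSet`, an extra dependency assumed here).

```r
# Group transcript IDs that share an identical nucleotide sequence.
# `seqs` is a named character vector: names are transcript IDs,
# values are the sequences.
find_identical_sequences <- function(seqs) {
  # Split the IDs by their sequence, so identical sequences share a group
  groups <- split(names(seqs), unname(seqs))
  # Keep only groups with more than one transcript ID
  groups <- groups[lengths(groups) > 1]
  unname(groups)
}

# Toy sequences for illustration only
toy_seqs <- c(
  txA = "ATGGCC",
  txB = "ATGGCC",
  txC = "ATGTTT"
)

dup_groups <- find_identical_sequences(toy_seqs)
print(dup_groups)
```

If the IDs reported as "only found in the annotation" all fall into such groups, that would be consistent with the de-duplication described above.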


I was hoping that I would not have to rerun Salmon, but now I think I will have to.

Thank you for the help


I don't think you need to (and I wrote IsoformSwitchAnalyzeR). Dropping one of two identical isoforms seems like a good quantification approach :-)


I was happy to receive direct assistance from you.

What do you recommend for the warning?

Does it hamper my downstream analysis?
