Hi all!
I am working on Arabidopsis thaliana RNA-seq data and need to analyse allele-specific expression. Can anybody help me choose tools for SNP calling and for the allele-specific expression analysis?
Thanks in advance
AlleleSeq and Allim are a few tools that calculate allele specificity, but they are not straightforward to use. Also, I am not sure whether you can use them for the Arabidopsis genome. I use my own code to calculate the allelic ratio.
1) Align the reads from the F1 hybrid to the reference genome, allowing enough mismatches that reads from both the reference and the non-reference parent align without mapping bias.
2) Once you have the BAM file, use the mpileup function of samtools to get the depth of each allele at every SNP that differs between the parents.
This post has code by Rm for doing the same: "A: Allele depth (~ DP4) values for all possible alleles reported?"
This is not the most sophisticated approach, but it should work.
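As a rough illustration of step 2 (this is not Rm's code from the linked post), here is a minimal Python sketch that counts alleles in the read-bases column of one samtools mpileup line. The function names `count_alleles` and `allelic_ratio` are made up for this example, and it assumes a well-formed pileup string with no base-quality filtering.

```python
import re
from collections import Counter

def count_alleles(ref_base, pileup_bases):
    """Count A/C/G/T observations in one samtools mpileup read-bases string."""
    # Drop read-start markers ('^' plus one mapping-quality character),
    # then the read-end markers ('$').
    s = re.sub(r"\^.", "", pileup_bases).replace("$", "")
    counts = Counter()
    i = 0
    while i < len(s):
        ch = s[i]
        if ch in "+-":
            # Indel: +<len><seq> or -<len><seq>; skip the whole sequence.
            length = re.match(r"\d+", s[i + 1:]).group()
            i += 1 + len(length) + int(length)
            continue
        if ch in ".,":
            # '.' and ',' mean "matches the reference" (forward/reverse strand).
            counts[ref_base.upper()] += 1
        elif ch.upper() in ("A", "C", "G", "T"):
            counts[ch.upper()] += 1
        # Anything else ('*', '<', '>', 'N', ...) is ignored.
        i += 1
    return counts

def allelic_ratio(counts, ref_allele, alt_allele):
    """Fraction of informative reads carrying the reference allele."""
    total = counts[ref_allele] + counts[alt_allele]
    return counts[ref_allele] / total if total else float("nan")

# Illustrative pileup string, not real data.
counts = count_alleles("A", "..,,T^I.t$-2ag,+1c.")
print(dict(counts), allelic_ratio(counts, "A", "T"))
```

In practice you would run this only at positions known to differ between the parents, and probably apply a minimum depth before trusting the ratio.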
For SNP calling, I would look at this recent paper: Reliable Identification of Genomic Variants from RNA-Seq Data. I haven't tried their method myself but it's one of surprisingly few papers that explicitly addresses the RNA-seq case.
Hi Sara, ASE is a complex task. The pipeline I developed and have been following is this (assuming you have both parental transcriptomes as well as the F1 hybrid transcriptomes):
1) Align the parental transcriptomes against the reference genome (STAR)
2) Call homozygous SNPs with depth > 4 reads, qual > 30 (best practices from GATK's pipeline)
3) Build 2 pseudo-references with the parental SNPs, which basically consists of imputing the homozygous SNPs into the reference genome (FastaAlternateReferenceMaker)
4) Align the F1 hybrid library against the 2 pseudo-references in two independent jobs
5) Retrieve the SNPs that are heterozygous, biallelic, > 20 reads depth, qual > 30, etc. (filters)
6) Merge by position to identify the SNPs in common that are also reciprocal (AT-TA) for the reference and alternate alleles
7) Annotate the SNPs (map them to the genomic feature) to identify which transcript produced the reads that contain the SNPs (SnpEff-SnpSift)
8) Summarise the allelic depth of all the SNPs that map to the same gene ID, transcript ID, etc. in order to increase the total number of reads that bear the parental-specific alleles
9) Use a binomial test (or chi-square, etc.) to assess allelic imbalance under the null hypothesis of allelic balance in heterozygotes
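Steps 8 and 9 can be sketched in a few lines of Python. This is only an illustration under assumptions, not the exact code of the pipeline above: the function names, the gene IDs and the read counts are invented, and the test is an exact two-sided binomial test against a 50:50 expectation.

```python
from collections import defaultdict
from math import exp, lgamma, log

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    pmf = [exp(log_pmf(i)) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))

def gene_allelic_imbalance(snps):
    """snps: iterable of (gene_id, parent1_reads, parent2_reads), one per SNP.
    Returns {gene_id: (parent1_total, parent2_total, p_value)}."""
    totals = defaultdict(lambda: [0, 0])
    for gene, p1, p2 in snps:
        totals[gene][0] += p1   # step 8: sum allelic depth per gene
        totals[gene][1] += p2
    # step 9: test each gene against the null of allelic balance (p = 0.5)
    return {g: (a, b, binom_two_sided(a, a + b)) for g, (a, b) in totals.items()}

# Hypothetical gene IDs and counts, for illustration only.
snps = [("geneA", 30, 12), ("geneA", 25, 10), ("geneB", 14, 16)]
for gene, (p1, p2, pval) in gene_allelic_imbalance(snps).items():
    print(gene, p1, p2, pval)
```

Two caveats: a read that spans several SNPs in the same gene is counted once per SNP when depths are summed this way, and with thousands of genes the p-values need a multiple-testing correction.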
If you want to assess trans effects, you just have to compare the expression ratio of your 2 divergent parental transcriptomes (you can retrieve the expression of the orthologs with kallisto, bowtie, etc.) against the ratio of the two alleles in the hybrids with a Fisher exact test, e.g. parental (P1=40 / P2=400) versus hybrid (HP1=60 / HP2=55).
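The comparison above is a Fisher exact test on a 2x2 table, with parental and hybrid as rows and the two alleles as columns. Here is a small self-contained Python sketch; `fisher_exact_two_sided` is a name chosen for this example, and it uses the common two-sided definition that sums the probabilities of all tables no more likely than the observed one.

```python
from math import exp, lgamma

def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(n, col1)

    observed = log_p(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(exp(log_p(x)) for x in range(lo, hi + 1)
                if log_p(x) <= observed + 1e-7)  # tolerance for ties
    return min(1.0, total)

# Rows: parental (P1=40, P2=400) and hybrid (HP1=60, HP2=55), as in the text.
p_value = fisher_exact_two_sided(40, 400, 60, 55)
print(p_value)
```

A small p-value here means the allelic ratio in the hybrid differs from the ratio between the parents, which is what you would expect when trans effects contribute. Note that the parental counts should be normalized for library size before building the table, since the test works on raw counts.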
If you have replicates, I would go for a likelihood ratio test within the DESeq2 package, which will also perform the library size normalization by means of scaling factors.
Best,
Erik
There were no replies to my post, so I am replying here. Read this chapter; it could be helpful. It works for mouse cell lines.
Experimental Analysis of Imprinted Mouse X-Chromosome Inactivation
Just to note, it does seem that's a good tool for RNA-Seq SNP calling in general, but not related to allele-specific expression. It actually says a SNP will not be detected for mono-allelic expression.
Yes, I agree. I was just trying to address the first part of "tool for snp calling and analysis (Allele specific expression)."