PlantTFDB transcription factor discovery.
7.2 years ago
jvire1 ▴ 10

Hi all,

I have assembled three transcriptomes of a non-model plant and am writing up a report. Initially, I queried the unigenes (blastx) and their predicted protein sequences (blastp) against the entire collection of PlantTFDB protein sequences, using an E-value cutoff of 1e-5.

Upon analyzing the blastx and blastp hits, I realized that I was getting far too many members of each of the 58 transcription factor families. For example, ~3000 unigenes were annotated as bHLH in one of my assemblies, whereas according to the PlantTFDB species summary for this family (http://planttfdb.cbi.pku.edu.cn/family.php?fam=bHLH) the highest number of bHLH genes identified in any one species is 559 (Panicum virgatum).

I have since filtered the blastx and blastp results at an E-value of 1e-50 (in hindsight, a cutoff of 1e-5 was far too permissive) and >35% identity. This reduced the number of bHLH-annotated unigenes to ~1000, but I suspect this is still an overestimate.

I have also generated percent hit coverage statistics for the BLAST results, and was thinking that I could similarly filter the results to keep only hits above some percent hit coverage threshold.
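To make the combined filter concrete, here is a minimal sketch in Python. It assumes the BLAST results are tab-separated with the column order listed in `COLUMNS` (a hypothetical layout, not necessarily the one I used), and it computes hit coverage as the aligned span on the subject divided by the subject length. The 1e-50 and >35% cutoffs are the ones mentioned above; the 70% coverage cutoff is only a placeholder, since choosing that value is exactly the open question. The three example rows are toy values.

```python
import csv
import io

# Assumed column order (hypothetical), e.g. from a custom tabular BLAST output:
# qseqid sseqid pident length evalue bitscore slen sstart send
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue",
           "bitscore", "slen", "sstart", "send"]


def hit_coverage(sstart, send, slen):
    """Percent of the subject (PlantTFDB protein) covered by the alignment."""
    return 100.0 * (abs(send - sstart) + 1) / slen


def filter_hits(handle, max_evalue=1e-50, min_pident=35.0, min_cov=70.0):
    """Keep hits passing the E-value, identity, and coverage cutoffs.

    min_cov=70.0 is a placeholder, not a recommended threshold.
    """
    kept = []
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        cov = hit_coverage(int(row["sstart"]), int(row["send"]), int(row["slen"]))
        if (float(row["evalue"]) <= max_evalue
                and float(row["pident"]) > min_pident
                and cov >= min_cov):
            kept.append((row["qseqid"], row["sseqid"], round(cov, 1)))
    return kept


# Toy rows: only the first passes all three cutoffs.
example = "\n".join([
    "unigene1\tTF_A\t62.0\t180\t1e-80\t250\t200\t11\t190",  # 90% coverage
    "unigene2\tTF_A\t40.0\t50\t1e-60\t200\t200\t1\t50",     # 25% coverage
    "unigene3\tTF_B\t30.0\t190\t1e-55\t210\t200\t5\t194",   # identity too low
])

kept = filter_hits(io.StringIO(example))
print(kept)
```

Counting distinct query IDs in `kept` per family, rather than counting hits, would avoid inflating the family totals when one unigene matches many PlantTFDB proteins.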

Any suggestions on an alternative approach or a percent hit coverage threshold to filter with would be much appreciated.

RNA-Seq • 3.0k views
7.2 years ago
jvire1 ▴ 10

Having slept on it, I realized a solution would be to take the transcripts with blastp and blastx hits and analyze them with the PlantTFDB prediction server, which uses a much more robust method to determine homology but has limits on the size of uploaded sequences.
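As a rough sketch of that pre-selection step, the Python below pulls the sequences whose IDs had BLAST hits out of a FASTA file and splits them into batches small enough to upload. The server's actual upload limits are not stated here, so `max_per_batch` is a hypothetical placeholder, and the sequence IDs and sequences are toy values.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            # Use the first whitespace-delimited token of the header as the ID.
            name, seq = line[1:].split()[0], []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def batches_for_upload(records, wanted_ids, max_per_batch):
    """Select records with BLAST hits and split them into upload-sized batches.

    max_per_batch is a hypothetical limit; check the server's documentation.
    """
    selected = [(n, s) for n, s in records if n in wanted_ids]
    return [selected[i:i + max_per_batch]
            for i in range(0, len(selected), max_per_batch)]


fasta_text = """>unigene1 len=12
MKTAYIAKQR
QI
>unigene2
MSDNE
>unigene3
MAAPL
>unigene4
MGGSW
"""

ids_with_hits = {"unigene1", "unigene3", "unigene4"}
batches = batches_for_upload(read_fasta(fasta_text.splitlines()),
                             ids_with_hits, max_per_batch=2)

# Write each batch back out as FASTA text, one string per upload.
uploads = ["".join(f">{n}\n{s}\n" for n, s in batch) for batch in batches]
print(len(uploads))
```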
