I am working on RNA-seq data for a host-pathogen interaction between a grass species and its fungal parasite. The ultimate goal is to do differential expression analysis and functional enrichment to see which genes and pathways are involved in parasitism.
I have:
- Draft genome of the fungus
- RNA-seq reads from non-infected grass
- RNA-seq reads from infected grass (contains grass and fungal transcripts)
- RNA-seq reads from the fungus growing in culture
I built the fungal transcriptome using only the reads from the culture-grown fungus, and I built the grass transcriptome using only the non-infected reads. Now I'm thinking it would be useful to rebuild those transcriptomes to include reads from the infected tissue, to capture transcripts that are unique to the host-pathogen interaction.
Is there a way to filter the infected reads into grass and fungal groups using the resources I currently have?
Perhaps I could align the infected grass reads (#3) to the fungal transcriptome, and use only the unmapped reads to rebuild the grass transcriptome? Maybe I could then use BLAST, BBDuk, or some other tool on the unmapped reads to filter out remaining fungal reads before using them to build the grass transcriptome.
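A minimal sketch of that idea, assuming the infected reads have already been aligned to the fungal reference and the alignments are available as SAM text. The function name `split_read_names` is hypothetical; the only thing taken from the SAM format itself is the FLAG bit 0x4, which marks an unmapped read. A read name counts as fungal if any of its records maps, so a pair with one mapped mate is kept out of the grass set (a deliberately conservative choice).

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read did not align to the reference


def split_read_names(sam_lines):
    """Split read names into (fungal, putative_grass) sets.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    A name is 'fungal' if at least one of its records is mapped;
    everything else that was seen is treated as putative grass.
    """
    mapped = set()
    seen = set()
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED:
            mapped.add(name)
    return mapped, seen - mapped
```

With a real file this could be called as `fungal, grass = split_read_names(open("infected_vs_fungus.sam"))`, where the file name is only a placeholder. Some aligners can also write the unaligned reads straight to a separate file, which would make this step unnecessary.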
A valid approach indeed. I would consider aligning them to the fungal genome (as well?) in order to filter out the fungal ones.
Hey lieven.sterck,
Thanks for the response! I've considered using BBSplit to sort the reads further, but unfortunately I don't have a genomic sequence for the plant.
Does anyone know a tool that can sort RNA-seq data using the genome of one of the host-pathogen species?
Can't you just align them to the fungal genome and then use the ones that do not map (== likely to be plant ones)?
That would be the way to go.
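Once the names of the reads that did not map to the fungal genome are known, pulling them out of the original FASTQ could look roughly like the sketch below. It assumes plain four-line FASTQ records (no wrapped sequence lines) and that read names match the aligner output after dropping a trailing `/1` or `/2`; `filter_fastq` and `keep_names` are hypothetical names used only for illustration. Since these are RNA-seq reads going against a genome, a splice-aware aligner is the usual choice for the alignment step itself.

```python
def filter_fastq(fastq_lines, keep_names):
    """Yield FASTQ records (as 4-line strings) whose read name is in keep_names.

    fastq_lines: iterable of lines from a four-line-per-record FASTQ file.
    keep_names:  set of read names to keep, e.g. the reads that did not map
                 to the fungal genome (putative plant reads).
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like '@name optional description'
            parts = record[0][1:].split()
            name = parts[0] if parts else ""
            # Drop an old-style mate suffix so both mates share one name
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            if name in keep_names:
                yield "\n".join(record)
            record = []
```

For paired-end data the same `keep_names` set would be applied to both the R1 and R2 files, so the mates stay in sync for the transcriptome assembly.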