Plant small RNA (sRNA) data analysis pipeline
5.7 years ago
K S • 0

I am interested in identifying the viromes of wheat samples infected with viruses, but I am not sure which pipeline to use. I have Illumina sRNA data and I am following these steps:

  1. Quality check of the reads

     a. Raw reads -> adapter trimming and read filtering (FastQC, cutadapt and Trimmomatic)

  2. Mapping to the host genome to find host-derived reads

     a. Building the indexes from the whole wheat genome (bowtie2, GMAP); this fails with an error because of the size of the genome

     b. Mapping the reads to the reference genome (TopHat, SAMtools)

     * Would it be better to align the reads to wheat RNA sequences instead of the whole genome?

  3. De novo assembly of the unmapped reads (Velvet, k-mer = 17)

  4. Mapping the contigs to the reference genome from step 2 (bowtie2, TopHat, SAMtools)

  5. BLASTN of the unmapped contigs against the virus databases in NCBI/GenBank

  6. BLASTX against a virus protein database
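Steps 2 and 4 both end with the same hand-off: pulling out the records that did not align to wheat so they can go to the next tool (reads to Velvet, contigs to BLAST). A minimal Python sketch of that hand-off, assuming the aligner wrote plain-text SAM. It keeps records whose FLAG has the 0x4 (unmapped) bit set and writes them as FASTA. The function names and the command-line usage are illustrative, not part of any existing tool.

```python
import sys
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def unmapped_from_sam(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each SAM record with the unmapped bit set."""
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines (@HD, @SQ, @PG, ...) and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]  # QNAME, FLAG, SEQ
        if flag & UNMAPPED:
            yield name, seq


def fasta_records(records: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Format (name, sequence) pairs as two-line FASTA records."""
    for name, seq in records:
        yield f">{name}\n{seq}\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage (file names are placeholders):
    #   python unmapped.py host_alignments.sam > unmapped.fa
    with open(sys.argv[1]) as sam_file:
        sys.stdout.writelines(fasta_records(unmapped_from_sam(sam_file)))
```

In practice `samtools view -f 4`, or bowtie2's `--un` option at alignment time, gives the same subset directly; the sketch only makes the flag logic explicit. FASTA output is used here because both Velvet and BLAST accept it, at the cost of dropping the quality strings.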

Thanks

sequencing next-gen virome • 1.4k views
5.7 years ago
Fabio Marroni ★ 3.0k

Your pipeline is reasonable. I don't know how to help with the indexing problem; as far as I understand, genomes larger than the human genome are hard for common index-building tools to handle. You could ask someone who works on wheat, as I have no experience with it. As a shortcut, you could map to the transcriptome instead. Another option would be to use a metagenomic classifier (e.g. Kraken2) to remove all reads assigned to plants (you will have to use the nt database). I would also suggest taking a look at VirusDetect, although I'm not sure whether it can handle wheat.
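Once Kraken2 has classified the reads against nt, removing the plant-assigned ones comes down to reading its per-read output. A rough Python sketch, assuming the standard tab-separated Kraken2 per-read output (C/U flag, read ID, taxid, ...) and a caller-supplied set of taxids to exclude. The sketch does not walk the NCBI taxonomy, so that set must already contain every plant descendant taxid you want removed; it also assumes Kraken2's read IDs match the first word of the FASTQ header. Function names are hypothetical.

```python
from typing import Iterable, Iterator, Set


def classified_to(kraken_lines: Iterable[str], exclude_taxids: Set[int]) -> Set[str]:
    """Return IDs of reads that Kraken2 classified ('C') to a taxid in exclude_taxids."""
    hits = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "C":
            continue  # unclassified ('U') reads are always kept
        taxid = fields[2]
        if "(taxid " in taxid:
            # with --use-names the column reads "Scientific name (taxid N)"
            taxid = taxid.rsplit("(taxid ", 1)[1].rstrip(")")
        if int(taxid) in exclude_taxids:
            hits.add(fields[1])
    return hits


def drop_reads(fastq_lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield the lines of every 4-line FASTQ record whose ID is not in drop."""
    it = iter(fastq_lines)
    # zipping the same iterator four times groups consecutive lines into records;
    # a truncated final record is silently discarded
    for header, seq, plus, qual in zip(it, it, it, it):
        read_id = header[1:].split()[0]  # strip '@' and anything after the first space
        if read_id not in drop:
            yield from (header, seq, plus, qual)
```

Helper scripts for this already exist (for example in the KrakenTools collection), which also handle taxonomy descendants; the sketch is only meant to show what the filtering step does.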
