Question

de novo transcriptome assembly with >400K genes. How to proceed?

4.8 years ago

User 4014 ▴ 40

Hi folks,

I have a de novo transcriptome assembly of a polyploid tree species, assembled with K=31 and a minimum length of 200 bp. The assembly contains almost 400K genes; after reducing redundancy with CD-HIT-EST (identity cut-off=0.97), around 350K genes remain. Mapping ca. 1/4 of the total reads back to the reduced assembly showed that the majority of read pairs (82.22%) align concordantly more than once (see the bowtie2 summary below). Do you think this would pose a problem if I aim to work at the gene level? I could try CD-HIT-EST with cut-off=0.95. Or would it be better to use Lace to stitch the different isoforms together and take it from there?

Thank you very much in advance for your suggestions and comments!

$ bowtie2 --local --no-unal -x cdhit_e97_Trinity_Famer_K31 -p 24 -q -1 cat_70x_R1.fq.gz -2 cat_70x_R2.fq.gz | samtools view -b | samtools sort -o 70x_bowtie2.bam
78850917 reads; of these:
  78850917 (100.00%) were paired; of these:
    2584872 (3.28%) aligned concordantly 0 times
    11430984 (14.50%) aligned concordantly exactly 1 time
    64835061 (82.22%) aligned concordantly >1 times
    ----
    2584872 pairs aligned concordantly 0 times; of these:
      201798 (7.81%) aligned discordantly 1 time
    ----
    2383074 pairs aligned 0 times concordantly or discordantly; of these:
      4766148 mates make up the pairs; of these:
        627598 (13.17%) aligned 0 times
        410463 (8.61%) aligned exactly 1 time
        3728087 (78.22%) aligned >1 times
99.60% overall alignment rate
[bam_sort_core] merging from 80 files and 1 in-memory blocks...
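
For anyone who wants to put a single number on the multi-mapping, the concordant part of the summary above can be parsed with a few lines of Python. This is only a minimal sketch: the function names `concordant_counts` and `multimap_share` are made up for illustration, and the regular expressions assume the paired-end summary layout exactly as bowtie2 printed it in this run.

```python
import re

# First lines of the bowtie2 paired-end summary, copied from the run above.
SUMMARY = """\
78850917 reads; of these:
  78850917 (100.00%) were paired; of these:
    2584872 (3.28%) aligned concordantly 0 times
    11430984 (14.50%) aligned concordantly exactly 1 time
    64835061 (82.22%) aligned concordantly >1 times
"""


def concordant_counts(summary):
    """Extract pair counts for the concordant categories (hypothetical helper)."""
    patterns = {
        "paired": r"(\d+) \([\d.]+%\) were paired",
        "zero": r"(\d+) \([\d.]+%\) aligned concordantly 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned concordantly >1 times",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, summary)
        if match is None:
            raise ValueError(f"summary line for {key!r} not found")
        counts[key] = int(match.group(1))
    return counts


def multimap_share(counts):
    """Fraction of concordantly aligned pairs that have more than one placement."""
    aligned = counts["unique"] + counts["multi"]
    return counts["multi"] / aligned


counts = concordant_counts(SUMMARY)
share = multimap_share(counts)
print(f"{share:.1%} of concordantly aligned pairs are multi-mapped")
```

With the numbers from this run, roughly 85% of the concordantly aligned pairs have more than one placement, in line with the 82.22% of all pairs that bowtie2 reports.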

rna-seq RNA-Seq
