de novo transcriptome assembly with >400K genes. How to proceed?
4.8 years ago
User 4014 ▴ 40

Hi folks,

I have a de novo transcriptome assembly of a polyploid tree species, assembled with K=31 and a minimum length of 200 bp. The assembly contains almost 400K genes; after reducing redundancy with CD-HIT-EST (cut-off=0.97), around 350K genes remain. Mapping about a quarter of the total reads back to the reduced assembly showed that the majority of read pairs (82%) align concordantly more than once; the command and summary are pasted below. Do you think this would pose a problem if I aim to work at the gene level? I could try CD-HIT-EST with cut-off=0.95. Or would it be better to use Lace to stitch the different isoforms together and take it from there?

Thank you very much in advance for your suggestions and comments!

$ bowtie2 --local --no-unal -x cdhit_e97_Trinity_Famer_K31 -p 24 -q -1 cat_70x_R1.fq.gz -2 cat_70x_R2.fq.gz | samtools view -b | samtools sort -o 70x_bowtie2.bam
78850917 reads; of these:
  78850917 (100.00%) were paired; of these:
    2584872 (3.28%) aligned concordantly 0 times
    11430984 (14.50%) aligned concordantly exactly 1 time
    64835061 (82.22%) aligned concordantly >1 times
    ----
    2584872 pairs aligned concordantly 0 times; of these:
      201798 (7.81%) aligned discordantly 1 time
    ----
    2383074 pairs aligned 0 times concordantly or discordantly; of these:
      4766148 mates make up the pairs; of these:
        627598 (13.17%) aligned 0 times
        410463 (8.61%) aligned exactly 1 time
        3728087 (78.22%) aligned >1 times
99.60% overall alignment rate
[bam_sort_core] merging from 80 files and 1 in-memory blocks...
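
For anyone who wants to sanity-check the summary, here is a minimal sketch that recomputes the headline percentages from the counts in the log above. The variable names are my own, the counts are copied by hand from the bowtie2 output, and the overall-rate formula (one minus the share of mates that aligned 0 times) is my reading of how bowtie2 derives that line, so treat it as an illustration rather than anything authoritative.

```python
# Counts copied by hand from the bowtie2 summary above (names are hypothetical).
total_pairs = 78850917        # read pairs given to bowtie2
concordant_once = 11430984    # pairs aligned concordantly exactly 1 time
concordant_multi = 64835061   # pairs aligned concordantly >1 times
mates_unaligned = 627598      # individual mates that aligned 0 times


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of pairs that map concordantly to more than one place.
multi_pct = pct(concordant_multi, total_pairs)

# Share of pairs that map concordantly to exactly one place.
unique_pct = pct(concordant_once, total_pairs)

# Assumed formula: every mate counts once, so there are 2 * total_pairs mates.
overall_pct = pct(2 * total_pairs - mates_unaligned, 2 * total_pairs)

# How many multi-mapping pairs there are per uniquely mapping pair.
multi_to_unique = round(concordant_multi / concordant_once, 2)

print(f"multi-mapping pairs:  {multi_pct}%")
print(f"uniquely mapped pairs: {unique_pct}%")
print(f"overall alignment:     {overall_pct}%")
print(f"multi : unique ratio:  {multi_to_unique}")
```

The recomputed values agree with the percentages bowtie2 printed, so the log is internally consistent: roughly five to six pairs map to multiple contigs for every pair that maps uniquely.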
rna-seq RNA-Seq • 689 views