Mapping: how to deal with non-unique mapping to homologous genes in different contigs
9.3 years ago
Ali May • 0

I also posted this on seqanswers.com. Since I think it's still not possible to cross-post, I'm posting it on Biostars as well.

I'm trying to map RNA-Seq reads to a fungal genome with the STAR aligner, and then use HTSeq to estimate the counts per gene. I am using an Ensembl genome FASTA file and the corresponding GTF file. My problem is that the assembly consists of 413 supercontigs, so I cannot keep just one chromosome from each homologous pair (i.e. build a haploid reference) to stop my reads from matching multiple locations. As a result, almost all of my reads map non-uniquely. Any suggestions? What is the most common way to deal with this problem, namely that without fully assembled chromosomes I cannot filter out the homologous copies to avoid non-unique alignments?
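To put a number on "almost all", here is a minimal Python sketch that tallies uniquely mapped and multi-mapped reads from SAM text. It assumes the alignments carry the standard `NH:i` tag (number of reported alignments for the read), which STAR writes by default as far as I know. The function name `multimap_summary` and the input handling are only illustrative, not part of STAR or HTSeq.

```python
def multimap_summary(sam_lines):
    """Count reads with one vs. several reported alignments (NH:i tag)."""
    hits_per_read = {}
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        if int(fields[1]) & 0x4:          # FLAG bit 0x4: read is unmapped
            continue
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                # Key by read name so mates and secondary records count once
                hits_per_read[fields[0]] = int(tag[5:])
                break
    unique = sum(1 for n in hits_per_read.values() if n == 1)
    multi = sum(1 for n in hits_per_read.values() if n > 1)
    return {"unique": unique, "multi": multi}
```

It can be fed an open file handle, e.g. `multimap_summary(open("Aligned.out.sam"))`, where the file name is just an assumed example. As I understand it, htseq-count in its default mode uses the same `NH` tag to set aside reads with more than one reported alignment, which is why the multi-mapped fraction matters for the gene counts.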

Thanks in advance.

RNA-Seq alignment Assembly • 2.5k views

Are you sure the genome you are using is not a haploid draft? There are other causes (in fact, I believe more common ones) for a large number of scaffolds in a draft genome than the assembly representing a diploid genome.


Yes, I am sure it is not a haploid draft. The assembly and the corresponding information can be found in this link: http://fungi.ensembl.org/Candida_albicans_sc5314/Info/Annotation/#assembly

As the link says, there are 14,213 coding genes in the draft, whereas I know that this number is almost halved when the copies from one chromosome of each homologous pair are filtered out (i.e. when I cluster the coding DNA sequences from a FASTA file at 99% identity, I end up with ~8000 entries).
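To illustrate what clustering at 99% identity means here, below is a toy Python sketch of greedy clustering. It is not the tool I actually used: it compares sequences position by position without gaps, which a real clustering program would not do, and the names `identity` and `greedy_cluster` are made up for this example.

```python
def identity(a, b):
    """Ungapped identity: matching positions divided by the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.99):
    """Group sequences (dict of name -> sequence) by identity to a representative."""
    # Process longest sequences first so each representative is the longest member
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters = []  # list of (representative name, member names)
    for name in order:
        for rep, members in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                members.append(name)      # joins the first cluster that is close enough
                break
        else:
            clusters.append((name, [name]))  # no match: start a new cluster
    return [members for _, members in clusters]
```

With two alleles of one gene that differ at a single base in 200 and one unrelated gene, this returns two clusters. That is the same effect as the drop from 14,213 genes to roughly 8000 entries: near-identical copies on the two homologous chromosomes collapse into one entry.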
