How to map genome-guided assembly transcripts to the genome and extract the longest one for each locus
7.6 years ago
Bioinfonext ▴ 470

Hi,

I did a genome-guided assembly using StringTie, and it generated multiple isoforms per locus. Can you please suggest how I can map these transcripts back to the genome to get only a single, accurately assembled transcript for each locus?

Thanks

RNA-Seq • 1.7k views
7.6 years ago
Rohit ★ 1.5k

From your comments, I think what you want is redundancy removal. This can be done in two ways:

1) Without a reference, using Vmatch, CD-HIT or UCLUST: combine the de novo and genome-guided assembly transcripts and keep only the longest sequences from the superset, removing those that are completely contained in longer ones.

2) With the reference: map the transcripts with a split-aware mapper and keep only the longest one in each region where the alignments are overlapping subsets.
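
To make option 1 concrete, here is a minimal sketch of what "removing completely contained sequences" means, written as a shell function around awk. It is only an illustration under strong assumptions: it looks for exact substring containment on the forward strand, it compares every pair of sequences (so it will be slow on a full assembly), and `drop_contained` is a made-up name. It is not how Vmatch, CD-HIT or UCLUST work internally; those tools also handle near-identical sequences and scale much better.

```shell
# Hypothetical helper: read FASTA on stdin, write FASTA without sequences
# that are exact substrings of a longer sequence (or duplicates of an earlier one).
drop_contained() {
  awk '
  /^>/ { n++; name[n] = substr($0, 2); seq[n] = ""; next }   # new record
  n > 0 { seq[n] = seq[n] toupper($0) }                      # join wrapped sequence lines
  END {
    for (i = 1; i <= n; i++) {
      keep = 1
      li = length(seq[i])
      for (j = 1; j <= n && keep; j++) {
        if (i == j) continue
        lj = length(seq[j])
        # Drop i if it sits inside a longer sequence, or repeats an earlier identical one.
        if ((lj > li || (lj == li && j < i)) && index(seq[j], seq[i]) > 0) keep = 0
      }
      if (keep) printf(">%s\n%s\n", name[i], seq[i])
    }
  }'
}
```

Usage would look like `drop_contained < combined_transcripts.fa > nr_transcripts.fa`, where both file names are placeholders.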


1) For taking only the longest from the superset and removing completely overlapping regions: please suggest if any tool is available. Do you think CD-HIT is useful here?

2) Use the reference to map the transcripts with a split-aware mapper and keep only the longest one in the region when they are overlapping subsets.

What about this second step? How can I do it?


1) Vmatch, CD-HIT and UCLUST are all tools that can keep the longest sequences. The commands for Vmatch are as follows (these are from about 2 years ago, so I am not sure whether anything has changed):

mkvtree -allout -pl -db sequences.fasta -dna -indexname dbname 
vmatch -d -p -dbcluster 100 0 -v -nonredundant nr_sequences.fa dbname

2) You can use GMAP or BWA-MEM to map the sequences at high identity. Then use bedtools (cluster) or the Kent utilities (bedRemoveOverlap) to remove the subsets or completely overlapping sequences.
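
For step 2, once the alignments are in BED format, the "keep the longest per overlapping region" logic can be sketched with sort and awk alone. This is a rough sketch, not a replacement for the tools above, and it rests on several assumptions: the input is tab-separated BED6 (chrom, start, end, name, score, strand), a locus is a run of overlapping intervals on the same chromosome and strand, and "longest" means the largest genomic span (end minus start), not the spliced transcript length. The function name `longest_per_locus` is made up for illustration.

```shell
# Hypothetical helper: read BED6 on stdin, print one line per locus,
# namely the alignment with the largest genomic span in that locus.
longest_per_locus() {
  # Sort by chromosome, strand, then start and end so overlaps become adjacent.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k6,6 -k2,2n -k3,3n |
  awk 'BEGIN { FS = OFS = "\t" }
  {
    start = $2 + 0; end = $3 + 0
    # Open a new locus when chromosome or strand changes, or when this
    # interval starts at or beyond the end of the current locus.
    if (NR == 1 || $1 != chrom || $6 != strand || start >= locus_end) {
      if (NR > 1) print best
      chrom = $1; strand = $6; locus_end = end
      best = $0; best_len = end - start
    } else {
      if (end > locus_end) locus_end = end               # extend the locus
      if (end - start > best_len) { best = $0; best_len = end - start }
    }
  }
  END { if (NR > 0) print best }'
}
```

Usage would look like `longest_per_locus < mapped_transcripts.bed > longest.bed`, with placeholder file names. If two alignments in a locus have the same span, the one that sorts first is kept. For spliced alignments in BED12, summing the block sizes instead of using end minus start would be closer to the real transcript length.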
