Mapping with GMAP: which output option shall I use?
0
0
Entering edit mode
5.9 years ago
luzglongoria ▴ 50

Hi,

I have done an assembly by using reads from a bird infected with a parasite (by using Trinity) >> Trinity.fasta

I have created a index for a genome by using:

gmap_build -d genome -k 15 GCF_000534875.1_SCA1_genomic.fa

A directory called "genome" has been create where now there is:

ll
total 5772532
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria   12774633 Jan 15 11:28 genome.chromosome
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria   20699247 Jan 15 11:28 genome.chromosome.iit
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria          6 Jan 15 11:28 genome.chrsubset
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria   19461467 Jan 15 11:28 genome.contig
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria   21602471 Jan 15 11:28 genome.contig.iit
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria  432037584 Jan 15 11:29 genome.genomebits128
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria  432037548 Jan 15 11:29 genome.genomecomp
drwxr-xr-x. 2 luz_garcia_longoria luz_garcia_longoria       4096 Jan 15 11:27 genome.maps
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria   72007680 Jan 15 11:53 genome.ref081locoffsets64meta
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria  660530864 Jan 15 11:53 genome.ref081locoffsets64strm
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria 2249041724 Jan 15 11:53 genome.ref081locpositions
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria     140648 Jan 15 11:53 genome.ref081loctable
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria  134217744 Jan 15 11:35 genome.ref153offsets64meta
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria  360216272 Jan 15 11:35 genome.ref153offsets64strm
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria 1496235468 Jan 15 11:44 genome.ref153positions
-rw-rw-r--. 1 luz_garcia_longoria luz_garcia_longoria          7 Jan 15 11:27 genome.version

(hope I did it correctly)

Now, in order to "clean" my assembly (Trinity.fasta file) I want to align the Trinity transcript contigs to the genome BUT I only want to keep those contigs that DO NOT match with the genome. I have been reading the --help information that gmap provides but I don't have a clear idea about what output options use. My idea is something like:

    gmap -D . -d genome trinity_out_dir/Trinity.fasta --failsonly  -f samse >  trinity_gmap.sam

According to gmap help:

--failsonly                    Print only failed alignments, those with no results

So, in theory, by using this command the program should align my assembly to the genome and create an output file with only those contigs that do not match with the genome, right?

Please, let me know if I am doing something wrong.

Thanks,

Luz

genome transcriptome assembly • 3.0k views
ADD COMMENT
0
Entering edit mode

Is your "genome" - assembled with trinity - actually a trasncriptome assembly ?

ADD REPLY
0
Entering edit mode

No. I downloaded the genome from NCBI.

ADD REPLY
0
Entering edit mode

Very very specific question. Just try it ?

I remember from using Gmap extensively in the past it does a great job of creating GFF3 files of correctly mapped transcripts with intron exon structures which you can view in IGV JBrowse etc. Excellent.

It would also print a list of contigs which no mapping was found for on the screen. Even if the option you suggested above does not work, you could use the list of standard contigs output and write a script, eg in biopython, to select the unmapped contigs.

I would certainly use multiple mappers to get a robust result though, eg bwa mem or minimap2 as well if they perform well on your dataset.

ADD REPLY

Login before adding your answer.

Traffic: 2341 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6