8.5 years ago
crespialba
I am trying to use MetaClusterTA (http://i.cs.hku.hk/~alse/MetaCluster/download.html) to bin and annotate my metatranscriptomic dataset. As a first test, I am running it on a simulated dataset that I created with NeSSM and assembled with transabyss.
I followed the instructions in the MetaClusterTA README file: I downloaded the database and created the taxid file in the format described there.
However, the run fails with a segmentation fault. I have tried several options, but none of them works.
Here is the command and its output:
/software/MetaClusterTA/bin/MetaCluster_TA ~/simulated_dataset/simulation_1.fq ~/simulated_dataset/simulation_2.fq ~/transabyss-final.fa ~/taxids.csv --ReadLen 75 --Species 50 --MaxSpecies 1000
ReadLen: 75
CtgLenThresh: 500
AlignThresh: 76
MC3_Thresh: 0.94
before loading genomes.
0 out of 30269 genomes are loaded.
Finished counting occurences in strings.
Finished mallocing vectors in nodes.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
alba 6834 98.4 3.1 17342688 16812812 pts/5 Sl+ 13:40 0:36 /software/MetaClusterTA/bin/MetaCluster
[...]
Finished mapping k-mers in strings.
Finished sorting vectors in nodes.
Finished turning capacity to number of unique id num.
Genome DB Initialization is finished.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
alba 6834 100 3.1 17342688 16812812 pts/5 Sl+ 13:40 0:37 /software/MetaClusterTA/bin/MetaCluster
[...]
initializing reads.
Wed May 11 13:41:01 2016
initializing uset:
Wed May 11 13:44:50 2016
NumNodes: 0
Size: 0
ReverSize: 512
sumlen: 0
MaxSpecies: 2
MinSpecies: 2
MaxSpecies is too large. We will start with half of the group number.
Size:0, classes:50
Segmentation fault (core dumped)
The aligned file looks like this:
>R184409 1327 23064 153518-,...,157386-
CCGGGCGACGGTTGTCCCGGTTTAAGCGTGCAGGTGGGTGGACCAGGCAAATCCGGTCTGCTGTAACACTGAGGCGTGATGACGAGGCACTACGGTGCTGAAGTGACAGATGCCCTGCTTCCAGGAAAAGCCTCTAAGCATCAGGTAACACGAAATCGTACCCCAAACCGACACAGGTGGTCAGGTAGAGAATACCAAGGCGCTTGAGAGAACTCGGGTGAAGGAACTAGGCAAAATGGTGCCGTAACTTCGGGAGAAGGCACGCTGGCGCGTAGGTGAAGGGACTTGCTCCCGGAGCTGAAGCCAGTCGAAGATACCAGCTGGCTGCAACTGTTTATTAAAAACACAGCACTGTGCAAACACGAAAGTGGACGTATACGGTGTGACGCCTGCCCGGTGCCGGAAGGTTAATTGATGGGGTTATCCGCAAGGAGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCAACCTTGAAATACCACCCTTTAATGTTTGATGTTCTAACGTGGACCCGTGATCCGGGTTGCGGACAGTGTCTGGTGGGTAGTTTGACTGGGGCGGTCTCCTCCCAAAGAGTAACGGAGGAGCACGAAGGTGGGCTAATCACGGTTGGACATCGTGAGGTTAGTGCAATGGCATAAGCCCGCTTGACTGCGAGAATGACAATTCGAGCAGGTGCGAAAGCAGGTCATAGTGATCCGGTGGTTCTGAATGGAAGGGCCATCGCTCAACGGATAAAAGGTACTCCGGGGATAACAGGCTGATACCGCCCAAGAGTTCATATCGACGGCGGTGTTTGGCACCTCGATGTCGGCTCATCACATCCTGGGGCTGAAGTAGGTCCCAAGGGTATGGCTGTTCGCCATTTAAAGTGGTACGCGAGCTGGGTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCTGCCGTGGGCGCTGGAGAATTGAGGGGGGCTGCTCCTAGTACGAGAGGACCGGAGTGGACGCATCACTGGTGTTCGGGTTGTCATGCCAATGGCATTGCCCGGTAGCTACGTTCGGAACTGATAACCGCTGAAAGCATCTAAGCGGGAAGCC
>R184410 2220 66038 169714+,...,118906+
GGTAATGACTCCAACTTATTGATAGTGTTTTATGTTCAGATAATGCCCGATGACTTTGTCATGCAGCTCCACCGATTTTGAGAACGACAGCGACTTCCGTCCCAGCCGTGCCAGGTGCTGCCTCAGATTCAGGTTATGCCGCTCAATTCGCTGCGTATATCGCTTGCTGATTACGTGCAGCTTTCCCTTCAGGCGGGATTCATACAGCGGCCAGCCATCCGTCATCCATATCACCACGTCAAAGGGTGACAGCAGGCTCATAAGACGCCCCAGCGTCGCCATAGTGCGTTCACCGAATACGTGCGCAACAACCGTCTTCCGGAGACTGTCATACGCGTAAAACAGCCAGCGCTGGCGCGATTTAGCCCCGACATAGCCCCACTGTTCGTCCATTTCCGCGCAGACGATGACGTCACTGCCCGGCTGTATGCGCGAGGTTACCGACTGCGGCCTGAGTTTTTTAAGTGACGTAAAATCGTGTTGAGGCCAACGCCCATAATGCGTGCAGTTGCCCGGCATCCAACGCCATTCATGGCCATATCAATGATTTTCTGGTGCGTACCGGGTTGAGAAGCGGTGTAAGTGAACTGCAGTTGCCATGTTTTACGGCAGTGAGAGCAGAGATAGCGCTGATGTCCGGCAGTGCTTTTGCCGTTACGCACCACCCCGTCAGTAGCTGAACAGGAGGGACAGCTGATAGAAACAGAAGCCACTGGAGCACCTCAAAAACACCATCATACACTAAATCAGTAAGTTGGCAGCATCACCCACAAAATAGTCATGCATTGTGTGCAATAGAAACAGTTCAGATAAAGATAGGGATTAGACTGGCCCCCTGAATCTCCAGACAACCAGTATCACTTAAATAAGTGATAGTCTTAATACTAGTTTTTAGACTAGTCATTGGAGTACAGATGATTGATGTCTTAGGGCCGGAGAAACGCAGACGGCGTACCACACAGGAAAAGATCGCAATTGTTCAGCAGAGCTTTGAACCGGGGATGACGGTCTCCCTCGTTGCCCGGCAACATGGTGTAGCAGCCAGCCAGTTATTTCTCTGGCGTAAGCAATACCAGGAAGGAAGTCTTACTGCTGTCGCCGCCGGAGAACAGGTTGTTCCTGCCTCTGAACTTGCTGCCGCCATGAAGCAGATTAAAGAACTCCAGCGCCTGCTCGGCAAGAAAACGATGGAAAATGAACTCCTCAAAGAAGCCGTTGAATATGGACGGGCAAAAAAGTGGATAGCGCACGCGCCCTTATTGCCCGGGGATGGGGAGTAAGCTTAGTCAGCCGTTGTCTCCGGGTGTCGCGTGCGCAGTTGCACGTCATTCTCAGACGAACCGATGACTGGATGGATGGCCGCCGCAGTCGTCACACTGATGATACGGATGTGCTTCTCCGTATACACCATGTTATCGGAGAGCTGCCCACGTATGGTTATCGTCGGGTATGGGCGCTGCTTCGCAGACAGGCAGAACTTGATGGTATGCCTGCGATCAATGCCAAACGTGTTTACCGGATCATGCGCCAGAATGCGCTGTTGCTTGAGCGAAAACCTGCTGTACCGCCATCGAAACGGGCACATACAGGCAGAGTGGCCGTGAAAGAAAGCAATCAGCGATGGTGCTCTGACGGGTTCGAGTTCTGCTGTGATAACGGAGAGAGACTGCGTGTCACGTTCGCGCTGGACTGCTGTGATCGTGAGGCACTGCACTGGGCGGTGACTACCGGCGGCTTCAACAGTGAAACAGTACAGGACGTCATGCTGGGAGCGGTGGAACGCCGCTTCGGCAACGATCTTCCGTCGTCTCCAGTGGAGTGGCTGACGGATAATGGTTCATGCTACCGGGCTAATGAAACACGCCAGTTCGCCCGGATGTTGGGACTTGAACCGAAGAACACGGCGGTGCGGAGTCCGGAGAGTAACGGAATAGCAGAGAGCTTCGTGAAAACGATAAAGCGTGACTACATCAGTATCATGCCCAAACCAGACGGGTTAA
CGGCAGCAAAGAACCTTGCAGAGGCGTTCGAGCATTATAACGAATGGCATCCGCATAGTGCGCTGGGTTATCGCTCGCCACGGGAATATCTGCGGCAGCGGGCTTGTAATGGGTTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCACGGGGATACCAGTTCAACCGAAAACGCCAGAGGAGGGGATTACCCGCTGGCAGGGTAAATCTGTGG
[...]
The taxids.csv file looks like this:
superkingdom_id phylum_id class_id order_id family_id genus_id species_id path_to_genome
2 544448 31969 2085 2092 2129 134821 /home/alba/test/Ureaplasma_parvum_serovar_3_ATCC_700970_uid57711/NC_002162.fna
2157 28890 183963 2235 2236 2239 2242 /home/alba/test/Halobacterium_NRC_1_uid57769/NC_002607.fna
2 1224 1236 91347 543 590 28901 /home/alba/test/Salmonella_enterica_serovar_Typhimurium_LT2_uid57799/NC_003197.fna
2 1224 1236 118969 118968 776 777 /home/alba/test/Coxiella_burnetii_RSA_493_uid57631/NC_002971.fna
2157 28890 183939 2182 2183 2184 39152 /home/alba/test/Methanococcus_maripaludis_S2_uid58035/NC_005791.fna
2 1239 186801 186802 543349 2733 2734 /home/alba/test/Symbiobacterium_thermophilum_IAM_14863_uid58165/NC_006177.fna
2 1224 1236 135623 641 511678 668 /home/alba/test/Vibrio_fischeri_ES114_uid58163/NC_006840.fna
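Since the log reports "0 out of 30269 genomes are loaded", one thing worth checking is whether every path in the taxid file actually points to a readable genome file. Below is a minimal sketch of such a check. It assumes the layout shown above (eight whitespace-separated columns, genome path last, and an optional header row starting with superkingdom_id); the function name check_taxid_file is made up for illustration and is not part of MetaClusterTA. It would not necessarily explain the segmentation fault, it only rules out missing or malformed entries.

```python
import os

def check_taxid_file(path, expected_columns=8):
    """Return (n_rows, problems) for a whitespace-separated taxid table.

    Assumption: the last column of each data row is the genome path,
    as in the excerpt above. `problems` is a list of
    (line_number, genome_path) pairs for rows that have the wrong
    number of columns or point to a file that does not exist.
    """
    n_rows = 0
    problems = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            fields = line.split()
            # Skip blank lines and the (assumed) header row.
            if not fields or fields[0] == "superkingdom_id":
                continue
            n_rows += 1
            genome = fields[-1]
            if len(fields) != expected_columns or not os.path.isfile(genome):
                problems.append((line_no, genome))
    return n_rows, problems
```

For example, `n_rows, problems = check_taxid_file(os.path.expanduser("~/taxids.csv"))` followed by printing `problems` would list the rows to look at first. Note that paths containing spaces would be flagged by this sketch, because the line is split on any whitespace.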