Hello guys
I am doing a recruitment analysis looking for metagenomic islands. The inputs are a fasta file of my target species, extracted from the original metagenomic reads file, and the complete reference genome of my target species.
I am using MUMmer Promer utility for the alignment and show-coords utility for data exploration. The only utility that give me the length of the gap between a contig and the next, respecting the synteny of the reference genome is show-tiling, and I have used it to look for gaps larger than 10000 bp. It seems to work weel, when exploring this areas in show-coords output, but I read in a post that show-tiling would be out of dated. This is too bad for me, because I dont know how to look for this low/no alignment regions in the show-coords output.
Can anyone tell me something about show-tiling and, if it really is out of date, how can I get this gap between recruited reads (reads that aligned about 98 identity, similarity and coverage)?
Here is a example of the show-tiling output:
[S1] [S2] [gap between this contig and the next] [length/contig] [align.cov.] [average identity] [orientation] [contig id]
2813632 2813778 15787 147 94.56 85.87 + HD4RJIW01BV3T7
3618731 3618964 10484 234 96.15 90.67 - HD4RJIW02GX7DC
3813833 3814001 12862 169 100.00 85.19 - HD4RJIW02HFIBU
The third field is what I need, and what I am using.
Here is a example of the show-coords output:
[S1] [E1] [S2] [E2] [L1] [L2] [%id] [%sim] [%stp] [cov R] [cov Q] [FRM] [TAGS]
where S=subject, E=query, L=length, stp=stop codon, FRM=orientation/frame
775 903 7 135 129 129 93.02 95.35 1.16 0.00 94.16 1 1 gi|166362741|ref|NC_010296.1| HD4RJIW01EHAFJ
811 963 169 17 153 153 96.08 100.00 0.00 0.00 90.00 1 -2 gi|166362741|ref|NC_010296.1| HD4RJIW02IJPAZ
823 1008 200 15 186 186 96.77 100.00 1.61 0.00 92.08 1 -3 gi|166362741|ref|NC_010296.1| HD4RJIW01DF547
I tried to write a perl script that read a line, split, read the orientation[FRM], then choose between S1 or S2 field, then jump to the next line, split again, read the orientation again, choose the right field and do the subtraction. If larger than 10000, send the two lines to a new file, and so on, but it doesn't worked at all, I couldn't jump to the next line.
If you like, I can show you my frustrated script.... But if is all ok with show-tiling, so I don't need to worry.
EDIT:
The input file:
[marlaux@n4-202-14 recruit_2015]$ cat -A RL5_promer.tiling | head
>gi|166362741|ref|NC_010296.1| 5842795 bases$
99^I306^I-49^I208^I97.60^I92.37^I+^IHD4RJIW01C9YF5$
258^I348^I303^I91^I85.71^I92.31^I+^IHD4RJIW01BYQ91$
652^I751^I17^I100^I88.00^I92.86^I+^IHD4RJIW01ATUR3$
769^I905^I-85^I137^I97.08^I89.77^I+^IHD4RJIW01EHAFJ$
821^I1022^I411^I202^I93.56^I88.89^I-^IHD4RJIW01DF547$
1434^I1611^I-106^I178^I94.38^I78.57^I-^IHD4RJIW02ITKT8$
1506^I1771^I16^I266^I98.50^I61.63^I+^IHD4RJIW02IB4P3$
1788^I2083^I-150^I296^I96.28^I86.32^I+^IHD4RJIW02JGD11$
1934^I2267^I20^I334^I96.41^I88.21^I-^IHD4RJIW01B47F5$
This script I tried to wrote to use the show-coords, instead of show-tiling, because I am not sure if show-tiling is the right utility for this. I hope it is. The script is:
#!/usr/bin/perl
use diagnostics;
use warnings;
print "arquivo 1:\t";
$arq1 = <STDIN>;
open (MYFILE, $arq1);
open (NEW_FILE, '>>coords_teste.tab');
@file = <MYFILE>;
close (MYFILE);
@new_file=();
foreach $line (@file) {
@aligns = split (/\t/, $line);
$p1=$aligns[0];
$p2=$aligns[1];
#print "@aligns $p1 $p2\n";
if ($aligns[11] > 0) {
$a=$p2;
print "$aligns[11] $a\n";
next;
@aligns2 = split (/\t/, $line);
if ($aligns2[11]>0) {
$b=$aligns2[0];
}
else {
$b=$aligns2[1];
}
}
else {
$a=$p1;
next;
@aligns2 = split (/\t/, $line);
if ($aligns2[11]<0) {
$b=$aligns2[1];
}
else {
$b=$aligns2[0];
}
}
$gap=$b-$a;
print "$a $b $gap\n";
if ($gap>9000) {
push (@new_file, ("@aligns\n@aligns2\n"));
print NEW_FILE @new_file;
}
@aligns=();
@aligns2=();
$p1=();
$p2=();
$a=();
$b=();
$gap=();
}
Thank you
Please paste your script here. It looks like the problem is either with the delimiter you're using or the EOL character, so could you also give us the output of:
where show_tiling_file is the file your perl script is trying to parse please?
EDIT: Thanks for adding information, I've moved all the pertinent info to the question.