Hi, I'm trying to write a script to extract unique taxonomic assignments from a text file obtained from QIIME analyses:
results.txt file:
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardiaceae;Other 4.48159186143e-07
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardiaceae;g__ 1.34447755843e-06
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardiaceae;g__Rhodococcus 4.48159186143e-07
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardioidaceae;g__ 6.72238779214e-06
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Nocardioidaceae;g__Kribbella 4.48159186143e-07
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Propionibacteriaceae;g__ 2.24079593071e-06
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Pseudonocardiaceae;g__Pseudonocardia 2.24079593071e-06
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Pseudonocardiaceae;g__Pseudonocardia 2.24079593071e-06
What I want is to extract the family or genus assignments (for example, f__Nocardioidaceae) as a non-redundant list. In this case the result would be:
For family:
Nocardiaceae
Nocardioidaceae
Propionibacteriaceae
Pseudonocardiaceae
I have been using this code:
#!/usr/bin/perl -w
use strict;
while ( <>) {
    $line = $_;
    chomp($line);
    if ($line=~ m/^#/g) {
        next;
    }
    elsif ($family) {
        my @fam= ($line=~ m/f__[\W]?(.*)[\W]?;g/g);
        foreach(@fam){
            if ($_=~ m/^$/g) {
                next;
            }
            else {
                my @uniq_list = uniq(@fam);
                print "$_\n";
            }
        }
    }
    else {
        print "ERROR\n";
        exit;
    }
}
# Second option using a subroutine:
sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}
#and then......
else {
    my @uniq_list = uniq(@fam);
    print OUTFILE "$_\n";
}
}
None of them works!
Thanks so much!
Hi, please use code formatting for code. Please be specific: what is it that doesn't work? An error message, no output, ...? Also, if possible, reduce the posted script to the essentials: remove getopt and the opening of input and output files. In your case, a script that reads from stdin and prints to stdout is totally sufficient.
Furthermore, do you need the list in any specific order?
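For reference, here is a minimal sketch of one way this could be done, reading from stdin and printing to stdout. It is not a drop-in fix for the script above. It assumes the taxonomy string contains no spaces (as in the sample), that ranks are separated by ';', and that names should come out in order of first appearance. The helper name unique_taxa is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the unique names found at one taxonomic rank, in order of first
# appearance. $rank is the one-letter prefix used in the file, e.g. 'f'
# (family) or 'g' (genus). unique_taxa is a hypothetical helper name.
sub unique_taxa {
    my ($rank, @lines) = @_;
    my (%seen, @names);
    for my $line (@lines) {
        next if $line =~ /^#/;    # skip comment/header lines
        # Capture the name after e.g. "f__", up to the next ';' or whitespace.
        # Empty assignments such as "g__" do not match because of the '+'.
        next unless $line =~ /(?:^|;)\Q$rank\E__([^;\s]+)/;
        # Keep a name only the first time it is seen, across all lines.
        push @names, $1 unless $seen{$1}++;
    }
    return @names;
}

# Run only when executed as a script: read stdin (or files named on the
# command line) and print one unique family per line.
unless (caller) {
    print "$_\n" for unique_taxa('f', <>);
}
```

Saved under a name of your choice (say uniq_taxa.pl) and run as perl uniq_taxa.pl < results.txt, this should print the four families listed in the question. Passing 'g' instead of 'f' would list genera. The key difference from the posted attempts is that the %seen hash lives outside the per-line loop, so duplicates are removed across the whole file rather than within a single line.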