Hi All,
I am following along with the Augustus tutorial in Current Protocols in Bioinformatics (https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpbi.57). I am working through Protocol 6 (Removing Redundant Gene Structures) and have run into an issue at Step 4, creating a loci.lst file. When I run the provided Perl code snippet (see below), I get the error message shown further down. I am not familiar with Perl and would greatly appreciate it if anyone could help me get this section to work.
Code Snippet:
cat bonafide.gb | perl -ne ’
if ( $_ =~ m/LOCUS\s+(\S+)\s/ ) {
$txLocus = $1;
} elsif ( $_ =~ m/\/gene=\"(\S+)\"/ ) {
$txInGb3{$1} = $txLocus
}
if( eof() ) {
foreach ( keys %txInGb3 ) {
print "$_\t$txInGb3{$_}\n";
}
}’ > loci.lst
The bonafide.gb file looks like the excerpt below. From each record I need to pull out the locus name (here "h2tg000001l_432666-437116") and the gene name (here "h2tg000001l_t_gene1_mRNA1"):
LOCUS h2tg000001l_432666-437116 4451 bp DNA
FEATURES Location/Qualifiers
source 1..4451
mRNA join(1286..1450,1766..1909,2591..3166)
/gene="h2tg000001l_t_gene1_mRNA1"
CDS join(1286..1450,1766..1909,2591..3166)
/gene="h2tg000001l_t_gene1_mRNA1"
BASE COUNT 599 a 460 c 353 g 518 t 2521 n
ORIGIN
1 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
61 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
121 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
Error message:
Unrecognized character \xE2; marked by <-- HERE after <-- HERE near column 1 at -e line 1.
./make_loci_list.sh: line 4: syntax error near unexpected token `('
./make_loci_list.sh: line 4: `if ( $_ =~ m/LOCUS\s+(\S+)\s/ ) {'
Any help with debugging this, or any way to get a similar result with bash, would be greatly appreciated.
Many Thanks!