Hello, i'm working on a little perl code that is supposed to get me the ambiguous sequence from .ab1 files. I want to get from the .ab1 the sequence with IUPAC codes.
Didn't found any solution to do it easily.
Read about Staden, but don't think it's what i need. Read about BioPerl, Bio::SeqIO, Bio;;Trace;;ABIF and couldn't find it if it exists...
I started to think about doing it myself from data. If we can do it when visualysing the graph, then we can do it on computer.
I first tryed to get the raw data. I tryed :
#!/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Trace::ABIF;
my $ab1 = Bio::Trace::ABIF->new();
$ab1->open_abif("$ARGV[0]");
print $ab1->sample_name(),"\n";
my @qualite = $ab1->quality_values();
my $sequenceab1 = $ab1->sequence();
my $LSeq = $ab1->sequence_length();
my $sequenceab2 = $ab1->edited_sequence();
#Tryed to search here, but not that...
print $ab1->raw_data_for_channel(1); #Gives me numbers but with no space...
print $ab1->raw_data_for_channel(2);
print $ab1->raw_data_for_channel(3);
print $ab1->raw_data_for_channel(4);
open (FICHIER, ">$ARGV[2]/sortie.txt") || die ("Vous ne pouvez pas créer le fichier \"$ARGV[2]/sortie.txt\"");
print FICHIER $ab1->raw_trace('A');
print FICHIER $ab1->raw_trace('C');
print FICHIER $ab1->raw_trace('G');
print FICHIER $ab1->raw_trace('T');
close (FICHIER); #Same problem, no space...
I wandered a lot on different forum and sites, and even found some good things to try over here. That's why i decided to ask help here.
I tryed this i saw here : Exporting Raw Trace Data I tryed to implement something like that in my code, it results in :
#!/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Trace::ABIF;
use Array::Transpose;
my $ab1 = Bio::Trace::ABIF->new();
$ab1->open_abif("$ARGV[0]");
open (FICHIER, ">$ARGV[2]/sortie.txt") || die ("Vous ne pouvez pas créer le fichier \"$ARGV[2]/sortie.txt\"");
print FICHIER foreach Transpose ([map {[$_,$ab1->raw_trace($_)]} qw(A T G C)]);
close (FICHIER);
But not working, can u explain to me a bit where i'm mistaken ?
I tryed this :
use Array::Transpose;
my $abi;
BEGIN{$abi=Bio::Trace::ABIF->new($_);
$,="\t",$\="\n"};
$abi->open_abif(@ARGV) or die "can not open";
print FICHIER foreach transpose ([map {[$_,$abi->raw_trace($_)]} qw(A T G C)]);
But look like it makes perl freeze...
And the result is odd i got things like this :
ARRAY(0x3d97958)
But something nice,
For the lines : print FICHIER $ab1->raw_trace('T');
The result now give spaces between values...
I'm not sure i get it all ...
Is there any documents that could help me ? I saw the documents of abif files, but i didn't understood it XD...
It's possible to get peak values, and after i was thinking of turning this results in percentages and if a base got more than 20% then it is a possible base and it will result in calling to the IUPAC code according of the basis found. Is it a good approach ?
Or should i modify the datas with baseline reductions or things like that ?
I hope i'm not at the wrong place. If i made a mistake or didn't respect the rules, please warn me to be able to correct myself. And i thank you for reading all this.
As Shenwei stated, it's best to keep threads organized by adding replies as comments rather than answers. This is obviously best for the community, but it's also in your interest, as a question with 5 answers will attract fewer responses than a question with zero answers. I've moved your answers to comments. In the future, please do not post a response as an answer unless you believe it resolves the original question, to the point that in your opinion, further answers are unnecessary.
Oh, ok, thank you, i didn't get it proprely, i thoug i had to answer under people comments, but didn't know i couldn't bring new stuff in the main thread. I'll proceed as you say.
I hope i'll be able to post soon a response as an answer and put an end to this thread =D. Thank you.
I'm a bit confused, why the posts aren't sorted by date ?
I mean, on my screen got first some recent answers, then the oldest ones and the rest is sorted...
And my answer to shenwei isn't under his message, why is that ?
You didn't "answer" my answer, it became a independent one. To comment a answer, please click
ADD COMMENT
below it. To reply a comment, click theADD REPLY
.Comments of same level are sorted by number of up-vote and then time.
Very glad to see you explore the solution by your own. But you continued bring new answers that I can't catch up with. :P
Oh yes, my bad, wasn't looking at the good comment.
I used what you told me about arrays, it made me understood that data i was seeking where in a chart. I used Transpose to get them called.
Pasted my code under your answer.
ps : Why can't u catch up with my answers ?
Well i start geting a bit tired for today, i think i'll end up on this.
I found that Phred may be the answer : https://en.wikipedia.org/wiki/Phred_quality_score#/media/File:Phred_Figure_1.jpg
And maybe a solution for perl use : http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-5-60
What do you think of it ? Can someone please tell me if i'm on a good path or not ?
Wow so fast to answer, thank you.
The first link is Bio.SeqIO for python, i think it's the same module than Bio::SeqIO of perl.
Well i know nothing in python, that's why i prefer doing it on perl. If not possible, i'll turn to python to try that, but i'll have to learn all python from the basics XD...
I saw the first link, and put it in memory for later.
The second link leads me to a question. What's the point converting in .scf file ? Isn't it the same ? What are the big advantages for me to use .scf ?
For the
ARRAY(0x3d97958)
, i don't get your answer. Where and how to dereference ?Please click
ADD COMMENT
below the answer to reply it.I'm not familiar with the
.ab1
file. The sanger sequencing providers always give us the sequence file along with them. But according to your code, it seems could read the four channels. So you can continue to debug your Perl code.For the
ARRAY(0x3d97958)
, I mean the variable that containsARRAY(0x3d97958)
is a reference of an array. You may get the actual values by de-reference it using code like this:Result:
Oh thank you, i will try that tonight, and i'll try to debug my code. I'll come back later.
And for my other questions ? Is there any one able to answer me ?
Is there a clear method to get ambiguous sequence ? A simple programme would be nice, or a module, but the theorical computings can be nice too. To know what to consider as a trace, and what is a real double peak...
Up, found out some things.
With this i can get the values in a nice format.
Now how can i exploit it properly ? Do you have any documentation to help me ?
Hi again, still progressing and got new stuff.
I just found this : Is there a way to access the data stored in a .ab1 file ?
It looks awsome, but cant find any equivalent for perl...
I found also here : http://bioconductor.org/packages/release/bioc/vignettes/sangerseqR/inst/doc/sangerseq_walkthrough.pdf
This :
Can it be that easy ? Get a ratio on the strongest signal, use the noise ratio to delete it. Then choose a percentage of validation ?
Oh and find nice things into .ab1 file :
Am i correct about thoose elements ?
Hi everyone, i worked all day on that today too, and found out some good stuff, should i create another question for that ?
I'm willing to redirect the reflexion.
I found that POSA can be the solution, using phred score to get the ambiguous sequence, but i can't figure how to do that.
And in the exemples, there isn't one for my case or similar...
I managed to putt all datas in a Hash table, to be able to call easily the values for any postition of anybase easily.
But it looks like it's the graph data and not the calculated peaks data...
It means that is POSA isn't the solution, i'll have to calculate it and make an algorythm for it ... It really means that i'll have to open up my old math lessons about graph and statistics...
If you got any lead, i'll take it... Thank you for your time
Up, stil stuck on that problem...
How to get the iupac sequence from the graph datas ?...
Any one able to help me ?
I, tryed to do without that by using some software for the moment.
But i'm still interested in achieving it.
Is it possible to do it threw emboss or threw some python automatic module ?