Hi there,
I know there are different ways to extract "wanted" sequences from a multifasta file. In general, given one or more IDs, you can retrieve the sequences using a BioPerl script or other software, such as the tools pointed out here (Extracting Multiple Fasta Sequences At A Time From A File Containing Many Sequences). In my case, I only know the positions: I want sequences number 1, 5 and 10 from a multifasta file. Any idea how I can achieve this, preferably using Perl/BioPerl?
Thanks in advance.
P.S.: I'm a beginner in the bioinformatics world, sorry if this is a very basic question.
say was added in Perl 5.10. It just adds a newline to your print, so it saves you some typing. If your Perl is 5.10 or newer, you can enable it with use feature 'say'; or you can enable all new features at the command line with -E.
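To make the comparison with print concrete, here is a minimal sketch. It also shows say writing to a filehandle, which works the same way as with print. The file name demo_out.txt is only a placeholder I made up for this example.

```perl
use strict;
use warnings;
use feature 'say';   # needed in a script; perl -E enables it on the command line

# say LIST behaves like print LIST followed by "\n".
say "ATGC";          # same output as: print "ATGC\n";

# say accepts a filehandle exactly like print does.
# 'demo_out.txt' is a hypothetical file name used only for this sketch.
my $file = 'demo_out.txt';
open my $out, '>', $file or die "Cannot open $file: $!";
say {$out} ">seq1";  # writes ">seq1\n" to the file
say {$out} "ATGC";   # writes "ATGC\n" to the file
close $out or die "Cannot close $file: $!";
```

The braces around the filehandle are optional (say $out ">seq1"; also works), but they make it harder to mistake the handle for the first item to print.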
join should be self-explanatory: it just adds a newline after the sequence ID and the sequence itself. In Perl, statements are terminated by a semicolon, so those last two lines are a single statement. That style is called postfix, with the conditional at the end, and it makes the code much more readable (especially when split across lines) by removing unnecessary braces. The statement should be easy to read. It says, "print the record if any IDs match those in the list." And that sentence is almost working code! That is a good reason to use the functions from List::Util or List::MoreUtils: they are very expressive, they make the code easy to read, and they are fast, as they are all written in C.
Note that I imported the any function from the package List::MoreUtils. All of the functions in that package are now part of List::Util, which is shipped with Perl (the latest version includes these functions). I used the List::MoreUtils package so the code will work with any version of Perl, not just the latest. I think it will simplify things going forward now that all the functions are in one package and in core Perl.
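As a rough sketch of how the postfix conditional and any fit together when selecting records by position rather than by ID, something like the following should work. This is not the original script: the in-memory FASTA string, the variable names, and the positions (1, 3) are placeholders I made up, and I import any from List::Util, which assumes List::Util 1.33 or newer (bundled since Perl 5.20). On an older Perl, List::MoreUtils provides the same function.

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(any);   # any() needs List::Util 1.33+ (assumption, see above)

# Hypothetical in-memory FASTA data standing in for a real multifasta file.
my $fasta  = ">id1\nAAAA\n>id2\nCCCC\nCCGG\n>id3\nGGGG\n";
my @wanted = (1, 3);      # 1-based positions of the records to keep

open my $in, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my @selected;
my $position = 0;
{
    local $/ = '>';       # read one FASTA record per iteration
    while (my $record = <$in>) {
        chomp $record;                 # strip the trailing '>' separator
        next unless length $record;    # skip the empty chunk before the first '>'
        $position++;
        my ($id, @seq_lines) = split /\n/, $record;
        my $seq = join '', @seq_lines; # rejoin wrapped sequence lines

        # Postfix form: one statement, run only when the condition is true.
        push @selected, join("\n", ">$id", $seq)
            if any { $position == $_ } @wanted;
    }
}
close $in;

say for @selected;        # prints records 1 and 3, one header and one sequence line each
```

For the case in the question, you would open the real file instead of the string and set @wanted to (1, 5, 10).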
I know you are looking for a Perl solution, but you could use my pyfaidx module for this. In the latest version I've added numeric indexing, so you can do:
Thanks a lot, SES. One last question: could you explain the meaning of these last two lines?
And another thing: when using "say", is it possible to print to a file like with "print"?
Thanks again.