Hi, I'm EXTREMELY new to Perl and to just about all serious bioinformatics work. I gratefully found some Perl one-liners from the Edwards Lab; one of them works, and the other (the one most useful to me) does not. The line below DOES work: it extracts and prints particular sequences selected by name (here id1 and id2):
perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw(id1 id2)}print if $c' fasta.file
However, I have a text file containing the names of all the sequence identifiers I want (a set of receptor gene sequences), and I'd like to extract those sequences from a large FASTA file (in this case the mouse RefSeq database). Another line from the same lab (below) should be able to do this, but I've had no luck with it.
perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file
Does anyone have any suggestions about what may be going wrong, or other ways to do this efficiently as a complete beginner? I recognize this is quite a vague post, and there are related posts about similar tasks, but I've been struggling to find an efficient way to do this without getting totally mired in the possibilities.
Check similar posts: Retrieve a subset of FASTA from large Illumina multi-FASTA file