Hi,
I found that someone asked almost exactly the same question years ago, but the answer involved Galaxy, and I would like to use the command line. My goal: take a set of XLOC exons, get their sequences from the genome, and concatenate them. I may BLAST them afterward, but that can be done separately.
So far I've gotten very close, using the 'bioio' fasta program to get scriptable output.
tr -d '"' < merged.gtf |
  awk -F '[;\t ]' '$10 ~ /XLOC/ {
    printf(">%s\n", $10 $13)                     # header: gene_id followed by transcript_id
    system("fasta ~/mm.fa " $1 ":" $4 "-" $5)    # fetch chr:start-end from the genome
  }'
The results are pretty good.
>XLOC_000010TCONS_00000036
GAAAACATGTTATCTGAAGAGAAGCAGAGGATCATGCTCCTAGAACGG
>XLOC_000010TCONS_00000036
CATTGCAGTTGAAAGAAGAAGAAAACAAGCGGTTAAATCAAAGACTG
>XLOC_000010TCONS_00000036
TGTCTCAGAGTTTGTCCTCAGTCTCTTCAAGGCATTCTGAAAAAATAGCCATTAGAGA
>XLOC_000010TCONS_00000036
TTTCAGGTGGGAGATTTGGTTCTCATCATCCTAGATGAGCGGCACGACAATTATGTATTGTTTACTGTTAGTCCTACTTTATATTTTCTGCACTCAGAGTCTCTTCCTGCCCTGGATCTCAAACCAGGTGAGGGAG
>XLOC_000010TCONS_00000036
TTCAGGTGCATCTAGAAGACCCTGGGTCCTTGGGAAAGTAATGGAAAAGGAATACTGTCAAGCCAAAAAG
However, the dream is to merge those records into one long sequence whenever the XLOC and transcript_id (here, $13) match.
The other reason I'm posting this is so that people who google this and find the Galaxy post might also find this one.
Thanks for your time.
Can you post what one of your XLOC entries looks like in the GTF file?
XLOC in GTF: http://pastebin.com/WW5aNfZH
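For the merging step described in the question, here is a minimal sketch that might work as a post-processing pass. It assumes the per-exon FASTA output shown above has been saved to a file (exons.fa is a hypothetical name), that exons of the same transcript are already listed in the order you want them joined, and that no reverse-complementing is needed; minus-strand transcripts would need extra handling. The function name merge_exons is made up for illustration.

```shell
#!/usr/bin/env bash
# merge_exons (hypothetical helper): join FASTA records that share a header
# into a single record. Reads FASTA on stdin, writes merged FASTA to stdout.
# Assumes records with the same header should be concatenated in input order.
merge_exons() {
  awk '
    /^>/ {                        # header line
      id = $0
      if (!(id in seq)) {         # first time this header is seen
        order[++n] = id           # remember the order headers appeared in
        seq[id] = ""
      }
      next
    }
    { seq[id] = seq[id] $0 }      # append the sequence line to the current record
    END {
      for (i = 1; i <= n; i++)    # print one record per distinct header
        printf "%s\n%s\n", order[i], seq[order[i]]
    }
  '
}

# Usage (file names are placeholders): merge_exons < exons.fa > merged.fa
```

Because the records are keyed on the whole header line, every exon labelled XLOC_000010TCONS_00000036 ends up in one sequence, and a different XLOC or transcript_id starts a new record. The same function could also be put at the end of the original pipe instead of going through an intermediate file.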