3.2 years ago
harry
I used this command:
grep -Fw -A 1 -f header.txt test.fa >test_result.fa
But it extracts only one header, not all of the headers that are present in my header.txt file.
My header.txt file looks like:
hsa_circ_0000006
hsa_circ_0000014
hsa_circ_0000015
hsa_circ_0000042
hsa_circ_0000070
hsa_circ_0000072
hsa_circ_0000131
hsa_circ_0000133
hsa_circ_0000160
hsa_circ_0000175
hsa_circ_0000211
hsa_circ_0000219
hsa_circ_0000231
hsa_circ_0000233
hsa_circ_0000236
hsa_circ_0000258
My test.fa file looks like:
>hsa_circ_0000001|chr1:1080738-1080845-|None|None
GGTCGGCCATGAAGGTGGTGGGGGTCATGAGGTCACAAGGGGGTCGGCCATGTGATGGGGTTGGGTCAGCCGTGCGGTCAGGTCAGGTCGGCCATGAGGTCAGGTGG
>hsa_circ_0000002|chr1:1158623-1159348-|NM_016176|SDF4
CTGACGGGGACGGTCACGTGTCTTGGGACGAGTATAAGGTGAAGTTTTTGGCGAGTAAAGGCCATAGCGAGAAGGAGGTTGCCGACGCCATCAGGCTCAACGAGGAACTCAAAGTGGATGAGGAAAGGTGGATGTGAACACTGACCGGAAGATCAGTGCCAAGGAGATGCAGCGCTGGATCATGGAGAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGTGGACC
>hsa_circ_0000014|chr1:9991948-9994918-|NM_032368|LZIC
CTGTACACTCAACAGAAAGTGGAGATACTAACAGCTCTTAGGAAACTTGGAGAGAAGCTGACTGCAGATGATGAGGCCTTCTTGTCAGCAAATGCAGGTGCTATACTCAGCCAGTTTGAGAAAGTCTCTACAGACCTTGGCTATTCAGGCAGCTATCAGCCAGGCCTTTAAAACCCCAGAGGTCATCAGATTGTTTGCAAAGAAACAACCAGGTCAGCTTCGGACAAGGTTAGCAGAGATGGATAGAGATCTGATGGTAGGAAAGCTGGAAAGAGAC
So please give me suggestions on what I am doing wrong. Thanks in advance.
Try
thanks, it works for me.
What does that mean?
It means I only get one FASTA sequence from my whole header.txt file.
And is there supposed to be more than one match? (From your test file, only one matches ;) )
Yes, but I got only one sequence.
It should not make a difference (in theory), but can you try with >> instead of > in your command line?
Your original command should work fine. There must be something else that is odd with your file.
indeed (would have been strange otherwise), anyway
disk space?
I understand that only the headers are displayed but not the DNA sequence(?). The very same command works on my machine.
what is the output of
If it's not a pure ASCII file but a CR/LF file, then you're working with Windows files. https://en.wikipedia.org/wiki/Newline#Issues_with_different_newline_formats
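A minimal sketch of how one might check a header list for Windows line endings and strip them, assuming a POSIX shell with `grep`, `tr`, `printf`, and `mktemp`. The file names `header_crlf.txt` and `header_unix.txt` are made up for this demo, and the sample file is generated on the spot rather than taken from the poster's data.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical header list with Windows (CR/LF) line endings.
printf 'hsa_circ_0000006\r\nhsa_circ_0000014\r\n' > header_crlf.txt

# A literal carriage return to search for.
cr=$(printf '\r')

# Count lines containing a carriage return; anything above 0 suggests CR/LF.
crlf_lines=$(grep -c "$cr" header_crlf.txt || true)
echo "lines with CR: $crlf_lines"

# Strip every carriage return to get a Unix-style (LF only) file.
tr -d '\r' < header_crlf.txt > header_unix.txt

# Re-check the cleaned file; grep exits non-zero on no match, hence '|| true'.
clean_lines=$(grep -c "$cr" header_unix.txt || true)
echo "lines with CR after cleaning: $clean_lines"
```

If the first count is non-zero for a real header.txt, pointing `grep -f` at the cleaned copy instead is a cheap thing to try.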
I just get one fasta sequence from my whole header.txt file and this fasta sequence is present last in my header.txt file.
Can you execute the command as Pierre Lindenbaum asked, and post the output of that here? Thanks.
On top of GenoMax's comment: if only one, then which one? The first one? The last one?
the last one.
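One possible explanation for "only the last one", not confirmed in the thread: if header.txt has CR/LF line endings and its final line has no trailing CR, every other pattern carries an invisible carriage return and cannot match the Unix-style FASTA headers. The sketch below reproduces that situation under those assumptions. The second FASTA record (`hsa_circ_0000258|demo`), the shortened sequences, and the output file names are invented for the demo.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Tiny FASTA file with Unix line endings (sequences shortened, second record made up).
printf '>hsa_circ_0000014|chr1:9991948-9994918-|NM_032368|LZIC\nCTGTACACTC\n' > test.fa
printf '>hsa_circ_0000258|demo\nGGTCGGCCAT\n' >> test.fa

# Assumed header list: CR/LF on the first line, no CR on the last line.
printf 'hsa_circ_0000014\r\nhsa_circ_0000258\n' > header.txt

# Same command as in the question: only the last ID can match here,
# because the first pattern ends in a carriage return.
grep -Fw -A 1 -f header.txt test.fa > broken_result.fa
broken_count=$(grep -c '^>' broken_result.fa || true)

# After stripping the carriage returns, both IDs match.
tr -d '\r' < header.txt > header_unix.txt
grep -Fw -A 1 -f header_unix.txt test.fa > fixed_result.fa
fixed_count=$(grep -c '^>' fixed_result.fa || true)

echo "records before: $broken_count, after: $fixed_count"
```

In this toy setup the original command returns one record and the cleaned list returns two, which would be consistent with the symptom described. Whether that is what happened with the real files depends on the check requested earlier in the thread.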