Nucleotide character position in sequence extract
7.6 years ago

I have a text file with ~60,000 sequences (1 sequence per line). I am trying to extract all the sequences that begin with an A (first nucleotide position in the sequence) and that also have a T at the 10th position. For example:

Let's say these are 3 sequences in the 60,000 sequence file:

AAGGGCAGCTAATCGCCAGTG
CGGGATCTATAAGGTTGGT
AAGGGCAGCGAATCGCCAGTGAGGCT

If the search were run on these 3 sequences, my desired output would be only the first one.

I have tried some approaches with grep, but they have not worked out. Any help or suggestions on this matter would be greatly appreciated.

Thanks and regards!

sequence • 1.7k views
7.6 years ago

I am trying to extract all the sequences that begin with an A (first nucleotide position in the sequence) and in the same sequence there should be a T at the 10th position.

 grep -E '^A.{8}T' -m1
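
To see why this pattern works, here is a minimal sketch run against the three example sequences from the question. The temporary file and the variable names `seqs` and `matches` are assumptions made for illustration; substitute your own file name.

```shell
# Hypothetical input: a temp file holding the three example sequences,
# one per line, as in the question.
seqs=$(mktemp)
printf '%s\n' \
  AAGGGCAGCTAATCGCCAGTG \
  CGGGATCTATAAGGTTGGT \
  AAGGGCAGCGAATCGCCAGTGAGGCT > "$seqs"

# ^A    anchors an A at position 1
# .{8}  skips any eight characters (positions 2 to 9)
# T     requires a T at position 10
matches=$(grep -E '^A.{8}T' "$seqs")
echo "$matches"

# Clean up the temporary file.
rm -f "$seqs"
```

Only the first sequence is printed: the second starts with C, and the third has a G at position 10. Note that `-m1` makes grep stop after the first matching line, so leave it out to scan the whole file.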

Wonderful! Thanks! That worked out great. I just needed to remove the -m1 at the end, as I wanted to search through the whole file.
