How to split the header line into its components
Inayat • 3.7 years ago


I have a text file containing several header lines like this one:

>lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] [db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] [protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]

I want to extract the two values in the "location" field, 1807 and 2169. I tried Python's split() and strip() methods, but they don't work as expected. Can you suggest how to do this? Any help is appreciated.

Thank you
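The split()/strip() approach from the question can work if the header is split in stages: keep the text after "[location=", cut it at the next "]", remove the optional complement(...) wrapper, then split the remaining "1807..2169" on "..". A minimal Python sketch; it assumes the location is a single start..end range (multi-part join(...) locations are not handled), and the function name location_bounds is only illustrative.

```python
from typing import Tuple


def location_bounds(header: str) -> Tuple[int, int]:
    """Return (start, end) from the [location=...] field of one header line.

    Illustrative helper; assumes a single start..end range, optionally
    wrapped in complement(...).
    """
    # Text after "[location=" up to the next "]", e.g. "complement(1807..2169)".
    loc = header.split("[location=", 1)[1].split("]", 1)[0].strip()
    # Remove the optional "complement(" ... ")" wrapper.
    if loc.startswith("complement(") and loc.endswith(")"):
        loc = loc[len("complement("):-1]
    start, end = loc.split("..")
    return int(start), int(end)


header = (
    ">lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] "
    "[db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] "
    "[protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]"
)
print(location_bounds(header))  # (1807, 2169)
```

If a line has no [location=...] field, the first split() returns a single element and indexing [1] raises IndexError, so such lines would need to be filtered out first (for example with "[location=" in line).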

Macspider • 1.1k views
5heikki • 3.7 years ago
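awk 'BEGIN{FS="\\[location="} NF>1 {print $2}' input.txt | awk 'BEGIN{FS="("}{print $2}' | awk 'BEGIN{FS=")"}{print $1}' | awk -F '[.][.]' -v OFS='\t' '{print $1, $2}'

Each awk splits on one delimiter in turn ("[location=", then "(", then ")"), which leaves 1807..2169; the last step splits that on ".." into two tab-separated columns. NF>1 skips lines without a location field, such as sequence lines. This assumes the location is wrapped in complement(...); a plain 1807..2169 has no "(" and yields no coordinates.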
3.7 years ago
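$ awk -v OFS="\t" -F '=|[(]|[.][.]|[)]' '/^>/ {print $8,$9}' test.fa
1807    2169

$ awk -v OFS="\t" -F 'complement|[(]|[.][.]|[)]' '/^>/ {print $3,$4}' test.fa
1807    2169

Bracket expressions such as [(] and [.][.] match those characters literally in any awk. Backslash escapes like \( and \. inside a -F string may be stripped by some implementations (gawk warns about this), and \.. also matches a dot followed by any character, so it splits at ".9" and ".1" in the IDs as well. Both commands pick fields by position, so they assume every header has the same layout as this one.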

Input:

$ cat test.fa
>lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] [db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] [protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]
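Both awk commands select fields by position, so they rely on every header having the same number of separator characters before the location. A Python sketch of a position-independent alternative: collect each [key=value] annotation into a dict and look up "location" by name. It assumes keys contain no "=" and values contain no "]" (true for the example header); FIELD_RE and header_fields are illustrative names.

```python
import re
from typing import Dict

# One annotation: "[" key "=" value "]"; the key has no "=" or "]", the value no "]".
FIELD_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def header_fields(header: str) -> Dict[str, str]:
    """Map each [key=value] annotation in a header line to its value."""
    return dict(FIELD_RE.findall(header))


header = (
    ">lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] "
    "[db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] "
    "[protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]"
)
fields = header_fields(header)
print(fields["location"])  # complement(1807..2169)

# Pull the two coordinates out of the location value.
start, end = (int(n) for n in re.findall(r"\d+", fields["location"]))
print(start, end)  # 1807 2169
```

The last step expects exactly two numbers in the location, so a multi-part join(...) location would raise a ValueError there rather than return wrong coordinates.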
$ grep -Po '(?<=complement\().*(?=\)\])' test.fa | sed 's/\.\./\t/'
1807    2169

(-P needs GNU grep, and \t in the sed replacement needs GNU sed.)
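The lookbehind/lookahead pattern from the grep -P command carries over to Python's re module, which supports fixed-width lookbehinds such as (?<=complement\(). A sketch that narrows .* to digits..digits and captures the two numbers as groups, so no separate sed step is needed; complement_bounds is an illustrative name, and headers without a complement(...) location return None.

```python
import re
from typing import Optional, Tuple

# Preceded by "complement(", then digits ".." digits, then followed by ")".
COMPLEMENT_RE = re.compile(r"(?<=complement\()(\d+)\.\.(\d+)(?=\))")


def complement_bounds(header: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for a complement(start..end) location, else None."""
    match = COMPLEMENT_RE.search(header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


header = (
    ">lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] "
    "[db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] "
    "[protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]"
)
bounds = complement_bounds(header)
if bounds is not None:
    print(*bounds, sep="\t")  # prints 1807 and 2169 separated by a tab
```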
ATpoint • 3.7 years ago
echo ">lcl|NC_001133.9_cds_NP_009332.1_1 [gene=PAU8] [locus_tag=YAL068C] [db_xref=SGD:S000002142,GeneID:851229] [protein=seripauperin PAU8] [protein_id=NP_009332.1] [location=complement(1807..2169)] [gbkey=CDS]" \
| awk -F "location=" '{print $2}' \
| cut -d "(" -f2 | cut -d ")" -f1 \
| awk -F "." -v OFS="\t" '{print $1, $3}'

Splitting 1807..2169 on a single "." gives three fields (1807, an empty field, 2169), hence $1 and $3.
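The pipelines above either echo a single header or rely on /^>/ to skip other lines. To process a whole file with several headers, a Python sketch can read it line by line, skip anything that is not a header, and print start and end tab-separated. It assumes a single start..end range, optionally inside complement(...); other location forms (for example join(...)) are silently skipped. LOCATION_RE and iter_bounds are illustrative names, and the file path is taken from the command line.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# "[location=123..456]" or "[location=complement(123..456)]".
LOCATION_RE = re.compile(r"\[location=(?:complement\()?(\d+)\.\.(\d+)\)?\]")


def iter_bounds(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each header line with a simple location."""
    for line in lines:
        if not line.startswith(">"):
            continue  # not a header (e.g. a sequence line)
        match = LOCATION_RE.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py input.txt
    with open(sys.argv[1]) as handle:
        for start, end in iter_bounds(handle):
            print(f"{start}\t{end}")
```

Reading the file through a generator keeps memory use constant, so the same script works on large multi-FASTA files.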