Hi, I am trying to read a GenBank file, which I have done successfully. Now I am trying to fix a problem in this program: wherever it finds a gene line, it prints a result, so the same gene is printed more than once. I just want the gene to be printed once. I tried looping, and incrementing a counter and then resetting it to 0, but I have not been able to implement this properly. I am posting the code below. Thanks in advance.
SAMPLE FILE
LOCUS NR_046018 1652 bp RNA linear PRI 12-MAY-2017
DEFINITION Homo sapiens DEAD/H-box helicase 11 like 1 (DDX11L1), non-coding RNA.
ACCESSION NR_046018 XM_003403543
VERSION NR_046018.2
KEYWORDS RefSeq.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1652)
AUTHORS Costa V, Casamassimi A, Roberto R, Gianfrancesco F, Matarazzo MR,
D'Urso M, D'Esposito M, Rocchi M and Ciccodicola A.
TITLE DDX11L: a novel transcript family emerging from human subtelomeric regions
JOURNAL BMC Genomics 10, 250 (2009)
PUBMED 19476624
REMARK Publication Status: Online-Only
COMMENT VALIDATED REFSEQ: This record has undergone validation or
preliminary review. The reference sequence was derived from
AM992871.1.
On Jun 5, 2012 this sequence version replaced NR_046018.1.
##Evidence-Data-START##
Transcript exon combination :: AM992871.1, BM920886.1 [ECO:0000332]
RNAseq introns :: single sample supports all introns
SAMEA1968968, SAMEA2148874
[ECO:0000348]
##Evidence-Data-END##
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP
1-1652 AM992871.1 1-1652
FEATURES Location/Qualifiers
source 1..1652
/organism="Homo sapiens"
/mol_type="transcribed RNA"
/db_xref="taxon:9606"
/chromosome="1"
/map="1p36.33"
gene 1..1652
/gene="DDX11L1"
/note="DEAD/H-box helicase 11 like 1"
/pseudo
/db_xref="GeneID:100287102"
/db_xref="HGNC:HGNC:37102"
misc_RNA 1..1652
/gene="DDX11L1"
/product="DEAD/H-box helicase 11 like 1"
/pseudo
/db_xref="GeneID:100287102"
/db_xref="HGNC:HGNC:37102"
CODE:
open (INFILE,"rna.txt");
while ($line= <INFILE>)
{
    chomp($line);
    if ($line =~ /(LOCUS\s*)(\w*)(.*)/)
    {
        print "\n";
        print "Locus: $2\t";
    }
    elsif($line =~ /^\s*\/gene\=\"(.+)\"/ )
    {
        print "Gene: $1\n";
    }
}
After this script is run, the output is:
Locus: NR_046018 Gene: DDX11L1
Gene: DDX11L1
Hello,
If you only have one locus in the file, you can leave the loop with the last statement as soon as you have found the gene line.
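A minimal sketch of that idea, assuming the file holds a single record. The helper name first_locus_and_gene and the in-memory sample are made up for illustration; in practice the filehandle would be opened on rna.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (locus, gene) for a single-record file.
sub first_locus_and_gene {
    my ($fh) = @_;
    my ($locus, $gene);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^LOCUS\s+(\S+)/) {
            $locus = $1;
        }
        elsif ($line =~ m{^\s*/gene="([^"]+)"}) {
            $gene = $1;
            last;    # stop reading: the first /gene line is all we need
        }
    }
    return ($locus, $gene);
}

# Small in-memory sample (abridged from the post) so the sketch runs without rna.txt.
my $sample = <<'END';
LOCUS       NR_046018  1652 bp  RNA  linear  PRI 12-MAY-2017
     gene            1..1652
                     /gene="DDX11L1"
     misc_RNA        1..1652
                     /gene="DDX11L1"
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($locus, $gene) = first_locus_and_gene($fh);
close $fh;
print "Locus: $locus\tGene: $gene\n";
```

Because last ends the while loop entirely, this only suits a file with one LOCUS; any later records would never be read.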
fin swimmer
Hello, I have a long file; I only posted a short excerpt here. I tried the last statement but could not get the desired result. It would be great if you could explain with a small example.
I'm not familiar with Perl. Try this:
fin swimmer
Looks good, just remember to declare and initialize variables so it works with strict:
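A minimal sketch of what a strict-safe version might look like, assuming a per-locus flag is used to print only the first /gene line of each record. The names locus_gene_pairs and $gene_seen are hypothetical, and the second record in the sample is invented purely to show the flag being reset.

```perl
use strict;
use warnings;

# Hypothetical helper: collects one [locus, gene] pair per LOCUS record.
sub locus_gene_pairs {
    my ($fh) = @_;
    my @pairs;
    my $locus;              # current LOCUS name, undef before the first record
    my $gene_seen = 0;      # flag: has a gene been reported for this locus yet?
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^LOCUS\s+(\S+)/) {
            $locus     = $1;
            $gene_seen = 0; # new record: allow one gene to be reported again
        }
        elsif (defined $locus && !$gene_seen && $line =~ m{^\s*/gene="([^"]+)"}) {
            push @pairs, [$locus, $1];
            $gene_seen = 1; # ignore further /gene lines until the next LOCUS
        }
    }
    return @pairs;
}

# In-memory sample: the first record is abridged from the post,
# the second (XX_000001 / EXAMPLE2) is made up for illustration.
my $sample = <<'END';
LOCUS       NR_046018  1652 bp  RNA  linear  PRI 12-MAY-2017
     gene            1..1652
                     /gene="DDX11L1"
     misc_RNA        1..1652
                     /gene="DDX11L1"
LOCUS       XX_000001  100 bp  RNA  linear  PRI 01-JAN-2000
     gene            1..100
                     /gene="EXAMPLE2"
     misc_RNA        1..100
                     /gene="EXAMPLE2"
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @pairs = locus_gene_pairs($fh);
close $fh;
print "Locus: $_->[0]\tGene: $_->[1]\n" for @pairs;
```

Unlike last, the flag does not leave the loop, so every record in a long file is still visited.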
Hi Kriti,
This should work, as mentioned by finswimmer. Notice the last statement.
Output
Hello sir, thanks for your reply. I have tried this, but the last statement is not useful here because the GenBank file contains other loci and genes too. I hope I have explained this properly. Please suggest a method that matches this line:
and then matches this line:
Got the point. Shall get back to this. Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
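For the multi-record case raised above, another option is to read the file one record at a time instead of one line at a time. The sketch below assumes each record ends with a line containing only //, as in the GenBank flat-file format, and that the file uses Unix line endings. The helper name parse_records, the 'NA' placeholder, and the second sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [locus, first_gene] pair per record.
sub parse_records {
    my ($fh) = @_;
    local $/ = "//\n";    # input record separator: one whole GenBank record per read
    my @pairs;
    while (my $record = <$fh>) {
        # Skip chunks without a LOCUS line (e.g. trailing blank text).
        my ($locus) = $record =~ /^LOCUS\s+(\S+)/m or next;
        # A match in list context returns only the first /gene qualifier.
        my ($gene) = $record =~ m{^\s*/gene="([^"]+)"}m;
        push @pairs, [$locus, defined $gene ? $gene : 'NA'];
    }
    return @pairs;
}

# In-memory sample: the first record is abridged from the post,
# the second (XX_000001 / EXAMPLE2) is made up for illustration.
my $sample = <<'END';
LOCUS       NR_046018  1652 bp  RNA  linear  PRI 12-MAY-2017
     gene            1..1652
                     /gene="DDX11L1"
     misc_RNA        1..1652
                     /gene="DDX11L1"
//
LOCUS       XX_000001  100 bp  RNA  linear  PRI 01-JAN-2000
     gene            1..100
                     /gene="EXAMPLE2"
//
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @pairs = parse_records($fh);
close $fh;
print "Locus: $_->[0]\tGene: $_->[1]\n" for @pairs;
```

Each record is matched as a unit, so no counter or flag has to be reset between loci.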