Checking for a specific file type
6.4 years ago
mdsiddra ▴ 30

I am using Python to read a file (a protein sequence file) supplied by the user. How can I check whether the user has given a file of a specific type/format?

for example, the function is :

def user_file():
    file_var = input("Enter your file name: ")

    with open(file_var, 'r') as obj:
        print (obj.read())

Since I am working with protein sequence files, I want to check which kind the user has supplied. If it is "input.phy" (a sequence file in PHYLIP format), one block of statements should run; if it is "input.aln" (a sequence file in ClustalW format), a different block should run; and if it is "input.fa" (a sequence file in FASTA format), a third block should run.

I am confused about how to check for a specific file type.

Python • 2.8k views
ADD COMMENT

input("Enter your file name: ")

Please don't. Every time someone writes an interactive program like this, god kills a kitten.

just parse the cmd-line arguments https://docs.python.org/2/howto/argparse.html
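A minimal sketch of what that could look like with argparse. The helper name `build_parser` and the argument name `infile` are made up for illustration, not taken from anyone's script:

```python
import argparse


def build_parser():
    """Build a parser that takes the sequence file as a positional argument."""
    parser = argparse.ArgumentParser(
        description="Report the format of a protein sequence file."
    )
    # "infile" is a hypothetical argument name chosen for this sketch.
    parser.add_argument("infile", help="path to a .phy, .aln or .fa file")
    return parser


# On the command line you would call parse_args() with no arguments so that
# it reads sys.argv; an explicit list is passed here only to show the result.
args = build_parser().parse_args(["input.phy"])
print("Input file:", args.infile)
```

The script can then be run non-interactively and combined with other tools in a pipeline, which is the main reason to avoid `input()`.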

I want to check if user has entered a "input.phy"

Just check the extension of the file?

But anyway, you should never trust a user: parse the input and throw an exception if the file is badly formatted.
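As a rough illustration of not trusting the extension, the first non-blank line of each of these formats is fairly distinctive: FASTA records start with ">", ClustalW alignments usually start with the word "CLUSTAL", and a PHYLIP file usually starts with two integers (number of sequences and alignment length). The function name `sniff_format` is hypothetical, and this only checks the header; a real parser is still needed to validate the rest of the file.

```python
def sniff_format(path):
    """Guess the format from the first non-blank line; raise if unrecognised."""
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if line:
                break  # found the first line with content
        else:
            # The loop ended without a break: no non-blank line exists.
            raise ValueError(f"{path} is empty")

    if line.startswith(">"):
        return "fasta"
    if line.startswith("CLUSTAL"):
        return "clustal"
    fields = line.split()
    # Assumption: a plain PHYLIP header is exactly two integers.
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return "phylip"
    raise ValueError(f"{path} does not look like FASTA, Clustal or PHYLIP")
```

Comparing this result with the extension catches the case where a user hands in, say, a FASTA file named "input.phy".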

ADD REPLY

Yes, you are right. I would prefer to check both the file type (extension) and the format of the file and also throw an exception if the file is badly formatted.

ADD REPLY

So is there a way I can check the file type using Python? Can I get some help? I also tried following this:

https://stackoverflow.com/questions/5899497/checking-file-extension

but couldn't get it to work.

ADD REPLY

I'm working on a program that needs to do different things depending on whether the file in question is a "phy" (PHYLIP file), an "aln" (Clustal file) or a "fa" (FASTA file). Could I just use this?

if m == *.phy
   ....
elif m == *.aln

elif m== *.fa

Note: When I use that, it tells me invalid syntax. So what do I do?

ADD REPLY

Are you just concerned about the file extension or do you also want your program to check the contents?

ADD REPLY

I am mainly concerned with the file extension, but I would like to check both the file type (extension) and the format of the contents, and throw an exception if the file is badly formatted.

ADD REPLY

Hi mdsiddra,

Please give feedback on your previous thread: Text file to Phylip format

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLY

Yes, I have given feedback there already.

ADD REPLY

Great, but you haven't accepted the answer of jrj.healey, although I believe that he solved your issue.

ADD REPLY

Well, I thought it was very close and really helpful, and I have now accepted it as well. Thank you for guiding me on this. :)

ADD REPLY

Thank you for sharing this! Can I also get some help with the question I asked above?

ADD REPLY
6.3 years ago
Joe 21k

Here is a brute force approach to testing file types based on my script at:

https://github.com/jrjhealey/bioinfo-tools/blob/master/x2y.py#L91-L115

import os
import sys

def guessExt(infile, verbose):
    """If no input type was specified, guess it from the extension name or emit a warning"""
    extension = os.path.splitext(infile)[1]
    if verbose > 0: print("Extension is " + extension)
    # Figure out what extension to return
    if extension in (".abi", ".ab1"):
        filetype = "abi"
    # The trailing comma makes this a tuple; without it, `in` does a substring
    # test on the string ".embl" and an empty extension would match.
    elif extension in (".embl",):
        filetype = "embl"
    elif extension in (".clust", ".cw", ".clustal"):
        filetype = "clustal"
    elif extension in (".fa", ".fasta", ".fas", ".fna", ".faa", ".afasta"):
        filetype = "fasta"
    elif extension in (".fastq", ".fq"):
        filetype = "fastq"
    elif extension in (".gbk", ".genbank", ".gb"):
        filetype = "genbank"
    elif extension in (".paup", ".nexus"):
        filetype = "nexus"
    else:
        print("Couldn't determine the file type from the extension. Reattempt with the -j|--intype option specified.")
        sys.exit(1)

    if verbose > 0: print("Your file looks like a " + filetype)
    return filetype

The rest of the script uses BioPython, which handles the vast, vast majority of file parsing and exception handling if it fails. I would suggest you use it unless you have a very, very good reason not to. Predicting all the ways users will break your tool with dodgy input files is an NP-hard problem. This set of switches exists purely to identify the file format by extension and return a string to pass to BioPython, which does the rest.
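For the three extensions in the question specifically, the same idea can be written as a lookup table. This is only a sketch: the names `FORMAT_BY_EXTENSION` and `format_from_extension` are invented here, and the format strings are the ones I believe BioPython's AlignIO expects, so check them against the BioPython documentation for your version.

```python
import os

# Hypothetical mapping for the extensions mentioned in the question.
# The values are assumed to match BioPython's AlignIO format names.
FORMAT_BY_EXTENSION = {
    ".phy": "phylip",
    ".aln": "clustal",
    ".fa": "fasta",
}


def format_from_extension(path):
    """Return a format name for path, or raise ValueError if unknown."""
    # lower() so that "INPUT.FA" is treated the same as "input.fa".
    extension = os.path.splitext(path)[1].lower()
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(
            f"Unsupported extension {extension!r} for {path}"
        ) from None
```

The returned string can then drive an ordinary `if fmt == "phylip": ... elif fmt == "clustal": ...` chain, or be handed to something like `Bio.AlignIO.read(path, fmt)` if BioPython is installed.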

ADD COMMENT

Alright, thank you for the help. Let me try this with my script. It would also help if I could throw exceptions for dodgy input files from users.

ADD REPLY

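You can. Just wrap any section you like in a try and except block, then raise whatever error you wish. Note that os.path.splitext does not raise when there is no extension, it returns an empty string, so you have to check for that yourself. Some pseudo-code for example (infile is assumed to be defined already):

import os
import sys

try:
    ext = os.path.splitext(infile)[1]
    if not ext:  # splitext returns "" instead of raising
        raise ValueError("Your file doesn't have an extension")
except ValueError as err:
    sys.exit(str(err))  # prints the message to stderr and exits with status 1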
ADD REPLY

Yeah, sure. Thank you for the help.

ADD REPLY
