Previous month:
April 2012
Next month:
August 2012

May 2012

def main() in Python considered harmful

Python-logoI recently graded the first Python programming assignments in the course I'm teaching on Social and Computational Intelligence in the Computing and Software Systems program at University of Washington Bothell. Most of the students are learning Python as a second (or third) language, approaching it from the perspective of C++ and Java programming, the languages we use in nearly all our other courses. Both of those languages require the definition of a main() function in any source file that is intended to be run as an executable program, and so many of the submissions include the definition of a main() function in their Python scripts.

In reviewing some recurring issues from the first programming assignment during class, I highlighted this practice, and suggested it was unPythonistic (a la Code like a Pythonista). I recommended that the students discontinue this practice in future programming assignments, as unlike in C++ and Java, a function named main has no special meaning to the the Python interpreter. Several students asked why they should refrain from this practice - i.e., what harm is there in defining a main() function? - and one sent me several examples of web sites with Python code including main() functions as evidence of its widespread use.

Comfort_zone_growth_zone_panic_zoneIn my experience, the greatest benefit to teaching is learning, and the students in my classes regularly offer me opportunities to move out of my comfort zone and into my growth zone (and occasionally into my panic zone). I didn't have a good answer for why def main() in Python was a bad practice during that teachable moment  ... but after lingering in the growth zone for a while, I think I do now.

The potential problem with this practice is that any function defined at the top level of a Python module becomes part of the namespace for that module, and if the function is imported from that module into the current namespace, it will replace any function previously associated with the function name. This may lead to unanticipated consequences if it is combined with a practice of using wildcards when importing, e.g., from module import * (though it should be noted that wildcard imports are also considered harmful by Pythonistas).

I wrote a couple of simple Python modules - main1.py and main2.py - to illustrate the problem:

# main1.py
import sys

def main():
    print 'Executing main() in main1.py'
    print '__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0])
 
if __name__ == '__main__':
    main()

# main2.py
import sys

def main():
    print 'Executing main() in main2.py'
    print '__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0])
 
if __name__ == '__main__':
    main()

The main functions are identical except one has a string 'main1.py' whereas the other has a string 'main2.py'. If either of these modules are executed from the command line, they execute their own main() functions, printing out the hard-coded strings and the values of __name__ and sys.argv[0] (the latter of which will only have a value when the module is executed from the command line).

$ python main1.py
Executing main() in main1.py
__name__: __main__; sys.argv[0]: main1.py

$ python main2.py
Executing main() in main2.py
__name__: __main__; sys.argv[0]: main2.py

When these modules are imported into the Python interpreter using wildcards, the effect of invoking the main() function will depend on whichever module was imported first.

$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
>>> from main1 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa398>, '__doc__': None}
>>> main()
Executing main() in main1.py
__name__: main1; sys.argv[0]:

>>> from main2 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa140>, '__doc__': None}
>>> main()
Executing main() in main2.py
__name__: main2; sys.argv[0]:

>>> exit()
$

Now, this may all be much ado about little, especially given the aforementioned caveat about the potential harm of using wildcards in import statements. I suppose if one were to execute the more Pythonistic selective imports, i.e., from main1 import main and from main2 import main, at least the prospect of the main() function being overridden might be more apparent. But people learning a programming language for the first time - er, or teaching a programming language for the second time - often use shortcuts (such as wildcard imports), and so I offer all this as a plausible rationale for refraining from def main() in Python.

As part of my practice of leaning into discomfort and staying in the growth zone, I welcome any relevant insights or experiences that others may want to share.