
Valuable Advice on Preparing for Technical Interviews ... and Careers

The cover of Gayle Laakmann McDowell's book, Cracking the Coding Interview, and links to her Career Cup web site and Technology Woman blog are included in the slides I use on the first day of every senior (400-level) computer science course I have taught over the last two years. These are some of the most valuable resources I have found for preparing for interviews for software engineering - as well as technical program manager, product manager or project manager - positions. I recently discovered she has another book, The Google Resume, that offers guidance on how to prepare for a career in the technology industry, so I've added that reference to my standard introductory slides.

While my Computing and Software Systems faculty colleagues and I strive to prepare students with the knowledge and skills they will need to succeed in their careers, the technical interview process can prove to be an extremely daunting barrier to entry. The resources Gayle has made available - based on her extensive interviewing experience while a software engineer at Google, Microsoft and Apple - can help students (and others) break through those barriers. The updated edition of her earlier book focuses on how to prepare for interviews for technical positions, and her latest book complements this by offering guidance - to students and others who are looking to change jobs or fields - on how to prepare for careers in the computer technology world.


I have been looking for an opportunity to invite Gayle to the University of Washington Bothell to present her insights and experiences directly to our computer science students since I started teaching there last fall, and was delighted when she was able to visit us last week. Given the standing room only crowd, I was happy to see that others appreciated the opportunity to benefit from some of her wisdom. I will include fragments of this wisdom in my notes below, but for the full story, I recommend perusing her slides (embedded below) or watching a video of a similar talk she gave in May (also embedded further below), and for anyone serious about preparing for tech interviews and careers, I recommend reading her books.

Gayle emphasized the importance of crafting a crisp resume. Hiring managers typically spend no more than 15-30 seconds per resume to make a snap judgment about the qualifications of a candidate. A junior-level software engineer should be able to fit everything on one page, use verbs emphasizing accomplishments (vs. activities or responsibilities), and quantify accomplishments wherever possible. More of the relevant resources on resume writing are available at her Career Cup web site and Technology Woman blog.

One element of Gayle's advice [on Slide 13] that aligns with my past experience - and ongoing bias - in hiring researchers, designers, software engineers and other computing professionals is the importance of working on special projects (or, as Gayle puts it, "Build something!"). While graduates of computer science programs are in high demand, I have always looked for people who have done something noteworthy and relevant, above and beyond the traditional curriculum, and it appears that this is a common theme in filtering prospective candidates at many technology companies. This is consistent with advice given in another invited talk at UWB last year by Jake Homan on the benefits of contributing to open source projects, and is one of the motivations behind the UWB CSS curriculum requiring a capstone project of all our computer science and software engineering majors.

Gayle spoke of "the CLRS book" during her talk at UWB and her earlier talk at TheEasy, a reference to the classic textbook, Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. She said that entry-level software engineer applicants typically won't need to know data structures and algorithms at the depth or breadth presented in that book, and she offers a cheat sheet / overview of the basics on Slides 23-40, and an elaboration in Chapters 8 & 9 of her CtCI book. However, for those who are interested in delving more deeply into the topic, an online course based on the textbook is now part of the MIT OpenCourseWare project, and includes video & audio lectures, selected lecture notes, assignments, exams and solutions.

One potential pitfall for candidates who prepare thoroughly for technical interviews is that they may get an interview question they have already seen (and perhaps studied). She recommended that candidates admit to having seen a question before - equating failure to do so with cheating on an exam - and avoid simply reciting solutions from memory, because simple slip-ups are both common and easy to catch.

Gayle stressed that there is no correlation between how well candidates think they did in an interview and how well their interviewers thought they did. In addition to natural biases, the candidate evaluation process is always relative: candidates' responses to questions are assessed in the context of the responses of other candidates for the same position. So even if a candidate thinks he or she did well on a question, it may not be as well as other candidates did, and even if a candidate thinks he or she totally blew a question, it may not have been blown as badly as other candidates blew it.

Another important factor to bear in mind is that most of the big technology companies tend to be very conservative in making offers; they generally prefer to err on the side of false negatives rather than false positives. When they have a candidate who seems pretty good, but whose strength they don't feel entirely confident about, they have so many [other] strong candidates that they would rather reject someone who might have turned out great than risk hiring someone who does not turn out well. Of course, different companies have different evaluation and ranking schemes, and many of these details can be found in her CtCI book.

Gayle visits the Seattle area on a semi-regular basis, so I'm hoping I will be able to entice her to return each fall to give a live presentation to our students. However, for the benefit of those who are not able to see her present live, here is a video of her Cracking the Coding Interview presentation at this year's Canadian University Software Engineering Conference (CUSEC 2012) [which was also the site of another great presentation I blogged about a few months ago, Bret Victor's Inventing on Principle].

Finally, I want to round things out on a lighter note, with a related video that I also include in my standard introductory slides, Vj Vijai's Hacking the Technical Interview talk at Ignite Seattle in 2008:


def main() in Python considered harmful

I recently graded the first Python programming assignments in the course I'm teaching on Social and Computational Intelligence in the Computing and Software Systems program at University of Washington Bothell. Most of the students are learning Python as a second (or third) language, approaching it from the perspective of C++ and Java programming, the languages we use in nearly all our other courses. Both of those languages require the definition of a main() function in any source file that is intended to be run as an executable program, and so many of the submissions include the definition of a main() function in their Python scripts.

In reviewing some recurring issues from the first programming assignment during class, I highlighted this practice, and suggested it was unPythonistic (a la Code like a Pythonista). I recommended that the students discontinue this practice in future programming assignments, as unlike in C++ and Java, a function named main has no special meaning to the Python interpreter. Several students asked why they should refrain from this practice - i.e., what harm is there in defining a main() function? - and one sent me several examples of web sites with Python code including main() functions as evidence of its widespread use.

In my experience, the greatest benefit to teaching is learning, and the students in my classes regularly offer me opportunities to move out of my comfort zone and into my growth zone (and occasionally into my panic zone). I didn't have a good answer for why def main() in Python was a bad practice during that teachable moment ... but after lingering in the growth zone for a while, I think I do now.

The potential problem with this practice is that any function defined at the top level of a Python module becomes part of the namespace for that module, and if the function is imported from that module into the current namespace, it will replace any function previously associated with the function name. This may lead to unanticipated consequences if it is combined with a practice of using wildcards when importing, e.g., from module import * (though it should be noted that wildcard imports are also considered harmful by Pythonistas).

I wrote a couple of simple Python modules - main1.py and main2.py - to illustrate the problem:

# main1.py
import sys

def main():
    print 'Executing main() in main1.py'
    print '__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0])
 
# Run main() only when this file is executed as a script, not when imported.
if __name__ == '__main__':
    main()

# main2.py
import sys

def main():
    print 'Executing main() in main2.py'
    print '__name__: {}; sys.argv[0]: {}\n'.format(__name__, sys.argv[0])
 
# Run main() only when this file is executed as a script, not when imported.
if __name__ == '__main__':
    main()

The main() functions are identical except that one prints the string 'main1.py' whereas the other prints 'main2.py'. If either of these modules is executed from the command line, it executes its own main() function, printing out the hard-coded string and the values of __name__ and sys.argv[0] (the latter of which will only have a value when the module is executed from the command line).

$ python main1.py
Executing main() in main1.py
__name__: __main__; sys.argv[0]: main1.py

$ python main2.py
Executing main() in main2.py
__name__: __main__; sys.argv[0]: main2.py

When these modules are imported into the Python interpreter using wildcards, the effect of invoking the main() function will depend on whichever module was imported most recently.

$ python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
>>> from main1 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa398>, '__doc__': None}
>>> main()
Executing main() in main1.py
__name__: main1; sys.argv[0]:

>>> from main2 import *
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'sys': <module 'sys' (built-in)>, '__name__': '__main__', 'main': <function main at 0x1004aa140>, '__doc__': None}
>>> main()
Executing main() in main2.py
__name__: main2; sys.argv[0]:

>>> exit()
$

Now, this may all be much ado about little, especially given the aforementioned caveat about the potential harm of using wildcards in import statements. I suppose if one were to execute the more Pythonistic selective imports, i.e., from main1 import main and from main2 import main, at least the prospect of the main() function being overridden might be more apparent. But people learning a programming language for the first time - er, or teaching a programming language for the second time - often use shortcuts (such as wildcard imports), and so I offer all this as a plausible rationale for refraining from def main() in Python.
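For instance, continuing the interpreter session above with the selective imports just mentioned, the session would look something like this - the second import silently rebinds the name main, so a subsequent call runs main2's version:

>>> from main1 import main
>>> from main2 import main
>>> main()
Executing main() in main2.py
__name__: main2; sys.argv[0]: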

As part of my practice of leaning into discomfort and staying in the growth zone, I welcome any relevant insights or experiences that others may want to share.


Hadoop, Apache and the Benefits of Contributing to Open Source Projects

Jake Homan, a Senior Software Engineer at LinkedIn and UW Bothell CSS graduate, gave a recent guest lecture at UWB on Apache Hadoop: Petabytes and Terawatts, offering an overview and applications of Hadoop as well as related distributed computing tools developed within the Apache Software Foundation. The presentation offered a great balance of breadth and depth that was very well suited to the audience, primarily composed of senior undergraduate and Master's-level computer science students (and a few faculty). One of the most valuable insights shared by Jake was the enormous value that contributing to open source software projects can offer CS students - and others interested in software engineering career opportunities - to develop and demonstrate both their technical skills and their ability to work and play well with others.

Jake explained that Hadoop has two primary components: a distributed file system and a framework to support distributed computation. The Hadoop Distributed File System (HDFS) divides files into 128 MB blocks, makes 2 copies - yielding 3 replicas - of all the blocks, and then distributes the blocks on different DataNodes (computers). A NameNode manages the DataNodes and, among other tasks, regenerates the file blocks stored on a DataNode when that DataNode dies - and given enough DataNodes and enough time, a DataNode is sure to die - to ensure that 3 replicas of every file block are always available.
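As a rough illustration of that scheme, here is a toy Python sketch - my own simplification, not HDFS code; real HDFS placement is more sophisticated (e.g., rack-aware) - of dividing a file into 128 MB blocks and assigning each block to 3 distinct DataNodes:

# toy_hdfs.py: a toy model of the block replication scheme described above;
# not HDFS code (the round-robin placement is a simplifying assumption).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB
REPLICAS = 3

def place_blocks(file_size, datanodes):
    # Split the file into blocks (ceiling division), then assign each block
    # to REPLICAS distinct DataNodes, round-robin; assumes
    # len(datanodes) >= REPLICAS.
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return dict((block, [datanodes[(block + r) % len(datanodes)]
                         for r in range(REPLICAS)])
                for block in range(num_blocks))

if __name__ == '__main__':
    nodes = ['datanode1', 'datanode2', 'datanode3', 'datanode4']
    placement = place_blocks(400 * 1024 * 1024, nodes)  # a 400 MB file
    for block in sorted(placement):
        print 'block {}: {}'.format(block, placement[block])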

Hadoop provides a Java implementation of the MapReduce framework to support distributed computation. Using the prototypical example of a word count program - which Jake described as the "hello, world" program for distributed computing - he showed how to break down a computation into a Mapper and a Reducer. Generally speaking, a Mapper takes a <key, value> pair and generates zero or more <key, value> pairs; a Reducer takes all the values of one key and generates zero or more <key, value> pairs.

Applying this framework to the problem of counting words in a text (or collection of texts), a Hadoop program might start by splitting the text into lines or sentences where the keys represent the sequence positions of lines or sentences and the values represent the segments of text, e.g.,

<0, "Four score and seven years ago ...">
...

Hadoop would distribute these <key, value> pairs across DataNodes, where a TaskTracker on each DataNode would use a Mapper to split its line or sentence into a sequence of words and counts (where all counts are initially 1), yielding

<"Four", 1>
<"score", 1>
<"and", 1>
<"seven", 1>
...

During the Reduce phase, the outputs of Mappers are aggregated and sorted by key, yielding <key, list-of-values> pairs:

<"a", [1, 1, 1, 1, 1, 1, 1]>
<"above", [1]>
<"add", [1]>
...

These are then reduced [again] to <key, value> pairs, yielding the final sequence of word and frequency counts:

<"a", 7>
<"above", 1>
<"add", 1>
...
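To make the whole pipeline concrete, here is a minimal, single-process Python sketch of the word-count flow just described - mapper, shuffle/sort, and reducer. This is my own simulation for illustration; a real job would be written against Hadoop's Java MapReduce API and run across DataNodes:

# word_count.py: a toy, single-process simulation of the MapReduce
# word-count flow described above (not the Hadoop Java API).
from collections import defaultdict

def mapper(position, line):
    # <position, line-of-text> in; zero or more <word, 1> pairs out.
    for word in line.split():
        yield (word, 1)

def shuffle_and_sort(pairs):
    # Group mapper output by key, yielding sorted <word, list-of-counts> pairs.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return sorted(grouped.items())

def reducer(word, counts):
    # All the values for one key in; a single <word, frequency> pair out.
    yield (word, sum(counts))

if __name__ == '__main__':
    text = ['Four score and seven years ago our fathers brought forth',
            'on this continent a new nation']
    mapped = [pair for position, line in enumerate(text)
              for pair in mapper(position, line)]
    for word, counts in shuffle_and_sort(mapped):
        for w, frequency in reducer(word, counts):
            print '<"{}", {}>'.format(w, frequency)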

Distributed systems are increasingly the norm rather than the exception in companies providing any kind of web services - or involving any other kind of non-trivial computation - and so knowledge and experience in working with distributed systems is an increasingly important component of computer science education. However, even with knowledge of distributed systems, writing programs that can take advantage of distributed system architecture is still difficult and error-prone.

Jake said that if programmers can learn to think in terms of MapReduce, they can use Hadoop to manage many of the logistical and coordination aspects of distributed system programming; if they would rather think and work in terms of relational databases (SQL), they can use Hive; and if they would rather work in a higher-level scripting language, they can use Pig. Both of these are among the many Apache tools that can be layered on top of Hadoop. [I wrote about several of these tools in a post last August on Hadoop Day in Seattle: Hadoop, Cascading, Hive and Pig.]

One of the most useful pieces of knowledge that Jake shared during his presentation concerned the often underappreciated second-order benefits of contributing to open source projects, i.e., above and beyond the intrinsic value of improving software tools which, in many cases, programmers are using themselves. The first question he asks a software engineer candidate is "Have you done open source?" Open source software projects typically make all the code and the online conversations about the code publicly available, so Jake can do some background investigation to learn about both the open source code the candidate has written and the way the candidate has interacted with other contributors and stakeholders (e.g., the way a candidate has responded to bug reports or feature requests). The candidacy of any software engineer who has not contributed to any open source software projects may be considerably diminished by a deficit in this area.

Getting involved in an open source project can be intimidating, so Jake shared a link to the Apache Software Foundation list of ASF newbie issues that would be appropriately scoped projects for someone who wants to test the waters. I have not contributed directly to any Apache project - yet - but I did engage in some civic hacktivism at Data Camp Seattle in February, and some random hacks of kindness at RHOK 3 in June. I would like to organize an appropriately and inspiringly themed open source hackathon at UWB for students, faculty and other interested parties sometime in the near future ... but it will have to wait until after the fall quarter, as the three classes I'm teaching now are consuming nearly all my time and energy. I'm glad I at least took an hour off last week for Jake's engaging and educational presentation.


Continuing Education: Senior Lecturer at the University of Washington, Bothell

I recently embarked on the next stage of my re-engagement with academia, as a Senior Lecturer in the Computing and Software Systems program at the University of Washington, Bothell. Like the Tacoma campus, where I taught last winter and spring, the Bothell campus cultivates a small college culture within a large university system: classes are relatively small (with a maximum of 30-45 students in each) and there is a strong student-centered orientation among all the faculty and staff. The faculty - tenure track and non-tenure track - are actively engaged in research and other scholarly activities, but excellence in teaching is an essential attribute among all faculty.

During my first quarter, I am teaching courses on the Fundamentals of Computing (the introductory course for the CSS major) and Operating Systems (a senior-level core course in the major). I'm excited about teaching these courses for a number of reasons, not least of which is that these are the same courses I taught in my first full-time semester of teaching at the University of Hartford in 1985. Some content has changed, but many of the basic concepts have persisted over the intervening years. I'll be teaching courses on human-computer interaction, network design and web programming in the winter and spring quarters.

I don't anticipate much time for research during the next few quarters, as all of these courses will require new preparations on one or more dimensions. However, I do anticipate engaging some of my entrepreneurial energy. Although the Bothell campus is 20 years old, in the academic world this still qualifies as a "startup". The campus has ambitious growth plans to double in size over the next 5 years, and I'm looking forward to new opportunities for instigating, connecting and evangelizing in this new educational setting.

I also don't anticipate much time for blogging during this period; this post is already late (classes started last week), and I won't add much more to it. I do want to express my sincere gratitude for all the support I enjoyed from the faculty, staff and students at UW Tacoma throughout my initial re-engagement with academia last year. I am similarly grateful for the warm welcome I have received from the faculty, staff and students at UWB and CSS, and I look forward to my continuing education - as both a producer and a consumer - at the University of Washington.