Monday, January 20, 2014

Book Review: Java Puzzlers

Book: Java Puzzlers

Authors: Joshua Bloch and Neal Gafter
Publisher: Addison Wesley
Pages Read: all
Sections: all
Thumbs up/Thumbs Down? Up, slightly sideways
Link: Amazon

Summary of Content Read

Java Puzzlers is not so much a book as a collection of obscure corner cases in the Java programming language.  One of the authors, Joshua Bloch, is well known for "Effective Java", which is widely regarded as the premier text for the language, and he is also one of the designers and authors of the Java Collections Framework.  So to say the least, he knows his stuff.

Each chapter of the book features a collection of "puzzlers" centered around a particular area of the language (examples include loops, strings, exceptions, classes, etc).  Each puzzler presents a puzzle (typically in the form of a code snippet), and the reader is encouraged to predict what the output will be, or explain why the code is incorrect.  Then an answer/explanation of the puzzler is given.  All in all there are 95 different puzzlers across the book, and they range from the fairly common "if you thought about it a bit you'd figure it out" to the extremely obscure "unless you were a Java language designer you'd never have any hope of figuring this out".  The explanations also often include commentary aimed at language designers (ex: "the lesson for language designers here is...").

From an academic "curiosity" point of view the book is quite intriguing.  As a fairly experienced Java developer I found myself surprised by the vast majority of the puzzlers.  The programming languages guy in me found this fascinating (ex: wait, so you can have Unicode escapes in comments, and those escapes are processed by the compiler?).

Having said that, the book does reach a point where the concepts the puzzles touch on are extremely obscure.  As a typical Java developer you'll almost never run into most of the tidbits in this book.  That's not to say that reading it isn't useful; you'll definitely learn a bit about the language.  But if you're looking to learn "how to write good Java code" this is not the book for you (again, see Bloch's other book for that).

Book Review - The Clean Coder

Book: The Clean Coder - A Code of Conduct For Professional Programmers

Author: "Uncle" Bob Martin
Publisher: Prentice Hall
Pages Read: all
Sections: all
Thumbs up/Thumbs Down?  Up
Linky: Amazon

Summary Of Content Read

This book is largely a follow-up to Martin's other very well known book, "Clean Code".  Whereas that book focuses on the artifacts (code) we developers produce, this one focuses on developers themselves.  How should we as professional developers act?  What is the difference between a commitment and an estimate?  What are our responsibilities?  When can we say no, and how do we do it?  When are we obligated to say yes?  How do we get better at what we do?

Martin tries to distill his nearly 40 years of experience into some hard-fought lessons.  While it is very much appreciated to hear "tales from the trenches", the book does have a fairly heavy-handed "do as I say" tone.  Don't do TDD?  Well then, you're not a professional.  Do you make ambitious estimates?  Well then, you're not a professional.  From a rhetorical point of view, the book relies on this "proof by appeal to professionalism" approach rather than giving solid evidence and data to back up many of the arguments he makes.  For example, the TDD chapter has the passage:
Yes there have been lots of controversial blogs and articles written about TDD over the years and there still are.  In the early days they were serious attempts at critique and understanding.  Nowadays, however, they are just rants.  The bottom line is that TDD works, and everybody needs to get over it.
I feel like the paragraph should have ended with "QED".  Hardly a conclusive argument in favour of TDD, and the off-hand dismissal of any critiques of the practice really does hurt the point he's making.

Having said all this, it is certainly clear that much of what he offers is good advice, and represents an open challenge to developers to be better.  If you put aside the "if you don't do this you're not professional" rhetoric, at its core this book is a call for developers to live up to the responsibility of the job they have been hired to do.  Oftentimes we as developers like to silo ourselves off and focus on our narrowly defined technical tasks, but that is simply unrealistic.  Part of the responsibility of being a developer is to understand the context of the work you do, why it's important, and why it adds value to the customer/client/business/etc.  And if that value isn't there, it's up to you to find it.

As such I found this book both refreshing and terrifying.  Refreshing to hear a voice from the agile community who doesn't seem to feel that the product owner is the only person responsible for identifying value.  Terrifying to think that I, as an introverted software developer, have a duty to do more than simply write good, clean code.

In terms of structure, the book is divided into 14 chapters, each covering a topic of interest to professional developers.  While there is some technical discussion, it is relatively rare; by and large the chapters focus on "soft" skills rather than technical ones.

All in all, while heavy-handed and at times "preachy", it is very much a worthwhile read for anyone considering or already in a career in software development.

Friday, November 1, 2013

EqualsVerifier

This looks more than a little cool for those of us (like me) who are pedantic about testing equals/hashCode/compareTo methods:

http://www.jqno.nl/equalsverifier/

Friday, August 3, 2012

Git bisect And Nose -- Or how to find out who to blame for breaking the build.

How did I not ever discover git bisect before today?  Git bisect allows you to identify a particular commit which breaks a build, even after development has continued past that commit.  So for example, say you:
  • Commit some code which (unbeknownst to you) happens to break the build
  • You then (not realizing things have gone sideways) continue on doing commits on stuff you're working on
  • You then are about to push your code up to a remote master, so you finally run all those unit tests and realize you broke the build somewhere, but you don't know which commit introduced the problem
In a typical environment you'd now have a fun period of checking out a previous revision, running the tests, seeing if that was the commit that broke the build, and continuing until you identified the commit that introduced the failure.  I have experienced this many, many times and it is the complete opposite of fun.

If you were smart you might recognize that a binary search would be effective here.  That is, if you know commit (A) is bad, and commit (B) is good, and there are 10 commits in between (A) and (B), then you'd check out the one halfway between the two, check for the failure, and in doing so eliminate half the possibilities (rather than trying all 10 in succession).
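Just to make the idea concrete, here's a toy Python sketch of that binary search (purely illustrative: commits and is_bad are made-up stand-ins, and it assumes the oldest commit in the list is known good and the newest is known bad):

# Toy illustration of the search behind git bisect.
# "commits" is ordered oldest-to-newest; commits[0] is known good and
# commits[-1] is known bad.  "is_bad" stands in for "check out this
# commit, run the tests, and report whether the failure is present".
def find_first_bad(commits, is_bad):
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid     # failure already present: look earlier
        else:
            good = mid    # still passing: look later
    return commits[bad]   # the first commit where the failure appears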

And if you were really smart you'd know that this is exactly what git bisect does.  You tell git bisect which commit you know is good and which commit you know is bad, and it then walks you through checking out the commits in between to identify which one introduced the failure.

But wait, there's more!  There's also a lesser-known option to git bisect.  If you do a "git bisect run <somecommand>" then the process becomes completely automated.  Git runs <somecommand> at each step of the bisection; if the command returns exit code 0 it marks that commit as "good", if it returns non-zero it marks it as "bad", and it then continues the search with no human interaction whatsoever.

How cool is that?

So then the trick becomes "what's the command to use for <somecommand>?"  Obviously this is project dependent (probably whatever command you use to run your unit tests), but for those of us who are sane Python devs it's probably Nose.  As an example, I often organize my code as follows:

project/
     +--- src/
             +--- module1/
             +--- module2/
             +--- test/

Where "module1" contains code for a module, "module2" contains code for another module, and "test" contains my unit tests.  Nose is smart enough that if you tell it to start at "src" it will search all subdirectories for tests and then run them.  So lets say we know that commit 022ca08 was "bad" (ie the first commit we noticed the problem in) and commit "0b52f0c" was good (it doesn't contain the problem).  We could then do:

git bisect start 022ca08 0b52f0c --
git bisect run nosetests -w src

Then go grab a coffee, come back in a few minutes (assuming your tests don't take forever to run), and git will have identified the commit between 0b52f0c and 022ca08 that introduced the failure.  Note that we have to run git bisect from the top of the source tree (in my example the "project" directory), hence we need to tell nosetests to look in src via the -w parameter.
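If your check is more involved than a single test command (say you need to build something first), a small wrapper script works just as well, since git bisect run only cares about the exit code.  A hypothetical sketch (check_build.py isn't part of the layout above, it's just an illustration):

# check_build.py -- hypothetical helper for "git bisect run".
# Exit code 0 tells git this commit is good; anything non-zero marks it bad.
import subprocess
import sys

# Run whatever your project needs; here it's just the nose command from above.
sys.exit(subprocess.call(["nosetests", "-w", "src"]))

You'd then kick things off with "git bisect run python check_build.py".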

Thursday, June 7, 2012

Handy Python tip #1

The other day I was adding the rich comparison methods (the ones for operator overloading) to a class I had defined.  Like many Python programmers before me I wondered "why is it that if I define a method for equality, I still have to define a not-equal method?" and "if I define a comparison method, why do I have to define the other comparison methods?"

And then, lo and behold, while looking for something completely different, I stumbled across the functools.total_ordering class decorator.  With it, you can define just __eq__ plus any one of the rich comparison methods (__le__, __lt__, __gt__, etc), and it fills in the rest of the ordering methods for you.
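For example, a minimal sketch (with a made-up Version class) looks something like this:

from functools import total_ordering

@total_ordering
class Version(object):
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number

# __le__, __gt__, and __ge__ are all supplied by the decorator:
assert Version(1) < Version(2)
assert Version(1) <= Version(1)
assert Version(2) > Version(1)
assert Version(2) >= Version(2)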

Very handy stuff.

An example can be found in my MiscPython examples collection on Github.

Saturday, June 2, 2012

The Polyglot {UN}Conference 2012

This year I was fortunate enough to attend the inaugural Polyglot UN-Conference.  An UN-conference is a unique format, rather well suited to coding topics, in which attendees suggest and facilitate fairly open forums on whatever they want to talk or hear about.  It's a very cool idea that has the potential to be either completely awful or absolutely amazing.

I can say with full confidence that Polyglot was very much the latter.  Simply a great event all around.

I managed to get in on five of the talks at the show this year.  I'll do a quick recap of each in turn:

Go

First up was the Go programming language.  The session started with the facilitator giving a quick bird's-eye view of the language and some of the interesting features that make it unique, and then led into a group discussion of various thoughts & experiences others had with it.  Honestly, before I showed up to Polyglot I had kinda dismissed Go as a toy language from Google, but while I never had any "aha!" moments during the session, I definitely had my curiosity piqued.  Some things I never knew:
  1. it's a compiled (statically compiled) language, not interpreted
  2. syntactically it's a blend of a C/Java style language with what looks an awful lot like Python
  3. Ken Thompson (who was one of the co-inventors of a little language called C) was one of the initial visionaries for the project.  Interesting stuff.
  4. It's statically typed, though type declarations are optional (it seems to do some sort of type inference)
  5. There are no classes or inheritance; instead it uses interfaces and composition
  6. There's a rather substantial standard library.  It's not PyPI, but there's a definite sense of "batteries included".
I'll definitely be playing around with it a bit, as I want to know more.

Attracting Developers to Your Platform

A common problem many devs who have open source projects face is how to "inspire" other devs to:
  • get excited about their project
  • contribute and spread the word.
Much of the focus of this open session was on things we (as owners of projects) should and should not do to facilitate these goals.  Some topics touched on were how to manage poisonous people, zealots, etc.; how to promote your project via things like talks at conferences; the importance of an online presence; creating a sense for developers that support is visible, responsive, and accessible; and a variety of others.  Unfortunately I don't have any notes from the talk, as my computer's battery was dead during it. :(

While much of the conversation was interesting from an academic standpoint, as someone who doesn't have any FOSS projects to get people jazzed about, there wasn't really a lot of takeaway for me here.  That was, I think, the problem with the session: it felt too focused on open source.

After an extended lunch (thanks to the extremely slow service at the Old Spaghetti Factory), we got back to the conference about halfway through the 1PM talks, so I never really got to anything here, instead taking the time to charge the battery on my netbook & decompress a bit.  At 2PM though I got to:

Effective Testing Practices

This one was the highlight of the day for me.  The fishbowl session started with an open discussion on acceptance testing vs user testing, and went from there.  One of the big takeaways for me was Cucumber, which I had never seen before but which seemed worth exploring.  There was much debate on the use of systems like this that try to capture business requirements in a semi-structured format.  Some felt that this had value, others not so much.  Much insightful, spirited debate ensued -- until the fire alarm went off and we all had to leave for a bit.  Flame war indeed.

When we got back, an insightful discussion ensued, largely surrounding the notion of test coverage.  Some feel that the artificial number per-line test coverage gives has the potential to mislead one into a false sense of security.  Others (and I'd say I'm sympathetic to this view) feel that while, sure, the number is completely meaningless on its own, it provides a quick and dirty metric for identifying gross shortcomings in your testing.

There were also some rather humourous "horror stories" about testing (or a lack thereof) in industry, and a few comments that started to really touch on the deep issue of why we test, and what the point of it all is.  It's too bad this session lost 10-15 minutes due to the fire alarm, as this one was the highlight of the conference for me.

Big Data

I was lukewarm on this one going in, but none of the other topics at the time really caught my eye.  The open discussion started with the facilitator soliciting people in the audience to share their experiences with big data.  Most of these were actually fairly small, anecdotal discussions about the difficulties of working with larger amounts of data in traditional RDBMS systems.  Partway through, an attendee (who is an employee of Amazon) chimed in and gave an intro to some of the concepts behind true big data systems (e.g. Amazon S3).  This was good and bad: while it was great to see someone with expert knowledge step in and share his insights, it did feel as though the talk moved from "how can we do big data, and what are the challenges associated with it?" to "if you need to do big data, you can use Amazon S3 for the backend".

R and Python

I'm not sure if it was the "end of day and I'm exhausted" factor, or just my lack of interest in scientific computing, but I pretty much tuned out during this one.  It started off with a demonstration of using the IPython Notebook to explore a data set correlating weather with bicycle ridership.  On one hand the technology seemed useful, particularly for those with Matlab/Mathematica backgrounds, but for me, I lost interest early.  Two of my coworkers, however, found it quite interesting.

Last were the closing ceremonies, with a fun and entertaining demonstration of coding by voice in 5 different languages in ~5 minutes.  This was priceless. :)

On the whole, for a first outing, the conference was quite well run.  One thing I'd have liked to see is a more accessible online schedule.  It was a bit of a hassle to go to Lanyrd, track down the conference, and hit schedule.  Related to this, the online schedule was out of sync with the printed board; as a result, while we were at lunch we couldn't find out what talks were happening at 1PM.  Having the online board kept in sync with the printed board would've been very useful.

Minor hiccups aside, the conference was amazing.  It was incredible value too: $35 for a day's worth of tech talks with people who know and love technology and use it to solve problems on a daily basis.  Schedule permitting, I have no doubt I'd attend again in the future.

An interesting idea that was mentioned at the closing ceremonies was to do Vancouver Polyglot meetups every so often.  While I likely won't be able to attend these as I live in Victoria, I really hope this takes hold as it'd be awesome to see the strong tech community in greater Vancouver grow.

Friday, May 18, 2012

Useful Python Tools

I often stumble across and use a number of useful tools for creating Python code.  Thought I'd barf out a blog post documenting a few of them so that my future self will be able to find this info again if need be. :)

coverage.py 

(http://nedbatchelder.com/code/coverage/)

Coverage.py is a Python code coverage tool and is useful for finding out how well your unit tests cover your code.  I've often had it find big deficiencies in my unit test coverage.  Common usage:

$ coverage run somemodule_test.py
$ coverage report -m

This will spit out a coverage report for the code exercised by somemodule_test.py.  Used in this way coverage.py isn't particularly handy on its own, but combined with a good unit test runner (see below) it becomes very useful.

Nose 

(http://readthedocs.org/docs/nose/en/latest/)

Nose is "nicer testing for Python": an extremely handy unit test runner that has some perks over the standard Python unittest module.  Continuing from the last tool, Nose also integrates very nicely with coverage.py.  I commonly use it to produce some nice HTML pages summarizing test coverage for my project:

$ nosetests --with-coverage --cover-inclusive --cover-html --cover-erase

produces a "cover" directory containing an index.html with some nice pretty HTML reports telling me how well my unit tests cover my codebase.

pymetrics 

(http://sourceforge.net/projects/pymetrics/)

pymetrics is a handy tool for spitting out some, well, metrics about your code.  Ex:

$ pymetrics somemodule.py

This spits out a bunch of numbers about somemodule.py, ranging from trivial things like how many methods have docstrings to more interesting things like the McCabe cyclomatic complexity of each method/function within the module.  Handy.
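As a rough illustration of what the complexity number means (this is just the usual "decision points plus one" rule of thumb, not pymetrics' own output):

# A toy function with three decision points (the for, the if, and the elif),
# giving a McCabe cyclomatic complexity of 3 + 1 = 4.
def classify(values):
    result = []
    for v in values:
        if v < 0:
            result.append("negative")
        elif v == 0:
            result.append("zero")
        else:
            result.append("positive")
    return result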

cloc 

(http://cloc.sourceforge.net/)

cloc is a simple "lines of code" counter that happens to support Python.  In the top directory of a project, running:

$ cloc .

will give you summary output for your project like:

-------------------------------------------------------------
Language  files          blank       comment           code
-------------------------------------------------------------
Python       31           3454          9215          14775
-------------------------------------------------------------
SUM:         31           3454          9215          14775
-------------------------------------------------------------

While LOC is generally a meaningless statistic, it can be handy for getting a "ballpark" idea of how big a project is.