Tuesday, July 13, 2010

Compiling Python modules on Windows manually

So I wanted to use PyStemmer (http://snowball.tartarus.org/wrappers/PyStemmer-1.1.0.tar.gz, not http://sourceforge.net/projects/pystemmer/, which seems to be out of date), but it does not include a Visual Studio makefile or solution.

So what to do?

Below are the steps that I find work reasonably well (hopefully documented in a way that lets other people do the same for whatever library they are working with).

Note - I assume the following:
  • The code as provided will compile with Visual C (if Visual C cannot compile it - then your job is way harder!)
  • No extra tools (autoconf, etc) are required

So the steps are:
  1. Ensure the directory containing Python.h is in the %include% directories (menu item: Tools -> Options, Projects and Solutions -> VC++ Directories, Show directories for -> Include Files)
  2. Ensure the directory containing the Python import library (python26.lib for Python 2.6) is in the %lib% directories (same as the steps for Include Files, just select Library Files instead at the end)
  3. In Visual Studio 2008 (for Python 2.6) select menu item File -> New -> Project From Existing Code...
  4. Select the directory with the downloaded code as the Project File Location
  5. Give the project a name (generally a good idea to use the name of the Python module, i.e. "Stemmer" in my example)
  6. Ensure that the Release build is selected (if using the toolbar), because on Windows you generally will not have the debug library (python26_d.lib for Python 2.6).

  7. Modify the project properties:
    Set General -> Configuration Type to "Dynamic Library (.dll)"
    Set the file extension in Linker -> General -> Output File to ".pyd"

  8. Try to compile. Here are some of the common problems and possible solutions:

    • "multiply defined symbols"

      libstemmer_utf8.obj : error LNK2005: _sb_stemmer_length already defined in libstemmer.obj
      Release\Stemmer.pyd : fatal error LNK1169: one or more multiply defined symbols found

      The reason for this is that a symbol is defined in two source files. For PyStemmer this happened because the codebase contains both libstemmer_utf8.c and libstemmer.c, which define the same symbols. The fix was to remove one of them from the project.

    • "Cannot open include file"

      .\src\Stemmer.c(30) : fatal error C1083: Cannot open include file: 'libstemmer.h': No such file or directory

      This is a simple fix: find where the include file is and add its directory to the include path in the Project Properties dialog, under C/C++ -> General -> Additional Include Directories
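
If you are not sure which directories steps 1 and 2 need, the interpreter itself can tell you. Here is a minimal sketch, assuming a standard Windows install where the import library sits in a "libs" folder under the install prefix (python_build_dirs is just a name I made up for illustration). Note that the sysconfig module needs Python 2.7 or 3.2+; on Python 2.6, distutils.sysconfig.get_python_inc() returns the include directory instead.

```python
import os
import sys
import sysconfig


def python_build_dirs():
    """Return (include_dir, libs_dir) for the running interpreter."""
    # Directory that should contain Python.h.
    include_dir = sysconfig.get_paths()["include"]

    # Assumption: on a standard Windows install the import library
    # (e.g. python26.lib) lives in "libs" under the install prefix.
    # Inside a virtual environment, base_exec_prefix points at the real
    # install; older interpreters only have exec_prefix.
    prefix = getattr(sys, "base_exec_prefix", sys.exec_prefix)
    libs_dir = os.path.join(prefix, "libs")
    return include_dir, libs_dir


if __name__ == "__main__":
    include_dir, libs_dir = python_build_dirs()
    print("Include directory: %s" % include_dir)
    print("Library directory: %s" % libs_dir)
```

Run it with the same Python you are building the module for, then paste the two paths into the VC++ Directories dialog. On non-Windows systems the "libs" folder will most likely not exist, so treat the second path as a guess to verify.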

This is as much a reminder for me as anything, but if it helps you, great :)