To store and query the contract and grant information published by the US government (via http://usaspending.gov/), we keep the native XML results in an XML database, Berkeley DB XML (dbxml).
To install it, we closely follow this useful tutorial. Here I will give the basic outline.
If you are on Ubuntu, simply follow the tutorial's instructions; they work as written.
If you’re on Debian, it’s a bit more difficult, as we can’t use the pre-built packages…
We need to install a few packages on Debian unstable. The ICU libraries and headers are needed to get the configure process to work, along with the Berkeley DB C++ library and its headers:
# sudo apt-get install libicu40 libicu-dev
# sudo apt-get install libdb4.6++
# sudo apt-get install libdb4.6++-dev
Download XQilla version 2.1.2 and follow its instructions. Beforehand, you will also have to download, patch, build, and install Xerces 2.8.0. After building Xerces, install it with:
# sudo make XERCESCROOT=$XERCESCROOT install
Then build and install XQilla, following its instructions.
Download the dbxml tarball and place it somewhere:
# wget http://download.oracle.com/berkeley-db/dbxml-2.4.13.tar.gz
Then follow the instructions in “Compile DB XML”. If you are on Debian, you have to change the configure command to point to the Xerces and XQilla libraries we just installed under /usr/local:
# CFLAGS="-DSWIG_PYTHON_NO_USE_GIL" ../dist/configure --with-berkeleydb=/usr/lib/ --with-xqilla=/usr/local/lib/ --with-xerces=/usr/local
Go away for a while, as this will take some time. When the build finishes, make sure that you can start the dbxml program.
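A quick way to confirm that the dbxml shell can be started is to check that it is visible on the PATH. This is only a minimal sketch in Python 3 syntax: the function name find_dbxml is made up for illustration, and it assumes the binary is called dbxml and that its install directory has been added to PATH.

```python
import shutil


def find_dbxml(program="dbxml"):
    """Return the full path of the dbxml shell, or None if it is not on PATH."""
    # shutil.which searches the directories in PATH for an executable file.
    return shutil.which(program)


def report(program="dbxml"):
    """Print where the program was found, or a hint if it was not."""
    location = find_dbxml(program)
    if location is None:
        print(program + " not found on PATH; check the install prefix")
    else:
        print(program + " found at " + location)
    return location
```

If report() says the program is missing, the build may still have succeeded; the binary may simply live in a directory that is not on PATH yet.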
We need to install the Redland RDF library and the associated Python bindings:
# sudo apt-get install librdf0 redland-utils python-librdf
The server code uses it to store and recall Trustee information in a triple store, so that we can eventually start mapping interesting relationships between Trustees, schools, and corporations.
We also need pdftotext, so install it using apt-get:
# sudo apt-get install xpdf-utils
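Once pdftotext is installed, it can be called from Python through subprocess. The sketch below is not the server's actual code; the helper names pdftotext_command and extract_text are hypothetical, and the file name in the usage note is a placeholder. It relies only on two documented pdftotext features: the -layout option, and "-" as the output file meaning "write to stdout".

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        cmd.append("-layout")
    # An output file name of "-" sends the extracted text to stdout.
    cmd.extend([pdf_path, "-"])
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext and return the extracted text as bytes."""
    # check_output raises CalledProcessError if pdftotext exits with an error.
    return subprocess.check_output(pdftotext_command(pdf_path, keep_layout))
```

For example, extract_text("report.pdf") would return the text of a file named report.pdf in the current directory.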
Make sure setuptools is installed:
# sudo apt-get install python-setuptools
Current versions of Python (2.5) ship with their own Berkeley DB bindings in the bsddb module. Unfortunately, for some reason these bindings didn’t work on the server, so we had to install the bindings directly. This is easily accomplished using easy_install:
# sudo easy_install bsddb
flup is the WSGI backend we use for lighttpd deployment. Install using setuptools:
# sudo easy_install flup
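To give an idea of how flup fits in, here is a minimal sketch of a WSGI callable served over FastCGI. It is an illustration under assumptions, not our deployment: the application function is a stand-in for the real web.py app, the socket path /tmp/fcgi.sock is hypothetical and would have to match whatever lighttpd is configured to use, and the code uses Python 3 syntax while the original flup release targets Python 2.

```python
def application(environ, start_response):
    """A stand-in WSGI app; the real site would expose its own callable."""
    body = b"flup is serving this response\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(socket_path="/tmp/fcgi.sock"):
    """Serve `application` over FastCGI on a (hypothetical) Unix socket."""
    # Imported here so the module can be loaded even where flup is missing.
    from flup.server.fcgi import WSGIServer

    # bindAddress may be a socket path (string) or a (host, port) tuple.
    WSGIServer(application, bindAddress=socket_path).run()
```

Calling serve() blocks and handles requests that lighttpd forwards to the socket.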
We use web.py, a minimal web framework written in Python. Download web.py and install it using your favorite method.
Download Beautiful Soup and install it.
To parse the Google RSS feed, we use feedparser. Download and install it.
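As a rough idea of what feedparser gives us, the sketch below pulls the title and link out of each entry of a parsed feed. The helper names are hypothetical and the feed URL is left to the caller, since it is not given here. It assumes only that feedparser.parse returns a dictionary-like result with an "entries" list, which is how the library documents it.

```python
def entry_summaries(parsed):
    """Return (title, link) pairs for every entry in a parsed feed."""
    summaries = []
    for entry in parsed["entries"]:
        # Entries behave like dictionaries; missing fields fall back to "".
        summaries.append((entry.get("title", ""), entry.get("link", "")))
    return summaries


def fetch_summaries(url):
    """Download and parse the feed at `url`, then summarize its entries."""
    # Imported here so the module can be loaded even where feedparser is missing.
    import feedparser

    return entry_summaries(feedparser.parse(url))
```

Keeping the summarizing step separate from the download makes it possible to try it on a hand-built dictionary without touching the network.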
We use a couple of external libraries to generate Atom and RSS feeds. Download and install atomixlib and PyRSS2Gen. They both build and install in the standard Python way:
# python setup.py build
# python setup.py install
To modify the default web.py handling of log files, we use wsgilog:
# sudo easy_install wsgilog
For making pretty text on the webpages we install textile and smartypants using easy_install:
# sudo easy_install textile
# sudo easy_install smartypants
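With this many pieces, it helps to check at the end that everything can actually be imported. The following is a minimal sketch in Python 3 syntax, not part of the server code. The import names in CANDIDATES are my assumptions about what each package exposes (they do not always match the package name), and the function names are made up for illustration.

```python
import importlib

# Assumed import names for each dependency; adjust if your versions differ.
CANDIDATES = {
    "web.py": ["web"],
    "flup": ["flup"],
    "feedparser": ["feedparser"],
    "Beautiful Soup": ["BeautifulSoup", "bs4"],
    "PyRSS2Gen": ["PyRSS2Gen"],
    "atomixlib": ["atomixlib"],
    "wsgilog": ["wsgilog"],
    "textile": ["textile"],
    "smartypants": ["smartypants"],
    "Redland bindings": ["RDF"],
    "Berkeley DB bindings": ["bsddb", "bsddb3"],
    "DB XML bindings": ["dbxml"],
}


def first_importable(module_names):
    """Return the first name in module_names that imports, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure while importing counts as "not usable here".
            continue
        return name
    return None


def check_dependencies(candidates=None):
    """Map each dependency label to the module name that imported, or None."""
    if candidates is None:
        candidates = CANDIDATES
    return {label: first_importable(names) for label, names in candidates.items()}
```

Printing the result of check_dependencies() gives one line of evidence per package; a value of None points at something that still needs installing.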