IBM SPSS Modeler supports Python scripting using Jython, a Java™ implementation of the Python language. Modeler versions 16 and 17 use Jython 2.5.1, which includes a number of useful and popular modules. However, many other modules are available and customers often want to use their own, so a frequent question is how to include them.
There are two approaches:
- Copy the module to the “site-packages” folder under modeler-installation/lib/jython/Lib. This makes the module available to anybody who uses that Modeler installation, but updating the installation usually requires administrative privileges.
- Define a JYTHONPATH environment variable and add folders containing the modules to that. This allows each user to have their own module search paths without affecting other users and also does not require administrative privileges. However, it does mean that a script will only work for users who have the correct JYTHONPATH set.
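Both approaches work by putting a folder that contains the module on Jython's module search path (sys.path). As a rough sketch of that idea, the hypothetical helper below (find_module_dirs is our own name, not part of Modeler or Jython) lists every search-path folder that contains a given file. It only looks at real folders, so entries such as jar files or the empty "current folder" entry are skipped.

```python
import os
import sys


def find_module_dirs(filename, search_path=None):
    """Return every folder on the search path that contains `filename`.

    Hypothetical helper for illustration; defaults to sys.path.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for folder in search_path:
        # Only real folders are checked; jar/zip entries and '' are skipped.
        if os.path.isdir(folder) and os.path.isfile(os.path.join(folder, filename)):
            hits.append(folder)
    return hits


# An empty list means no folder on the search path contains the file.
print(find_module_dirs("BeautifulSoup.py"))
```

If more than one folder is listed, the earliest one on the search path is normally the copy that an import picks up.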
We will add the BeautifulSoup HTML parser to the search path using each method. Note that we need to use BeautifulSoup 3 to be compatible with Jython 2.5.x (later versions of BeautifulSoup require Python 2.7 or 3.x). You can download the correct version from here:
http://www.crummy.com/software/BeautifulSoup/download/3.x/BeautifulSoup-3.2.1.tar.gz
You can also check the BeautifulSoup 3 documentation.
Setup
Use your favourite archiving tool to extract the BeautifulSoup-3.2.1 folder so you can see its contents:
BeautifulSoup.py BeautifulSoupTests.py PKG-INFO setup.py
Using “site-packages”
- Locate the folder modeler-installation/lib/jython/Lib/site-packages.
- Copy the BeautifulSoup.py file from the BeautifulSoup-3.2.1 folder you extracted earlier to the site-packages folder (you may be prompted to authorise the copy).
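If you prefer to script the copy rather than use a file manager, something like the following sketch would do the same job. The function name install_module and both example paths are assumptions for illustration; substitute your own extracted folder and Modeler installation folder, and run it with an account that is allowed to write to site-packages.

```python
import os
import shutil


def install_module(module_file, site_packages):
    """Copy a single .py module into a site-packages folder.

    Hypothetical helper; returns the path of the copied file.
    """
    if not os.path.isdir(site_packages):
        raise ValueError("Not a folder: %s" % site_packages)
    target = os.path.join(site_packages, os.path.basename(module_file))
    shutil.copy(module_file, target)
    return target


# Example call (placeholder paths, adjust to your own system):
# install_module("BeautifulSoup-3.2.1/BeautifulSoup.py",
#                "modeler-installation/lib/jython/Lib/site-packages")
```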
Using JYTHONPATH
- Define a JYTHONPATH environment variable. On Windows 7:
  - Open the Control Panel.
  - Click the “System” option.
  - Click “Advanced System Settings” on the left.
  - In the dialog, the Advanced tab should be selected, with an “Environment Variables…” button at the bottom. Click on that.
  - In the Environment Variables dialog, click the “New…” button to create the JYTHONPATH environment variable and provide the full path to the BeautifulSoup-3.2.1 folder, e.g. C:\Users\jclinton\Downloads\BeautifulSoup-3.2.1.
- Restart Modeler.
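After restarting, it can be useful to check what value Modeler actually sees. The sketch below is one way to do that from a script: it splits JYTHONPATH into its folders and reports whether each one exists. The helper name jythonpath_entries is our own, and the check assumes the variable uses the platform's usual path separator (";" on Windows, ":" elsewhere).

```python
import os


def jythonpath_entries(environ=None):
    """Split JYTHONPATH into (folder, exists) pairs.

    Hypothetical helper; reads os.environ unless another mapping is given.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("JYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on Linux and macOS.
    folders = [p for p in raw.split(os.pathsep) if p]
    return [(folder, os.path.isdir(folder)) for folder in folders]


# Prints nothing if JYTHONPATH is not set for the current process.
for folder, exists in jythonpath_entries():
    print("%s  (exists: %s)" % (folder, exists))
```

A folder reported as not existing usually points to a typo in the variable, or to a value that was set after Modeler was started.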
Testing
To confirm the installation has worked, we will run a simple script that uses the BeautifulSoup and urllib2 modules to look for links on a web page (it’s not a realistic example, but you could use something more complex to scrape web pages for values that could be fed, say, to text analytics).
Create a new stream, open the stream dialog at the Execution tab and copy the following script to it.
import sys
import urllib2
from BeautifulSoup import BeautifulSoup

try:
    # Fetch the page and parse it with BeautifulSoup 3
    page = urllib2.urlopen("http://www.metoffice.gov.uk/public/weather/forecast")
    soup = BeautifulSoup(page)
    # Calling the soup with a tag name returns all matching elements
    for link in soup('a'):
        print(link.get('href'))
except urllib2.URLError, e:
    print "URLError:", e.reason
except:
    print "Unexpected error:", sys.exc_info()[0]
Switch to the Debug sub-tab and when you run the script, you should see the values of the “href” attributes in each HTML “a” (link) element.
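If the script stops with an ImportError instead, a smaller check that does not need network access can help separate search-path problems from connection problems. This is only a sketch, and can_import is a hypothetical helper name rather than anything supplied by Modeler:

```python
def can_import(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


# True means the module was found on the search path.
print(can_import("BeautifulSoup"))
```

A result of False suggests the module is not on the search path, so it is worth rechecking the site-packages copy or the JYTHONPATH value before looking at the script itself.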
This post was originally published here.