Python 64bit on Windows
16 Jun 2013 20:59
Getting 64-bit Python up and running with 64-bit packages on Windows is a bit of a pain.
Install WinPython 2.7.5.1 (64-bit)
https://code.google.com/p/winpython/
Register the installation with Windows using "WinPython Control Panel.exe": go to "Advanced" -> "Register distribution".
Now find a 64-bit build of the package you want.
Christoph Gohlke provides a major public good by hosting precompiled packages at
http://www.lfd.uci.edu/~gohlke/pythonlibs/
In my case I needed OpenCV, which meant downloading and installing "opencv-python-2.4.5.win-amd64-py2.7.exe".
Then, in Python, import it as usual: "import cv"
Troubleshooting
If the package installer can't find your WinPython installation, you probably haven't registered it as described above.
If your installation of Spyder stops working, you may have to reset your PATH environment variable. You can view and edit environment variables more easily with a freeware program like Rapid Environment Editor. In my case, I had to uninstall all of my Python installations and packages before it would start working again.
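Before editing PATH by hand, it can help to see which Python-related directories are on it and in what order Windows will search them. The following is a minimal sketch, not a tool from the original post: the helper name python_path_entries is hypothetical, and matching on the substring "python" is only a heuristic for spotting competing installations.

```python
import os

def python_path_entries(path_value=None, sep=os.pathsep):
    """Return PATH entries that mention 'python' (case-insensitive), in search order.

    Heuristic only: a directory that holds python.exe under another name
    will not be caught.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [p for p in path_value.split(sep) if p]  # skip empty entries
    return [p for p in entries if "python" in p.lower()]

if __name__ == "__main__":
    # The first matching entry is the one Windows will find first
    for entry in python_path_entries():
        print(entry)
```

If more than one installation shows up, the earliest entry wins, which is usually the one to move or remove.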
If you receive the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: DLL load failed: %1 is not a valid Win32 application.
Then it probably means you have 64-bit Python but a 32-bit build of your package. Either find a precompiled version that is listed as 64-bit or compile it yourself.
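To confirm a bitness mismatch, you can compare the interpreter's pointer size with the architecture recorded in the package's compiled module. This is a minimal sketch under the assumption that the module is a standard PE file (.pyd, .dll or .exe); the helper names are hypothetical. It reads the COFF "Machine" field, where 0x014C means 32-bit x86 and 0x8664 means 64-bit x64 according to the PE/COFF format.

```python
import struct

# Machine-type codes from the PE/COFF file header
MACHINE_TYPES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}

def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8

def pe_bits(path):
    """Describe the architecture a .pyd/.dll/.exe was compiled for."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":                     # DOS header magic
            raise ValueError("not a PE file (missing MZ header)")
        f.seek(0x3C)                               # e_lfanew: offset of the PE signature
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("missing PE signature")
        (machine,) = struct.unpack("<H", f.read(2))  # first field of the COFF header
    return MACHINE_TYPES.get(machine, "other (0x%04X)" % machine)
```

For example, call python_bits() and pe_bits() on the package's .pyd file (its location depends on your installation). If one reports 64 and the other "32-bit (x86)", that mismatch explains the DLL load error.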
Data from Historical Maps: Extracting Backgrounds
12 Jun 2013 20:11
As I've written on here before, digitizing political maps is no easy task. One tough problem is digitizing background colors which identify things like land cover.
Consider this section from a Vietnam War era military map of South Vietnam. There are three background regions: dark green for forested areas, a slightly lighter green for cleared forest, and white for completely clear areas. On top of that are lots of details, including brown elevation lines, black grid lines and text, etc.
How do we differentiate the background regions from the foreground regions? Define a background region as a semi-contiguous area with similar but not identical color. Consider the following algorithm:
1) Use k-means clustering to posterize the image into a fixed number of color clusters.
2) Count the number of pixels belonging to each cluster.
3) Iterate through each cluster separately
a) Set all non-cluster pixels to black
b) Perform a median (or modal) filter
4) Count the number of pixels in each cluster which survived the filter
5) Calculate the percentage of each cluster's original pixel count that survived the filter.
6) Keep any cluster whose surviving percentage is above an arbitrary cutoff point.
7) Create a mask for all pixels belonging to clusters which we rejected
8) Interpolate under the mask with those belonging to the clusters we kept.
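Steps 2 through 6 can be sketched with NumPy alone, assuming step 1 has already produced an integer label image (one cluster index per pixel). The helper names are hypothetical and this is not the function used later in the post. It relies on the fact that the median of an odd-sized window of 0s and 1s is 1 exactly when more than half the window is 1, so a median filter on one cluster's mask becomes a majority vote. It assumes NumPy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def majority_filter(mask, size=11):
    """Median filter of a boolean mask with an odd square window."""
    pad = size // 2
    # replicate edge pixels, as cv2.medianBlur does at the border
    padded = np.pad(mask.astype(np.uint8), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (h, w, size, size)
    return windows.sum(axis=(-2, -1)) > (size * size) // 2

def cluster_survival(labels, n_clusters, size=11):
    """Percentage of each cluster's pixels still present after the filter."""
    percents = np.zeros(n_clusters)
    for k in range(n_clusters):
        mask = labels == k                          # step 3a: all other pixels are "black"
        before = mask.sum()                         # step 2
        after = majority_filter(mask, size).sum()   # steps 3b and 4
        percents[k] = 100.0 * after / before if before else 0.0  # step 5
    return percents

def background_clusters(labels, n_clusters, threshold=30, size=11):
    """Step 6: indices of clusters whose survival percentage exceeds the cutoff."""
    return np.flatnonzero(cluster_survival(labels, n_clusters, size) > threshold)
```

Large, contiguous regions keep most of their pixels (and can exceed 100%, since the filter also fills small holes inside them), while thin lines and text lose nearly all of theirs.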
The final product will be an image segmented solely into the main background regions as defined by the process above.
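Steps 7 and 8 can be approximated without OpenCV by growing the kept clusters into the rejected pixels. This is a minimal sketch of simple region growing, not the Telea inpainting (cv2.INPAINT_TELEA) that the function in this post calls; the helper name fill_rejected is hypothetical and it works on a label image rather than colors.

```python
import numpy as np

def fill_rejected(labels, keep):
    """Replace every pixel whose label is not in `keep` with the label of a
    neighbouring kept pixel, growing the kept regions inward step by step."""
    labels = labels.copy()
    known = np.isin(labels, list(keep))        # step 7: True where the label was kept
    if not known.any():
        raise ValueError("no kept pixels to grow from")
    h, w = labels.shape
    # (dy, dx) offset of the neighbour each pixel copies from: up, down, left, right
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1))
    while not known.all():                     # step 8: repeat until no holes remain
        for dy, dx in offsets:
            # neighbour region (src) aligned with the target region (dst)
            src = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
            dst = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
            take = known[src] & ~known[dst]    # unknown pixel with a known neighbour
            labels[dst][take] = labels[src][take]  # basic slices are views, so this writes through
            known[dst] |= take
    return labels
```

Every pass fills at least one hole that borders a kept pixel, so the loop always terminates; the result contains only kept labels.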
The process also lets us extract the foreground, which we can then segment further on its own.
Here is a draft of a function which implements the algorithm in Python using the OpenCV library.
#Function to split background from foreground in map images
#Rex W. Douglass, 6/12/2013
#rexdouglass.com
#Imports
import numpy as np
import cv2
#########################################
#Function Define: forebacksplit
#Accepts a 3 channel (BGR) uint8 image as a numpy array
#clusters - number of colors to posterize the image into
#threshold - minimum percentage of a cluster's pixels that must survive the median filter for the cluster to count as background. A higher cutoff means fewer candidates for background regions.
#Returns a background image with holes inpainted, and a foreground image with the background set to black.
def forebacksplit(image, clusters=32, threshold=30):
    original = image.copy()
    height, width = original.shape[:2]
    #Cluster colors with k-means
    Z = np.float32(original.reshape((-1, 3)))  #one row per pixel, as float32
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 10)
    #OpenCV 3+/4 signature; on OpenCV 2.4 drop the None (bestLabels) argument
    ret, label, center = cv2.kmeans(Z, clusters, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    center = np.uint8(center)  #convert back into uint8
    label = label.flatten()
    original_clustered = center[label].reshape(original.shape)  #posterized image
    label2d = label.reshape((height, width))  #cluster index of every pixel
    #Determine level of contiguity of each cluster
    percentchange = list()
    for k, color in enumerate(center):
        in_cluster = label2d == k
        temp = original_clustered.copy()
        temp[~in_cluster] = 0  #set all non-cluster pixels to black
        median = cv2.medianBlur(temp, 11)
        #a pixel survives only if all 3 channels still match the cluster color
        #(note: a cluster whose color is black also matches the blacked-out pixels)
        survived = np.all(median == color, axis=2)
        sumoriginal = np.count_nonzero(in_cluster)
        summedian = np.count_nonzero(survived)
        percentdiff = round(100.0 * summedian / max(sumoriginal, 1))
        print(sumoriginal, summedian, percentdiff)
        percentchange.append(percentdiff)
    #Select clusters above an arbitrary threshold
    percentchange = np.asarray(percentchange)
    keep = percentchange > threshold  #one boolean per cluster
    #Split image based on surviving clusters
    background2d = keep[label2d]  #2d mask, True for background
    foreground2d = np.logical_not(background2d)  #True for foreground
    foreground = original_clustered.copy()
    foreground[background2d, :] = 0
    #Now fill in background as solid by inpainting under the foreground mask
    mask = foreground2d.astype(np.uint8) * 255  #foreground mask as an 8-bit image
    background = cv2.inpaint(original_clustered, mask, 5, cv2.INPAINT_TELEA)  #slow
    #Optional last median pass to despeckle
    #background = cv2.medianBlur(background, 9)
    return background, foreground  #full background image and full foreground image
################################
#Begin main code
original_bgr = cv2.imread('sample2.png', 1)  #varied sample map
if original_bgr is None:
    raise IOError("could not read sample2.png")
background, foreground = forebacksplit(original_bgr, 64, 70)  #split with 64 colors and a threshold of 70
cv2.imshow('original', original_bgr)
cv2.imshow('background', background)
cv2.imshow('foreground', foreground)
cv2.waitKey(0)  #keep the windows open until a key is pressed
cv2.destroyAllWindows()
Using Python 2.7 code from within R
18 Apr 2013 02:38
I've started to use Python functions called from R for some of my data preparation and analysis. It isn't obvious, however, how to do this with the latest version of R and Python 2.7, especially on a Windows machine.
Step (1) Install rJython
install.packages("rJython")
library(rJython)
Step (2) Install Jython 2.7 beta 1 (2.7b1)
http://www.jython.org/downloads.html
Step (3) Set an environment variable in R that points to the new Jython jar file. If you skip this step, rJython will default to an older version included in the package, which doesn't handle Python 2.7.
#Set RJYTHON_JYTHON to the full path and name of the 2.7b1 jar file
Sys.setenv(RJYTHON_JYTHON="C:/jython2.7b1/jython.jar")
Now you should be able to call or execute Python 2.7 code from R as usual.
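library(rJython)
rJython <- rJython()
rJython$exec( "x = 2*2" )               #run a Python statement directly
a <- 1:4                                 #an R vector to send to Python
jython.assign( rJython, "a", a )         #copy it into the Jython session as a
jython.exec( rJython, "b = len( a )" )
jython.get( rJython, "b" )               #bring b back into R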
The Elevator Outline of a Dissertation
03 Dec 2012 18:19
Outlining a dissertation or book project is difficult because it's easy to get lost in the detail. I proposed the following outline to a colleague with the instruction to label each section separately and to adhere to the strict sentence limits.
- The Puzzle (1 Sentence)
- Which we should care about because (1 Sentence)
- What is the closest existing theory we have to account for this behavior (1 Sentence)
- Why does it get this wrong? (2 Sentences)
- Which together suggest the following Research Question (1 Sentence)
- Which we will explain by theorizing about the following outcomes (1 Sentence)
- Which is a product of a strategic interaction between who and who (1 Sentence)
- Of which I argue the following main factor explains the puzzle (1 Sentence)
- A factor which is important to understand, and has applicability to these bigger areas in political science (1 Sentence)
- My theory produces the following observable implications (1 Sentence)
- My theory stands in contrast to the following explanations (1 Sentence Each)
- They generate alternative observable implications (1 Sentence Each)
- They suggest the coding of the following across a large sample of cases (1 Sentence)
- Of which the appropriate universe is (1 Sentence)
- Where my identification strategy is (2 Sentences)
- And this will be novel because the closest work has only done (1 Sentence)
- Additionally, a case-by-case comparison is warranted to code in greater detail the outcomes for the following macro-level predictions (1 Sentence) and for the following expected micro-level predictions (1 Sentence Each)
- I have selected the following cases as representative from the relevant universe (1 Sentence)
- And where there is variation on the main outcome of interest (1 Sentence)
- And on the key explanatory factors (1 Sentence)
- Of which my theory predicts we should see across these cases (1 Sentence)
- And at the micro level we would expect to see (1 Sentence per case)
- The finding will make a major contribution to the following research field [the main one] (1 Sentence)
- It will have the following important scope conditions (1 Sentence)
- Which will suggest the next research questions (1 Sentence)
Google Books and Microsoft OneNote for coding data
08 Aug 2011 21:03
An encouraging norm is emerging where scholars release, alongside their data, a large PDF of textual summaries and specific quotes used in each coding decision. I've tinkered with different systems for doing this, including Word, Excel, and Access, but what I have recently discovered works best is, surprisingly, Microsoft OneNote. OneNote offers at least four advantages so far. First, it makes it very easy to organize raw information by case and then by variable using pages and subpages. Second, it makes it very easy to get information into OneNote from sources like Google Books: use Zotero to download the book citation automatically and drag and drop it into OneNote, then use OneNote's screen capture option to quickly copy and paste the relevant page(s) out of Google Books. Third, OneNote will automatically OCR those book images, allowing you either to search for words later or to copy and paste the text directly into a Word document. Fourth, OneNote will export to a Word document or a PDF, splitting sections based on the case and variable headings you set up in your pages and subpages, which lets you reorganize things easily before putting out a final product.


