Using Python 2.7 code from within R
18 Apr 2013 02:38
I've started to use Python functions called from R for some of my data preparation and analysis. It isn't obvious, however, how to do this on the latest version of R, with Python 2.7, and especially on a Windows machine.
Step (1) Install rJython
install.packages("rJython")
library(rJython)
Step (2) Install the 2.7beta 1 version of Jython
http://www.jython.org/downloads.html
Step (3) Set an environment variable in R to point to the directory where the new Jython jar file lives. If you skip this step, rJython will default to the older Jython version bundled with the package, which doesn't handle Python 2.7.
#Set RJYTHON_JYTHON to the full path and name of the 2.7b1 jar file
Sys.setenv(RJYTHON_JYTHON="C:/jython2.7b1/jython.jar")
Now you should be able to call or execute Python 2.7 code from R as usual.
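library(rJython)
# Start a Jython interpreter
rJython <- rJython()
# Run a line of Python directly
rJython$exec( "a = 2*2" )
# Define a vector in R, copy it into Python, compute on it, and read the result back
a <- 1:4
jython.assign( rJython, "a", a )
jython.exec( rJython, "b = len( a )" )
jython.get( rJython, "b" )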
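As a rough illustration of the Python side, here is a minimal sketch of the kind of data-preparation helper one might want to call this way. The function name normalize_names and the sample values are hypothetical and not part of rJython. The code sticks to syntax that Python 2.7 accepts, and how the source text gets handed to jython.exec is left out.

```python
# Hypothetical data-preparation helper; the name and behavior are illustrative only.
def normalize_names(names):
    """Trim whitespace, collapse repeated spaces, and title-case each name."""
    cleaned = []
    for name in names:
        parts = name.split()  # split() with no arguments drops extra whitespace
        cleaned.append(" ".join(parts).title())
    return cleaned


# Made-up sample input standing in for a vector assigned from R.
sample = ["  north   vietnam", "LAOS "]
result = normalize_names(sample)
b = len(result)
```

Once such a function is defined in the interpreter, a vector assigned from R could be passed through it and the result read back with jython.get in the same way as b above.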
The Elevator Outline of a Dissertation
03 Dec 2012 18:19
Outlining a dissertation or book project is difficult because it's easy to get lost in the detail. I proposed the following outline to a colleague with the instruction to label each section separately and to adhere to the strict sentence limits.
- The Puzzle (1 Sentence)
- Which we should care about because (1 Sentence)
- What is the closest existing theory we have to account for this behavior (1 Sentence)
- Why does it get this wrong? (2 Sentences)
- Which together suggest the following Research Question (1 Sentence)
- Which we will explain by theorizing about the following outcomes (1 Sentence)
- Which is a product of a strategic interaction between who and who (1 Sentence)
- Of which I argue the following main factor explains the puzzle (1 Sentence)
- A factor which is important to understand, and has applicability to these bigger areas in political science (1 sentence)
- My theory produces the following observable implications (1 Sentence)
- My theory stands in contrast to the following explanations (1 Sentence Each)
- They generate alternative observable implications (1 Sentence Each)
- They suggest the coding of the following across a large sample of cases (1 Sentence)
- Of which the appropriate universe is (1 Sentence)
- Where my identification strategy is (2 Sentences)
- And this will be novel because the closest work has only done (1 Sentence)
- Additionally, a case by case comparison is warranted to code in greater detail the outcomes for the following macro level predictions (1 sentence) and for the following expected micro level predictions (1 sentence each)
- I have selected the following cases as representative from the relevant universe (1 Sentence)
- And where there is variation on the main outcome of interest (1 Sentence)
- And on the key explanatory factors (1 Sentence)
- Of which my theory predicts we should see across these cases (1 Sentence)
- And at the micro level we would expect to see (1 Sentence per case)
- The finding will make a major contribution to the following research field [the main one] (1 sentence)
- It will have the following important scope conditions (1 sentence)
- Which will suggest the next research questions (1 sentence)
Google Books and Microsoft OneNote for coding data
08 Aug 2011 21:03
An encouraging norm is emerging in which scholars release, alongside their data, a large PDF of the textual summaries and specific quotes behind each coding decision. I've tinkered with different systems for doing this, including Word, Excel, and Access, but what I have recently discovered works best is, surprisingly, Microsoft OneNote. OneNote offers at least four advantages so far. First, it makes it very easy to organize raw information by case and then by variable using pages and subpages. Second, it makes it very easy to get information into OneNote from sources like Google Books. Use Zotero to download the book citation automatically, and drag and drop it into OneNote. Then use OneNote's screen capture option to quickly copy and paste the relevant page(s) out of Google Books. Third, OneNote will automatically OCR those book images for you, allowing you either to search for words later or to copy and paste the text directly into a Word doc. Fourth, OneNote will export to a Word doc or a PDF, splitting sections based on the case and variable headings you set up in your pages and subpages, which allows you to reorganize things easily before putting out a final product.
Archival Research: Custom Zotero Translators
29 Jun 2011 16:37
With more and more archival material being put up on the web, it is important to have a system for downloading and organizing that material for your research. I use Zotero for all of my citation management because 1) it automatically pulls cites and files from the web and 2) it can store them in the cloud so they follow you wherever you go. For specialized electronic archives, however, there may not be a ready-made Zotero translator available. This was the case for the amazing Virtual Vietnam Archive at Texas Tech, so I rolled my own translator using the directions at the links provided below. I've made the full code available for anyone in need of a quick fix now, and I'll put together something more substantial for the main Zotero trunk when I get time.
http://niche-canada.org/member-projects/zotero-guide/chapter1.html
http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus
http://www.zotero.org/support/dev/translators
//Zotero Translator for The Virtual Vietnam Archive at Texas Tech
//Rex W. Douglass
//6-29-2011
/*
Installation: Go to http://www.zotero.org/support/dev/translators/scaffold and install a firefox plugin called scaffold.
Run scaffold by going to the firefox button in the top left=>Scaffold
Make sure you are on the Metadata tab.
Enter the following information under each field
Label: VirtualVietnamArchive
Creator: Rex W. Douglass
Target: http://www.virtualarchive.vietnam.ttu.edu/starweb/virtual/vva/servlet.starweb
Then flip to the "Code" tab and paste the contents of this code.
Save by clicking the "Save" button, second from the left. Close scaffold and restart firefox.
*/
/*Use: Go to http://www.virtualarchive.vietnam.ttu.edu/starweb/virtual/vva/servlet.starweb?path=virtual/vva/virtual.web
Enter a search in the search terms box and hit search
For any of the results hit "more information"
Then to save the citation hit the zotero single page icon at the far right of the URL bar just before the bookmark star.
It will download the citation to your zotero library as well as all of the PDFs that are attached.
*/
function detectWeb(doc, url) {
    // Every page matching the target URL is treated as a single record.
    return "single";
}
function doWeb(doc, url) {
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
        if (prefix == "x") return namespace; else return null;
    } : null;
    var newItem = new Zotero.Item("document");
    // Title
    var TitlePath = '//div[2]/div[2]/div[2]/div[2]/div';
    var Title = doc.evaluate(TitlePath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    Zotero.debug(Title);
    newItem.title = Title;
    // Date (named ItemDate so it does not shadow the built-in Date object)
    var DatePath = '//div[2]/div[2]/div[2]/div[5]/div';
    var ItemDate = doc.evaluate(DatePath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    Zotero.debug(ItemDate);
    newItem.date = ItemDate;
    // Location within the archive
    var LocationPath = '//div[2]/div[2]/div[2]/div[8]/div';
    var Location = doc.evaluate(LocationPath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    Zotero.debug(Location);
    newItem.archiveLocation = Location;
    // Collection name; stored in the separate "archive" field so it does not overwrite the location
    var CollectionPath = '//div[2]/div[2]/div[2]/div[6]/div/a';
    var Collection = doc.evaluate(CollectionPath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    Zotero.debug(Collection);
    newItem.archive = Collection;
    // Item and record numbers go into the free-text "extra" field
    var ItemNumPath = '//div[2]/div[2]/div[2]/div[1]/span[1]';
    var RecordNumPath = '//div[2]/div[2]/div[2]/div[1]/span[2]';
    var ItemNum = doc.evaluate(ItemNumPath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    var RecordNum = doc.evaluate(RecordNumPath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
    newItem.extra = ItemNum + " " + RecordNum;
    // Links to the attached PDFs, plus a count used to label each attachment
    var PDFPath = '//div[2]/div[2]/div[2]/div[10]/div/div/a';
    var PDFOBJECT = doc.evaluate(PDFPath, doc, nsResolver, XPathResult.ANY_TYPE, null);
    var counter = doc.evaluate('count(' + PDFPath + ')', doc, nsResolver, XPathResult.ANY_TYPE, null);
    Zotero.debug(counter.numberValue);
    Zotero.debug(ItemNum);
    Zotero.debug(RecordNum);
    var headers;
    var count = 1;
    // Attach every linked PDF, titled "<Title> (n of N)"
    while (headers = PDFOBJECT.iterateNext()) {
        var PDFURL = headers.href;
        newItem.attachments.push({
            title: Title + " (" + count + " of " + counter.numberValue + ")",
            mimeType: "application/pdf",
            url: PDFURL
        });
        count = count + 1;
    }
    newItem.complete();
}
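The translator works by picking fields out of the record page by their position among nested div elements. The idea can be tried outside the browser with a small Python sketch. The markup below is a hypothetical, heavily simplified stand-in for the archive's record page (the real page is nested more deeply), and the sample text is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record markup; not copied from the real archive page.
PAGE = """
<html><body>
  <div>
    <div><span>Item 123</span><span>Record 456</span></div>
    <div><div>Sample report title</div></div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)
record = root.find("./body/div")

# Positional predicates select the n-th matching child, as in the translator's XPaths.
title = record.find("./div[2]/div").text
item_num = record.find("./div[1]/span[1]").text
record_num = record.find("./div[1]/span[2]").text

# Mirror the translator's "extra" field: item number and record number joined by a space.
extra = item_num + " " + record_num
```

Positional paths like these are quick to write but brittle: if the archive adds or reorders a div, each index has to be rechecked.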
GIS: Vectorizing Political Boundaries from Complicated Maps
17 Jun 2011 19:03
Anyone who has tried to vectorize a paper map has struggled with the fact that maps are not designed to be cleanly read by computers; they are designed to cram as much information as possible into the smallest space that the human eye can interpret. Using a free image editor called GIMP, I have a quick and dirty way of removing the clutter and leaving only the political boundaries for vectorization.
Take for example this political boundary map from the Vietnam War (warning: the full-size image is a big download at 12 MB).

We just want the country boundaries, both between countries and between land and water. Unfortunately, the map is crammed with political boundaries, rivers, text, etc.

You could physically trace the outside perimeter for every country, but that would be very time consuming. You could try just selecting the light blue for the ocean boundary and the black/yellow between the countries, but it will pick up a lot more than just the boundaries and the lines will be full of holes thanks to all of the text labels that crisscross the boundaries.
A good alternative is to load the picture up in a program like GIMP, and use the fill tool to color each country and the ocean a distinguishing color. Don't worry about missing small bits, get the large parts filled in and pay particular attention to the border areas. If there are gaps in the borders, use the pencil tool to quickly plug the holes, and continue filling.

Now use the color select tool, while holding down shift, to select all of the areas that are filled in. Create a new layer, and paste that selection to the new layer. What you will have will look like ugly Swiss cheese from all the holes left by text, map features, and places you missed filling in.

Now go to filters, and select the despeckle filter. GIMP will make an educated guess about what color should be in the holes based on the nearby colors. Play with the settings until you find a balance that works for you, and run the filter as many times as necessary to plug all the remaining holes.
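To make the hole-plugging idea concrete, here is a minimal sketch that fills each empty cell of a small grid with the most common color among its filled neighbors, repeating until no holes remain. This is a simplified stand-in, not GIMP's actual despeckle implementation. The function name fill_holes, the use of 0 to mark a hole, and the toy grid are all assumptions made for illustration.

```python
from collections import Counter


def fill_holes(grid, hole=0, max_passes=10):
    """Fill hole cells with the most common non-hole color among their 8 neighbors."""
    rows, cols = len(grid), len(grid[0])
    grid = [row[:] for row in grid]  # work on a copy so the input is left untouched
    for _ in range(max_passes):
        changed = False
        new_grid = [row[:] for row in grid]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != hole:
                    continue
                # Collect the colors of all filled cells in the surrounding 3x3 window.
                neighbors = [
                    grid[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and grid[rr][cc] != hole
                ]
                if neighbors:
                    new_grid[r][c] = Counter(neighbors).most_common(1)[0][0]
                    changed = True
        grid = new_grid
        if not changed:
            break  # nothing left to fill, or the remaining holes have no filled neighbors
    return grid


# Toy "map": 1 and 2 are two filled countries, 0 marks holes left by labels.
image = [
    [1, 1, 0, 2, 2],
    [1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2],
]
filled = fill_holes(image)
```

As with the filter in GIMP, cells along the border between two colors can go either way, which is where the margin of error along the perimeter comes from.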

The final product will have a margin of error along the perimeter, but in many cases will be within tolerance and can be improved upon by more careful filling/despeckling. Most importantly, it can be done in a fraction of the time of other methods.