A while back we were fortunate to get on CBC’s All In A Day radio program to talk about our OTTrees open data project, built with the city of Ottawa’s street tree data set. Unbeknownst to us, that program was heard by Richard Walker, the Director of Communications and Program Development at Tree Canada, an organization that partners with the private and public sectors to sustain urban and rural forests through tree-planting programs.
Continuing on from the previous project, I augmented the functions that extract character names using NLTK’s named entity module and an example I found online. I also built my own custom stopword list to run against the returned names, filtering out frequently used words like “Come”, “Chapter”, and “Tell”, which the named entity functions caught as potential characters but are in fact just terms in the story.
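The filtering step might look something like the sketch below. The candidate list is hypothetical output from a named-entity pass (the post doesn't show the actual extraction code), and the stopword entries mirror the false positives mentioned above:

```python
# Hypothetical custom stopword list; the real list would be built by
# inspecting the extractor's output against the text.
CUSTOM_STOPWORDS = frozenset({"Come", "Chapter", "Tell"})

def filter_candidates(candidates, stopwords=CUSTOM_STOPWORDS):
    """Drop common words the entity extractor mistook for character names,
    preserving order and removing duplicates."""
    seen = set()
    kept = []
    for name in candidates:
        if name not in stopwords and name not in seen:
            seen.add(name)
            kept.append(name)
    return kept

# Example run against a made-up candidate list:
print(filter_candidates(["Elizabeth", "Come", "Darcy", "Chapter", "Tell"]))
```

A plain set membership test is all that's needed here, since the extractor returns whole tokens rather than substrings.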
Work has continued on analyzing the mood and sentiment of a text. I split the text into sentences using a regular expression that accounts for .!?’” sentence endings but won’t split on common abbreviations like Mr., Mrs., and other titles. Once I had a list of sentences, I pulled in my proper noun list, which turned out to be the most accurate list of potential character name matches, and compared the two lists with a regular expression, appending each sentence to a dictionary under the key of any character name found in it.
Recently I’ve been working on an extraction tool that analyzes a large amount of text to determine relevant, contextual noun phrases and proper nouns, and returns the sentiment of the text surrounding those words. I’ve been using Python and a few natural language processing libraries and packages to help with this, specifically NLTK, NumPy, PyYAML, and nameparser. The Natural Language Toolkit (NLTK) has an O’Reilly book available online that has been awesome in showing how to use the toolkit.
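To give a feel for the proper-noun side of this, here is a deliberately crude, dependency-free sketch: it tallies capitalized words that don't open a sentence. This is only a stand-in; NLTK's part-of-speech tagging does this far better, and this version misses sentence-initial names entirely:

```python
import re
from collections import Counter

def proper_noun_counts(text):
    """Crude proper-noun counter: tally title-cased words that don't
    open a sentence. Sentence-initial names (e.g. a sentence starting
    with "Emma") are skipped, which is why real POS tagging wins."""
    counts = Counter()
    for sentence in re.split(r'(?<=[.!?])\s+', text):
        words = sentence.split()
        for word in words[1:]:  # skip the ambiguous sentence-initial word
            stripped = word.strip('.,!?;:\'"')
            if stripped.istitle():
                counts[stripped] += 1
    return counts

sample = "Emma walked to town. She met George Knightley near the church."
print(proper_noun_counts(sample))
```

Ranking the resulting counts gives a quick shortlist of candidate names to feed into the filtering and sentence-matching steps described earlier.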