From 153291e59d8eb20ee2e18ddbb3f4809214635e6f Mon Sep 17 00:00:00 2001
From: Tyler Hallada <tyler@hallada.net>
Date: Wed, 3 May 2017 01:16:30 -0400
Subject: [PATCH] Added rhyming and syllable counting slides

---
 edX Lightning Talk.ipynb | 227 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 217 insertions(+), 10 deletions(-)

diff --git a/edX Lightning Talk.ipynb b/edX Lightning Talk.ipynb
index 0a9ed90..75bd261 100644
--- a/edX Lightning Talk.ipynb	
+++ b/edX Lightning Talk.ipynb	
@@ -101,7 +101,7 @@
     }
    },
    "source": [
-    "To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
+    "To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
    ]
   },
   {
@@ -454,11 +454,9 @@
    "source": [
     "## Whole sentences can be the conditions and values too ##\n",
     "\n",
-    "Which is basically the way cleverbot works:\n",
+    "Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
     "\n",
-    "![Cleverbot](images/cleverbot.png)\n",
-    "\n",
-    "[http://www.cleverbot.com/](http://www.cleverbot.com/)"
+    "![Cleverbot](images/cleverbot.png)"
    ]
   },
   {
@@ -474,7 +472,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 10,
    "metadata": {
     "slideshow": {
      "slide_type": "fragment"
@@ -520,7 +518,7 @@
    "source": [
     "## Random poems ##\n",
     "\n",
-    "Generating random poems is simply limiting the choice of the next word by some constraint:\n",
+    "Generating random poems is accomplished by limiting the choice of the next word with some constraint:\n",
     "\n",
     "* words that rhyme with the previous line\n",
     "* words that match a certain syllable count\n",
@@ -528,6 +526,204 @@
     "* etc."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Rhyming\n",
+    "\n",
+    "**Written English != Spoken English**\n",
+    "\n",
+    "English is highly **nonphonemic**, meaning that a word's spelling often does not correspond to its pronunciation. E.g.:\n",
+    "\n",
+    "\n",
+    "\"meet\" vs. \"meat\"\n",
+    "\n",
+    "The vowels are spelled differently, yet they rhyme."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Fun fact: They used to be pronounced differently in Middle English, around the time the printing press was invented and spelling was standardized."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# International Phonetic Alphabet (IPA)\n",
+    "\n",
+    "An alphabet that can represent all varieties of human pronunciation.\n",
+    "\n",
+    "* meet: /mit/\n",
+    "* meat: /mit/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Note: this is the IPA transcription for only one **accent** of English."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Syllables\n",
+    "\n",
+    "* \"poet\" = 2 syllables\n",
+    "* \"does\" = 1 syllable\n",
+    "\n",
+    "Can the IPA tell us the number of syllables in a word too?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "* poet: /ˈpoʊət/\n",
+    "* does: /ˈdʌz/\n",
+    "\n",
+    "Not really... We cannot easily identify the two syllables of \"poet\" from that transcription.\n",
+    "\n",
+    "Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Arpabet\n",
+    "\n",
+    "A phonetic alphabet developed by ARPA in the 70s that:\n",
+    "\n",
+    "* Encodes phonemes specific to American English.\n",
+    "* Is meant to be a machine-readable code. It is ASCII only.\n",
+    "* Denotes the stress of every vowel on a scale from 0 to 2."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "This is perfect! A word's syllable count equals the number of digits in its Arpabet encoding."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# CMU Pronouncing Dictionary (CMUdict)\n",
+    "\n",
+    "A large open source dictionary mapping English words to North American pronunciations in Arpabet encoding."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Conveniently, it is also in NLTK..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Counting Syllables"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "import string\n",
+    "from nltk.corpus import cmudict\n",
+    "cmu = cmudict.dict()\n",
+    "\n",
+    "def count_syllables(word):\n",
+    "    lower_word = word.lower()\n",
+    "    if lower_word in cmu:\n",
+    "        return max([len([y for y in x if y[-1] in string.digits])\n",
+    "                    for x in cmu[lower_word]])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "poet: 2\n",
+      "does: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
+    "                                  count_syllables(\"does\")))"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
@@ -585,7 +781,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 13,
    "metadata": {
     "slideshow": {
      "slide_type": "fragment"
@@ -613,9 +809,20 @@
     "print parsed"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## NLTK Syntax Trees! ##"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 14,
    "metadata": {
     "slideshow": {
      "slide_type": "fragment"
@@ -660,7 +867,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 15,
    "metadata": {
     "slideshow": {
      "slide_type": "fragment"