Added rhyming and syllable counting slides

2017-05-03 01:16:30 -04:00 · 153291e59d
commit 153291e59d
parent b3145c9754
1 changed file with 217 additions and 10 deletions
--- a/Talk.ipynb
+++ b/Talk.ipynb
@ -101,7 +101,7 @@
    }
   },
   "source": [
-    "To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
+    "To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
   ]
  },
  {
@ -454,11 +454,9 @@
   "source": [
    "## Whole sentences can be the conditions and values too ##\n",
    "\n",
-    "Which is basically the way cleverbot works:\n",
+    "Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
    "\n",
-    "![Cleverbot](images/cleverbot.png)\n",
-    "\n",
-    "[http://www.cleverbot.com/](http://www.cleverbot.com/)"
+    "![Cleverbot](images/cleverbot.png)"
   ]
  },
  {
@ -474,7 +472,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@ -520,7 +518,7 @@
   "source": [
    "## Random poems ##\n",
    "\n",
-    "Generating random poems is simply limiting the choice of the next word by some constraint:\n",
+    "Generating random poems is accomplished by limiting the choice of the next word by some constraint:\n",
    "\n",
    "* words that rhyme with the previous line\n",
    "* words that match a certain syllable count\n",
@ -528,6 +526,204 @@
    "* etc."
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Rhyming\n",
+    "\n",
+    "**Written English != Spoken English**\n",
+    "\n",
+    "English is highly **nonphonemic**, meaning that spelling often does not correspond directly to pronunciation. E.g.:\n",
+    "\n",
+    "\n",
+    "\"meet\" vs. \"meat\"\n",
+    "\n",
+    "The vowels are spelled differently, yet they rhyme."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Fun fact: They were pronounced differently in Middle English, around the time the printing press was invented and spelling was standardized."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# International Phonetic Alphabet (IPA)\n",
+    "\n",
+    "An alphabet that can represent all varieties of human pronunciation.\n",
+    "\n",
+    "* meet: /mit/\n",
+    "* meat: /mit/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Note: this is the IPA transcription for only one **accent** of English."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Syllables\n",
+    "\n",
+    "* \"poet\" = 2 syllables\n",
+    "* \"does\" = 1 syllable\n",
+    "\n",
+    "Can the IPA tell us the number of syllables in a word too?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "* poet: /ˈpoʊət/\n",
+    "* does: /ˈdʌz/\n",
+    "\n",
+    "Not really... We cannot easily identify the two syllables of \"poet\" from that transcription.\n",
+    "\n",
+    "Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Arpabet\n",
+    "\n",
+    "A phonetic alphabet developed by ARPA in the 1970s that:\n",
+    "\n",
+    "* Encodes the phonemes of American English.\n",
+    "* Is meant to be machine readable: it uses only ASCII characters.\n",
+    "* Marks the stress of every vowel with a digit from 0 to 2."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "This is perfect! A word's syllable count equals the number of digits in its Arpabet encoding."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# CMU Pronouncing Dictionary (CMUdict)\n",
+    "\n",
+    "A large open source dictionary that maps English words to their North American pronunciations in Arpabet encoding."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "source": [
+    "Conveniently, it is also in NLTK..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "# Counting Syllables"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "outputs": [],
+   "source": [
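+    "import string\n",
+    "from nltk.corpus import cmudict\n",
+    "cmu = cmudict.dict()  # word -> list of pronunciations (lists of Arpabet phonemes)\n",
+    "\n",
+    "def count_syllables(word):\n",
+    "    lower_word = word.lower()  # CMUdict keys are lowercase\n",
+    "    if lower_word in cmu:  # unknown words fall through and return None\n",
+    "        return max([len([p for p in pron if p[-1] in string.digits])  # vowels end in a stress digit\n",
+    "                    for pron in cmu[lower_word]])"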
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "slideshow": {
+     "slide_type": "fragment"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "poet: 2\n",
+      "does: 1\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
+    "                                  count_syllables(\"does\")))"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {
@ -585,7 +781,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@ -613,9 +809,20 @@
    "print parsed"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "slideshow": {
+     "slide_type": "slide"
+    }
+   },
+   "source": [
+    "## NLTK Syntax Trees! ##"
+   ]
+  },
  {
   "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@ -660,7 +867,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"