Added rhyming and syllable counting slides

2017-05-03 01:16:30 -04:00
parent b3145c9754
commit 153291e59d
1 changed files with 217 additions and 10 deletions
@@ -101,7 +101,7 @@
    }
   },
   "source": [
-    "To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
+    "To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
   ]
  },
  {
@@ -454,11 +454,9 @@
   "source": [
    "## Whole sentences can be the conditions and values too ##\n",
    "\n",
-    "Which is basically the way cleverbot works:\n",
+    "Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
    "\n",
-    "![Cleverbot](images/cleverbot.png)\n",
+    "![Cleverbot](images/cleverbot.png)"
    "\n",
    "[http://www.cleverbot.com/](http://www.cleverbot.com/)"
   ]
  },
  {
@@ -474,7 +472,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@@ -520,7 +518,7 @@
   "source": [
    "## Random poems ##\n",
    "\n",
-    "Generating random poems is simply limiting the choice of the next word by some constraint:\n",
+    "Generating random poems is accomplished by limiting the choice of the next word by some constraint:\n",
    "\n",
    "* words that rhyme with the previous line\n",
    "* words that match a certain syllable count\n",
@@ -528,6 +526,204 @@
    "* etc."
   ]
  },
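A minimal sketch of the constrained-choice idea on the "Random poems" slide. The helper `choose_next_word` and the spelling-based constraint `ends_in_ing` are hypothetical names introduced here for illustration; a real poem generator would plug in a rhyme or syllable-count check instead.

```python
import random

def choose_next_word(candidates, constraint, rng=random):
    """Pick a random candidate that satisfies the constraint, or None if none do."""
    allowed = [word for word in candidates if constraint(word)]
    if not allowed:
        return None
    return rng.choice(allowed)

# Hypothetical stand-in constraint: keep only words ending in "ing".
def ends_in_ing(word):
    return word.endswith("ing")

candidates = ["sing", "song", "ring", "rang", "bring"]
print(choose_next_word(candidates, ends_in_ing, random.Random(0)))
```

Passing the constraint as a function keeps the random selection separate from the rule being enforced, so the same helper could serve any of the constraints listed on the slide.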
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Rhyming\n",
    "\n",
    "**Written English != Spoken English**\n",
    "\n",
    "English is highly **nonphonemic**, meaning that the letters often have no correspondence to the pronunciation. E.g.:\n",
    "\n",
    "\n",
    "\"meet\" vs. \"meat\"\n",
    "\n",
    "The vowels are spelled differently, yet they rhyme."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Fun fact: They used to be pronounced differently in Middle English during the invention of the printing press and standardized spelling."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# International Phonetic Alphabet (IPA)\n",
    "\n",
    "An alphabet that can represent all varieties of human pronunciation.\n",
    "\n",
    "* meet: /mit/\n",
    "* meat: /mit/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Note: this is only the IPA transcription for only one **accent** of English."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Syllables\n",
    "\n",
    "* \"poet\" = 2 syllables\n",
    "* \"does\" = 1 syllable\n",
    "\n",
    "Can the IPA tell us the number of syllables in a word too?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* poet: /ˈpoʊət/\n",
    "* does: /ˈdʌz/\n",
    "\n",
    "Not really... We cannot easily identify three syllables from that transcription.\n",
    "\n",
    "Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Arpabet\n",
    "\n",
    "A phonetic alphabet developed by ARPA in the 70s that:\n",
    "\n",
    "* Encodes phonemes specific to American English.\n",
    "* Meant to be a machine readable code. It is ASCII only.\n",
    "* Denotes how stressed every vowel is from 0-2."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This is perfect! Word's syllable count equals the number of digits in the Arpabet encoding."
   ]
  },
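As a minimal sketch of the digit-counting idea above: the `SAMPLE_PRONUNCIATIONS` table and the `count_stress_digits` helper are hypothetical, and the transcriptions are hand-written for illustration rather than looked up in a real dictionary.

```python
# Hand-written Arpabet transcriptions for illustration only; a real
# lookup would come from a pronouncing dictionary.
SAMPLE_PRONUNCIATIONS = {
    "poet": ["P", "OW1", "AH0", "T"],
    "does": ["D", "AH1", "Z"],
    "meet": ["M", "IY1", "T"],
}

def count_stress_digits(phonemes):
    """Count phonemes that end in a stress digit (0, 1, or 2), i.e. the vowels."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())

for word, phonemes in sorted(SAMPLE_PRONUNCIATIONS.items()):
    print("{}: {}".format(word, count_stress_digits(phonemes)))
```

Only vowel phonemes carry a stress digit, so counting them gives one count per syllable nucleus.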
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# CMU Pronouncing Dictionary (CMUdict)\n",
    "\n",
    "A large open source dictionary of English words to North American pronunciations in Arpanet encoding."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Conveniently, it is also in NLTK..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Counting Syllables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import string\n",
    "from nltk.corpus import cmudict\n",
    "cmu = cmudict.dict()\n",
    "\n",
    "def count_syllables(word):\n",
    "    lower_word = word.lower()\n",
    "    if lower_word in cmu:\n",
    "        return max([len([y for y in x if y[-1] in string.digits])\n",
    "                    for x in cmu[lower_word]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "poet: 2\n",
      "does: 1\n"
     ]
    }
   ],
   "source": [
    "print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
    "                                  count_syllables(\"does\")))"
   ]
  },
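The same Arpabet encoding could also support the rhyme constraint from the earlier slides. The sketch below assumes a common heuristic that is not taken from this notebook: two words rhyme when their phonemes match from the last stressed vowel to the end. `SAMPLE_PRONUNCIATIONS`, `rhyme_part`, and `rhymes` are hypothetical names, and the transcriptions are hand-written rather than read from CMUdict.

```python
# Hypothetical mini-dictionary of Arpabet transcriptions, for illustration.
SAMPLE_PRONUNCIATIONS = {
    "meet": ["M", "IY1", "T"],
    "meat": ["M", "IY1", "T"],
    "poet": ["P", "OW1", "AH0", "T"],
}

def rhyme_part(phonemes):
    """Return the phonemes from the last stressed vowel (stress 1 or 2) to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":
            return phonemes[i:]
    return phonemes  # no stressed vowel found; fall back to the whole word

def rhymes(word_a, word_b):
    """Heuristic: two words rhyme if their rhyme parts are identical."""
    return (rhyme_part(SAMPLE_PRONUNCIATIONS[word_a])
            == rhyme_part(SAMPLE_PRONUNCIATIONS[word_b]))

print(rhymes("meet", "meat"))
print(rhymes("meet", "poet"))
```

Under this heuristic, "meet" and "meat" compare equal despite their different spellings, while "meet" and "poet" do not.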
  {
   "cell_type": "markdown",
   "metadata": {
@@ -585,7 +781,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@@ -613,9 +809,20 @@
    "print parsed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## NLTK Syntax Trees! ##"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@@ -660,7 +867,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"