Added rhyming and syllable counting slides

Tyler Hallada 2017-05-03 01:16:30 -04:00
parent b3145c9754
commit 153291e59d


@ -101,7 +101,7 @@
}
},
"source": [
"To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
"To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
]
},
{
@ -454,11 +454,9 @@
"source": [
"## Whole sentences can be the conditions and values too ##\n",
"\n",
"Which is basically the way cleverbot works:\n",
"Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
"\n",
"![Cleverbot](images/cleverbot.png)\n",
"\n",
"[http://www.cleverbot.com/](http://www.cleverbot.com/)"
"![Cleverbot](images/cleverbot.png)"
]
},
{
@ -474,7 +472,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@ -520,7 +518,7 @@
"source": [
"## Random poems ##\n",
"\n",
"Generating random poems is simply limiting the choice of the next word by some constraint:\n",
"Generating random poems is accomplished by limiting the choice of the next word by some constraint:\n",
"\n",
"* words that rhyme with the previous line\n",
"* words that match a certain syllable count\n",
@ -528,6 +526,204 @@
"* etc."
]
},
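The constraint idea in the "Random poems" slide can be sketched as a filter applied before the random choice. This is only an illustration: `choose_next_word`, the candidate list, and the toy length constraint are hypothetical names and data, not part of the notebook; a real constraint would check rhyme or syllable count.

```python
import random

def choose_next_word(candidates, constraint, rng=random):
    """Pick a random candidate that satisfies `constraint`, or None if none do."""
    allowed = [word for word in candidates if constraint(word)]
    if not allowed:
        return None
    return rng.choice(allowed)

# Hypothetical candidates, e.g. words a bigram model saw after the current word.
candidates = ["cat", "hat", "dog", "elephant"]

# Toy constraint standing in for a real rhyme or syllable check.
def is_short(word):
    return len(word) <= 3

print(choose_next_word(candidates, is_short))  # one of: cat, hat, dog
```

Returning `None` when nothing passes the constraint leaves it to the caller to decide whether to relax the constraint or restart the line.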
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Rhyming\n",
"\n",
"**Written English != Spoken English**\n",
"\n",
"English spelling is highly **nonphonemic**: letters often do not correspond consistently to sounds. For example:\n",
"\n",
"\n",
"\"meet\" vs. \"meat\"\n",
"\n",
"The vowels are spelled differently, yet they rhyme."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Fun fact: \"meet\" and \"meat\" were pronounced differently in Middle English, around the time the printing press was introduced and spelling was being standardized."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# International Phonetic Alphabet (IPA)\n",
"\n",
"An alphabet that can represent all varieties of human pronunciation.\n",
"\n",
"* meet: /mit/\n",
"* meat: /mit/"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note: this is the IPA transcription for only one **accent** of English."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Syllables\n",
"\n",
"* \"poet\" = 2 syllables\n",
"* \"does\" = 1 syllable\n",
"\n",
"Can the IPA tell us the number of syllables in a word too?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* poet: /ˈpoʊət/\n",
"* does: /ˈdʌz/\n",
"\n",
"Not really... We cannot easily identify the two syllables of \"poet\" from that transcription.\n",
"\n",
"Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Arpabet\n",
"\n",
"A phonetic alphabet developed by ARPA in the 1970s that:\n",
"\n",
"* Encodes the phonemes of American English.\n",
"* Is meant to be machine readable: it uses only ASCII characters.\n",
"* Marks the stress of every vowel with a digit from 0 to 2."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This is perfect! A word's syllable count equals the number of digits in its Arpabet encoding."
]
},
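As a minimal sketch of the digit-counting idea, assuming a pronunciation is given as a list of Arpabet phonemes: the function name `count_syllables_arpabet` is hypothetical, and the two pronunciations are written out by hand in CMUdict style instead of being loaded from a dictionary.

```python
def count_syllables_arpabet(phonemes):
    """Count syllables in a list of Arpabet phonemes such as ["P", "OW1", "AH0", "T"]."""
    # Only vowels carry a stress digit (0, 1, or 2), so each digit marks one syllable.
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())

print(count_syllables_arpabet("P OW1 AH0 T".split()))  # poet -> 2
print(count_syllables_arpabet("D AH1 Z".split()))      # does -> 1
```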
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# CMU Pronouncing Dictionary (CMUdict)\n",
"\n",
"A large open source dictionary that maps English words to their North American pronunciations in Arpabet encoding."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Conveniently, it is also in NLTK..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Counting Syllables"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import string\n",
"from nltk.corpus import cmudict\n",
"cmu = cmudict.dict()\n",
"\n",
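"def count_syllables(word):\n",
"    # Each Arpabet vowel ends in a stress digit, so counting digits counts syllables.\n",
"    # Implicitly returns None for words that are not in CMUdict.\n",
"    lower_word = word.lower()\n",
"    if lower_word in cmu:\n",
"        return max([len([y for y in x if y[-1] in string.digits])\n",
"                    for x in cmu[lower_word]])"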
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"poet: 2\n",
"does: 1\n"
]
}
],
"source": [
"print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
" count_syllables(\"does\")))"
]
},
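The same Arpabet pronunciations could also be used for the rhyming question raised earlier ("meet" vs. "meat"). The sketch below assumes one common definition of a rhyme, which is not stated in the slides: two words rhyme when their phonemes match from the last primary-stressed vowel to the end. `PRONUNCIATIONS`, `rhyming_part`, and `rhymes` are hypothetical names, and the entries are written by hand in CMUdict style so the example runs without NLTK.

```python
# Hand-written entries in CMUdict style (illustrative only; a real run
# might look words up in cmudict.dict() instead).
PRONUNCIATIONS = {
    "meet": ["M", "IY1", "T"],
    "meat": ["M", "IY1", "T"],
    "does": ["D", "AH1", "Z"],
}

def rhyming_part(phonemes):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for index in range(len(phonemes) - 1, -1, -1):
        if phonemes[index].endswith("1"):
            return phonemes[index:]
    return phonemes  # no primary stress marked: fall back to the whole word

def rhymes(word_a, word_b):
    """Assumed rhyme test: identical sounds from the last stressed vowel onward."""
    return rhyming_part(PRONUNCIATIONS[word_a]) == rhyming_part(PRONUNCIATIONS[word_b])

print(rhymes("meet", "meat"))  # True
print(rhymes("meet", "does"))  # False
```

Comparing sounds instead of letters is what lets differently spelled words such as "meet" and "meat" match.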
{
"cell_type": "markdown",
"metadata": {
@ -585,7 +781,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@ -613,9 +809,20 @@
"print parsed"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## NLTK Syntax Trees! ##"
]
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@ -660,7 +867,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "fragment"