Added rhyming and syllable counting slides
parent
b3145c9754
commit
153291e59d
@ -101,7 +101,7 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
"source": [
|
"source": [
|
||||||
"To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
|
"To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
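A minimal sketch of the two-index bigram loop described in the cell above. The function name `make_bigrams` and the sample sentence are hypothetical, chosen only for illustration; the notebook's own implementation is not shown in this hunk and may differ.

```python
def make_bigrams(words):
    """Pair each word with its successor using two indices, i and i + 1."""
    bigrams = []
    # Stop one short of the end so that i + 1 is always a valid index.
    for i in range(len(words) - 1):
        bigrams.append((words[i], words[i + 1]))
    return bigrams


# Hypothetical sample input.
sample_words = "the quick brown fox".split()
sample_bigrams = make_bigrams(sample_words)
```

A list with fewer than two words yields no bigrams, because the loop body never runs.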
@ -454,11 +454,9 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Whole sentences can be the conditions and values too ##\n",
|
"## Whole sentences can be the conditions and values too ##\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Which is basically the way cleverbot works:\n",
|
"Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
|
||||||
"\n",
|
"\n",
|
||||||
"![Cleverbot](images/cleverbot.png)\n",
|
"![Cleverbot](images/cleverbot.png)"
|
||||||
"\n",
|
|
||||||
"[http://www.cleverbot.com/](http://www.cleverbot.com/)"
|
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@ -474,7 +472,7 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 31,
|
"execution_count": 10,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
"slide_type": "fragment"
|
"slide_type": "fragment"
|
||||||
@ -520,7 +518,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"## Random poems ##\n",
|
"## Random poems ##\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Generating random poems is simply limiting the choice of the next word by some constraint:\n",
|
"Generating random poems is accomplished by limiting the choice of the next word by some constraint:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"* words that rhyme with the previous line\n",
|
"* words that rhyme with the previous line\n",
|
||||||
"* words that match a certain syllable count\n",
|
"* words that match a certain syllable count\n",
|
||||||
@ -528,6 +526,204 @@
|
|||||||
"* etc."
|
"* etc."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
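One way to picture "limiting the choice of the next word by some constraint" is a filter applied before the random pick. This is only a sketch: `pick_next`, its parameters, and the example constraint are assumptions made for illustration, not code from the notebook.

```python
import random


def pick_next(candidates, constraint, rng=random):
    """Choose a random next word from the candidates that satisfy the constraint.

    `constraint` is any function taking a word and returning True or False,
    for example a rhyme test or a syllable-count test.
    Returns None when no candidate passes.
    """
    allowed = [word for word in candidates if constraint(word)]
    if not allowed:
        return None
    return rng.choice(allowed)


# Hypothetical constraint: keep only four-letter words.
choice = pick_next(["cat", "poet", "banana"], lambda word: len(word) == 4)
```

Swapping in a different `constraint` function changes the kind of poem without touching the selection logic.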
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Rhyming\n",
|
||||||
|
"\n",
|
||||||
|
"**Written English != Spoken English**\n",
|
||||||
|
"\n",
|
||||||
|
"English spelling is highly **nonphonemic**: letters often do not correspond consistently to pronunciation. For example:\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"\"meet\" vs. \"meat\"\n",
|
||||||
|
"\n",
|
||||||
|
"The vowels are spelled differently, yet they rhyme."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Fun fact: they were pronounced differently in Middle English, around the time the printing press was introduced and spelling was being standardized."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# International Phonetic Alphabet (IPA)\n",
|
||||||
|
"\n",
|
||||||
|
"An alphabet that can represent all varieties of human pronunciation.\n",
|
||||||
|
"\n",
|
||||||
|
"* meet: /mit/\n",
|
||||||
|
"* meat: /mit/"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Note: this is the IPA transcription for only one **accent** of English."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Syllables\n",
|
||||||
|
"\n",
|
||||||
|
"* \"poet\" = 2 syllables\n",
|
||||||
|
"* \"does\" = 1 syllable\n",
|
||||||
|
"\n",
|
||||||
|
"Can the IPA tell us the number of syllables in a word too?"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"* poet: /ˈpoʊət/\n",
|
||||||
|
"* does: /ˈdʌz/\n",
|
||||||
|
"\n",
|
||||||
|
"Not really... We cannot easily count the syllables from those transcriptions.\n",
|
||||||
|
"\n",
|
||||||
|
"Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Arpabet\n",
|
||||||
|
"\n",
|
||||||
|
"A phonetic alphabet developed by ARPA in the 1970s that:\n",
|
||||||
|
"\n",
|
||||||
|
"* Encodes phonemes specific to American English.\n",
|
||||||
|
"* Is meant to be machine-readable: it uses only ASCII characters.\n",
|
||||||
|
"* Marks every vowel with a stress level from 0 to 2."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"This is perfect! A word's syllable count equals the number of stress digits in its Arpabet encoding."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# CMU Pronouncing Dictionary (CMUdict)\n",
|
||||||
|
"\n",
|
||||||
|
"A large open-source dictionary mapping English words to their North American pronunciations in Arpabet encoding."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"Conveniently, it is also in NLTK..."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Counting Syllables"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 11,
|
||||||
|
"metadata": {
|
||||||
|
"collapsed": true,
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import string\n",
|
||||||
|
"from nltk.corpus import cmudict\n",
|
||||||
|
"cmu = cmudict.dict()\n",
|
||||||
|
"\n",
|
||||||
|
"def count_syllables(word):\n",
|
||||||
|
"    lower_word = word.lower()\n",
|
||||||
|
"    if lower_word in cmu:\n",
|
||||||
|
"        return max([len([y for y in x if y[-1] in string.digits])\n",
|
||||||
|
"                    for x in cmu[lower_word]])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 12,
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "fragment"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"poet: 2\n",
|
||||||
|
"does: 1\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
|
||||||
|
" count_syllables(\"does\")))"
|
||||||
|
]
|
||||||
|
},
|
||||||
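The same Arpabet encoding can also drive the rhyming constraint. The sketch below is an illustration, not part of the commit: the pronunciations are a small hand-written sample in Arpabet style rather than entries loaded from CMUdict, and the rule used (two words rhyme when their phonemes match from the last stressed vowel onward) is a common simplification assumed here. The names `SAMPLE_PRONUNCIATIONS`, `rhyme_part`, and `rhymes` are hypothetical.

```python
# Hand-written sample in Arpabet style (an assumption for illustration;
# a real run would look these up in CMUdict instead).
SAMPLE_PRONUNCIATIONS = {
    "meet": ["M", "IY1", "T"],
    "meat": ["M", "IY1", "T"],
    "poet": ["P", "OW1", "AH0", "T"],
    "does": ["D", "AH1", "Z"],
}


def rhyme_part(phones):
    """Return the phonemes from the last stressed vowel (stress 1 or 2) onward."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return phones[i:]
    # No stressed vowel found: fall back to the whole pronunciation.
    return phones


def rhymes(first, second):
    """Simplified rhyme test on the sample pronunciations above."""
    return (rhyme_part(SAMPLE_PRONUNCIATIONS[first])
            == rhyme_part(SAMPLE_PRONUNCIATIONS[second]))


meet_meat = rhymes("meet", "meat")
poet_does = rhymes("poet", "does")
```

Under this rule "meet" and "meat" rhyme even though they are spelled differently, which is the point made on the Rhyming slide.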
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
@ -585,7 +781,7 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 33,
|
"execution_count": 13,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
"slide_type": "fragment"
|
"slide_type": "fragment"
|
||||||
@ -613,9 +809,20 @@
|
|||||||
"print parsed"
|
"print parsed"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {
|
||||||
|
"slideshow": {
|
||||||
|
"slide_type": "slide"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"## NLTK Syntax Trees! ##"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 34,
|
"execution_count": 14,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
"slide_type": "fragment"
|
"slide_type": "fragment"
|
||||||
@ -660,7 +867,7 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 30,
|
"execution_count": 15,
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"slideshow": {
|
"slideshow": {
|
||||||
"slide_type": "fragment"
|
"slide_type": "fragment"
|
||||||
|