Added rhyming and syllable counting slides
parent
b3145c9754
commit
153291e59d
@ -101,7 +101,7 @@
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"To create **bigrams**, iterate through the list of words with two indicies, one of which is offset by one:"
|
||||
"To create **bigrams**, iterate through the list of words with two indices, one of which is offset by one:"
|
||||
]
|
||||
},
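The two-index iteration described in this cell can be sketched as follows. This is a minimal illustration, not the notebook's own code; the name `make_bigrams` and the sample sentence are assumptions made for the example.

```python
def make_bigrams(words):
    """Pair each word with its successor using two indices offset by one."""
    bigrams = []
    for i in range(len(words) - 1):
        j = i + 1  # second index, always one position ahead of the first
        bigrams.append((words[i], words[j]))
    return bigrams


# Hypothetical sample input, chosen only for illustration
words = "the quick brown fox".split()
print(make_bigrams(words))
```

Stopping the loop at `len(words) - 1` keeps the offset index in range; a list with fewer than two words simply yields no bigrams.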
|
||||
{
|
||||
@ -454,11 +454,9 @@
|
||||
"source": [
|
||||
"## Whole sentences can be the conditions and values too ##\n",
|
||||
"\n",
|
||||
"Which is basically the way cleverbot works:\n",
|
||||
"Which is basically the way cleverbot works ([http://www.cleverbot.com/](http://www.cleverbot.com/)):\n",
|
||||
"\n",
|
||||
"![Cleverbot](images/cleverbot.png)\n",
|
||||
"\n",
|
||||
"[http://www.cleverbot.com/](http://www.cleverbot.com/)"
|
||||
"![Cleverbot](images/cleverbot.png)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -474,7 +472,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
@ -520,7 +518,7 @@
|
||||
"source": [
|
||||
"## Random poems ##\n",
|
||||
"\n",
|
||||
"Generating random poems is simply limiting the choice of the next word by some constraint:\n",
|
||||
"Generating random poems is accomplished by limiting the choice of the next word by some constraint:\n",
|
||||
"\n",
|
||||
"* words that rhyme with the previous line\n",
|
||||
"* words that match a certain syllable count\n",
|
||||
@ -528,6 +526,204 @@
|
||||
"* etc."
|
||||
]
|
||||
},
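One way to picture "limiting the choice of the next word by some constraint" is to filter the candidate words before picking one at random. The sketch below assumes a hypothetical helper `choose_next_word` and a small hard-coded syllable table; neither comes from the notebook.

```python
import random


def choose_next_word(candidates, constraint, rng=random):
    """Pick a random candidate that satisfies the constraint, or None."""
    allowed = [word for word in candidates if constraint(word)]
    return rng.choice(allowed) if allowed else None


# Hypothetical syllable counts, hard-coded for illustration only
SYLLABLES = {"poet": 2, "does": 1, "meet": 1, "meat": 1}


def has_one_syllable(word):
    """Example constraint: accept only one-syllable words."""
    return SYLLABLES.get(word) == 1


candidates = ["poet", "does", "meet", "meat"]
print(choose_next_word(candidates, has_one_syllable))
```

Any predicate can be passed as the constraint, so a rhyme check or a part-of-speech check would slot in the same way.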
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Rhyming\n",
|
||||
"\n",
|
||||
"**Written English != Spoken English**\n",
|
||||
"\n",
|
||||
"English is highly **nonphonemic**, meaning that the letters often have no correspondence to the pronunciation. E.g.:\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\"meet\" vs. \"meat\"\n",
|
||||
"\n",
|
||||
"The vowels are spelled differently, yet they rhyme."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Fun fact: They used to be pronounced differently in Middle English during the invention of the printing press and standardized spelling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# International Phonetic Alphabet (IPA)\n",
|
||||
"\n",
|
||||
"An alphabet that can represent all varieties of human pronunciation.\n",
|
||||
"\n",
|
||||
"* meet: /mit/\n",
|
||||
"* meat: /mit/"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Note: this is only the IPA transcription for only one **accent** of English."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Syllables\n",
|
||||
"\n",
|
||||
"* \"poet\" = 2 syllables\n",
|
||||
"* \"does\" = 1 syllable\n",
|
||||
"\n",
|
||||
"Can the IPA tell us the number of syllables in a word too?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"* poet: /ˈpoʊət/\n",
|
||||
"* does: /ˈdʌz/\n",
|
||||
"\n",
|
||||
"Not really... We cannot easily identify three syllables from that transcription.\n",
|
||||
"\n",
|
||||
"Sometimes the transcriber denotes syllable breaks (with a `.` or a `'`), but sometimes they don't."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Arpabet\n",
|
||||
"\n",
|
||||
"A phonetic alphabet developed by ARPA in the 70s that:\n",
|
||||
"\n",
|
||||
"* Encodes phonemes specific to American English.\n",
|
||||
"* Meant to be a machine readable code. It is ASCII only.\n",
|
||||
"* Denotes how stressed every vowel is from 0-2."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"This is perfect! Word's syllable count equals the number of digits in the Arpabet encoding."
|
||||
]
|
||||
},
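The digit-counting idea can be checked by hand before bringing in a dictionary. The transcriptions below are hand-written Arpabet phoneme lists assumed for illustration, and `count_stress_digits` is a hypothetical name.

```python
# Hand-written Arpabet transcriptions, assumed here for illustration.
# Vowels carry a trailing stress digit (0, 1, or 2); consonants carry none.
ARPABET = {
    "poet": ["P", "OW1", "AH0", "T"],
    "does": ["D", "AH1", "Z"],
}


def count_stress_digits(phonemes):
    """Count phonemes ending in a stress digit, i.e. the vowels."""
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


for word, phonemes in ARPABET.items():
    print("{}: {}".format(word, count_stress_digits(phonemes)))
```

Because each syllable has exactly one vowel and only vowels are marked for stress, counting the digits gives the syllable count.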
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# CMU Pronouncing Dictionary (CMUdict)\n",
|
||||
"\n",
|
||||
"A large open source dictionary of English words to North American pronunciations in Arpanet encoding."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Conveniently, it is also in NLTK..."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Counting Syllables"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import string\n",
|
||||
"from nltk.corpus import cmudict\n",
|
||||
"cmu = cmudict.dict()\n",
|
||||
"\n",
|
||||
"def count_syllables(word):\n",
|
||||
"    lower_word = word.lower()\n",
|
||||
"    if lower_word in cmu:\n",
|
||||
"        return max([len([y for y in x if y[-1] in string.digits])\n",
|
||||
"                    for x in cmu[lower_word]])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"poet: 2\n",
|
||||
"does: 1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"poet: {}\\ndoes: {}\".format(count_syllables(\"poet\"),\n",
|
||||
"                                  count_syllables(\"does\")))"
|
||||
]
|
||||
},
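The same Arpabet encoding can also support the rhyming constraint mentioned earlier. The sketch below uses one common working definition of a perfect rhyme (matching phonemes from the last primary-stressed vowel onward); that definition, the helper names, and the hand-written phoneme lists are assumptions for illustration, not code from this commit.

```python
def rhyme_part(phonemes):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):  # "1" marks primary stress in Arpabet
            return phonemes[i:]
    return phonemes  # no primary stress marked: compare the whole word


def rhymes(first, second):
    """Treat two pronunciations as rhyming if their rhyme parts match."""
    return rhyme_part(first) == rhyme_part(second)


# Hand-written Arpabet transcriptions, assumed here for illustration
MEET = ["M", "IY1", "T"]
FEET = ["F", "IY1", "T"]
DOES = ["D", "AH1", "Z"]

print(rhymes(MEET, FEET))
print(rhymes(MEET, DOES))
```

In practice the phoneme lists would come from a pronunciation dictionary such as CMUdict rather than being typed by hand, and words with several listed pronunciations would need each variant checked.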
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
@ -585,7 +781,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
@ -613,9 +809,20 @@
|
||||
"print parsed"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## NLTK Syntax Trees! ##"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
@ -660,7 +867,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
|