@@ -0,0 +1,13 @@
+What needs to be improved about this repo:
+
+Generalize and standardize the steps in an NLP pipeline into Python classes and
+functions. I can think of these off the top of my head:
+
+* Scraper - fetch text from the internet and save it to a local file
+* Cleaner - strip non-corpus text out of the raw text
+* Ngramer - assemble text into a Python list of lists
+* Cfdister - restructure data into a conditional frequency distribution
+* Other? - restructure data by some other metric (rhyming, similarity, etc.)
+* Assembler loop - takes one of the structures above and outputs one word
+  - Maybe this should be wrapped in a sentence loop, line-by-line loop,
+    paragraph loop, etc.
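
The steps listed above could be sketched roughly as follows. This is only a minimal illustration, not code from the repo: the function names and signatures (`clean`, `ngrams`, `cfdist`, `next_word`, `assemble_sentence`) are hypothetical, the conditional frequency distribution is approximated with the standard library's `defaultdict` and `Counter` rather than any particular NLP library, and the Scraper step is left out because it depends on network access.

```python
import random
import re
from collections import Counter, defaultdict


def clean(raw_text):
    """Cleaner (hypothetical): lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", raw_text.lower())


def ngrams(tokens, n=2):
    """Ngramer (hypothetical): slide a window of size n over the tokens."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


def cfdist(grams):
    """Cfdister (hypothetical): map each context to counts of the word that follows."""
    dist = defaultdict(Counter)
    for gram in grams:
        # The context is every word but the last; the last word is the outcome.
        dist[tuple(gram[:-1])][gram[-1]] += 1
    return dist


def next_word(dist, context, rng=random):
    """Assembler step: pick one word, weighted by how often it followed the context."""
    counts = dist.get(tuple(context))
    if not counts:
        return None  # unseen context: the caller decides how to recover
    words = list(counts)
    return rng.choices(words, weights=[counts[w] for w in words])[0]


def assemble_sentence(dist, seed, max_words=10, rng=random):
    """Sentence loop: repeat the assembler step until the cap or a dead end."""
    words = list(seed)
    context_size = len(seed)  # assumed to equal n - 1 for the n-grams used
    while len(words) < max_words:
        context = words[len(words) - context_size:]
        word = next_word(dist, context, rng)
        if word is None:
            break
        words.append(word)
    return " ".join(words)


tokens = clean("The cat sat. The cat ran. The dog sat.")
dist = cfdist(ngrams(tokens, n=2))
print(assemble_sentence(dist, seed=["the"], max_words=6, rng=random.Random(0)))
```

Keeping each step as a plain function with list or dict inputs and outputs means a line-by-line or paragraph loop can wrap `assemble_sentence` without touching the earlier stages, and an "Other?" structure (rhyming, similarity) only needs to offer the same context-to-candidates lookup.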