Computer Models of Language Representation and Processing
Lecturer:
Prof. Daniel R. Schlegel, 395 Shineman Center, daniel.schlegel@oswego.edu
Office/Lab hours: Thursday 9:30-11:30am; Friday 12:30-1:30pm; by appointment
Section 800: MWF 1:50-2:45pm, Shineman 175
Course Description:
This course seeks to establish a foundational framework for discussion of computational natural language processing. The topics that will be treated here are grounded in theories of knowledge representation and reasoning with particular reference to computational semantics and pragmatics. Emphasis will be placed primarily on symbolic systems, with some brief attention to connectionist and statistical approaches. Finally, some attention will be paid to criticism of approaches to natural language processing.
Course Objectives:
Upon successful completion of this course, students will:
have an understanding of computational approaches to working with natural language text;
be able to make use of modern tools and techniques to perform basic natural language text processing tasks;
have the foundation upon which they can build to solve more sophisticated problems involving natural language text.
Textbooks:
Jurafsky, D. and Martin, J.H., Speech and Language Processing, 2e. Prentice Hall, 2008.
Useful Resources:
Speech and Language Processing, 3rd edition draft chapters
Online Regular Expression Debugger – Regex101
Stanford CoreNLP Demo
Festival Speech Synthesis System
General Architecture for Text Engineering (GATE)
Universal Dependencies
Unified Verb Index
Attendance Policy and Classroom Etiquette:
As per college policy, attendance in all sessions is obligatory. If you cannot attend a class meeting due to a religious, athletic, or health-related circumstance, or a circumstance of particular hardship, please notify me in advance via email. Please be ready to present proof, if necessary. Cell phones and headphones should not be out or used during lecture, and laptops should only be used for taking notes (I don’t recommend this). If use of any electronics becomes distracting to other students, I reserve the right to disallow their use.
Assignments:
There will be 3-4 assignments and a final project. The assignments may (but do not have to) be completed with a partner of your choosing. It is a good idea for a partnership to have two people of different specialties or interests, for example a linguist and a computer scientist. The final project will be completed alone and will explore a topic of your choosing within the scope of this course. Further details about all assignments will be made available as they are assigned.
Grading:
Assignments will be submitted on Blackboard and graded according to the quality of the solution, including completeness and correctness. Written assignments will additionally be graded according to their quality as communicative artifacts. For assignments that are presented in class, quality of presentation will be incorporated into the assignment grade.
It is expected that each person participate during each class. As discussed above, attendance is required.
Each exam question will be assigned a point value (generally some multiple of 3 depending on difficulty), where the following scheme will be used in grading it:
0 – Did not attempt / No serious attempt
1 – Mostly incorrect solution
2 – Somewhat incorrect solution
3 – Perfect solution
If a question's point value is a larger multiple of 3, then intermediate scores will be given as appropriate. The total points received on all questions will then be summed, divided by the points possible, and scaled according to the percentages given below.
Assignments | 20% |
Final Project | 30% |
Exam 1 | 15% |
Exam 2 | 15% |
Final Exam | 20% |
The default grading for the course will be along the university’s standard grading curve:
A: 93-100 | C+: 77-79 |
A-: 90-92 | C: 73-76 |
B+: 87-89 | C-: 70-72 |
B: 83-86 | D+: 67-69 |
B-: 80-82 | D: 60-66 |
| E: 0-59 |
A more generous curve may be used, but should not be expected.
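As an illustration only, the arithmetic described above can be sketched in Python. The function and variable names here are hypothetical and not part of any course tooling. The sketch assumes the default ranges with no curve, and assumes fractional percentages are not rounded up before a letter is assigned.

```python
# Component weights from the table above, as whole percentages.
WEIGHTS = {
    "assignments": 20,
    "final_project": 30,
    "exam_1": 15,
    "exam_2": 15,
    "final_exam": 20,
}

# Lower bounds of the default letter-grade ranges, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (60, "D"),
]


def exam_percentage(points_earned, points_possible):
    """Sum the points over all questions and divide by the points possible."""
    return 100.0 * sum(points_earned) / sum(points_possible)


def course_percentage(component_percentages):
    """Weighted sum of the component percentages (each on a 0-100 scale)."""
    total = sum(WEIGHTS[name] * component_percentages[name] for name in WEIGHTS)
    return total / 100.0


def letter_grade(percentage):
    """Map a percentage to a letter grade (assumption: no rounding, no curve)."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "E"


# Hypothetical scores, purely for illustration.
scores = {
    "assignments": 95,
    "final_project": 88,
    "exam_1": 80,
    "exam_2": 85,
    "final_exam": 90,
}
print(letter_grade(course_percentage(scores)))  # B+
```

For example, an exam with questions worth 3, 6, and 6 points on which a student earns 3, 4, and 6 points would give `exam_percentage([3, 4, 6], [3, 6, 6])`, or about 86.7%.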
Schedule/Outline:
This syllabus and the course schedule are subject to change by the instructor. All changes and related justifications will be announced in class, and updates will be reflected in this web version.
Lecture slides will be maintained on Blackboard, but many lectures will include use of the whiteboard, which may not be reflected in notes elsewhere.
During the semester we aim to cover the following topics:
Week | Day | Date | Topics and Readings |
---|---|---|---|
1 | Monday | 1/22 | First day of class. Syllabus; Course Overview. Working with Text Intro. Readings: SLP Chapter 1 |
1 | Wednesday | 1/24 | Finish Working with Text Intro. Research Overview (for context). Readings: Begin looking at SLP Section 2.1. Optional Readings: This Is Watson (on Blackboard) |
1 | Friday | 1/26 | Regular Expressions. Readings: Finish SLP Section 2.1; play with sample regular expressions on regex101. Optional Readings: Weizenbaum's ELIZA Paper |
2 | Monday | 1/29 | Regular Expressions (concluded). Readings: SLP 3rd Edition Draft Sections 2.2-2.3 |
2 | Wednesday | 1/31 | Add deadline. Writing your own chatbot. Sample Python Chatbot. Tokenization and Sentence Splitting. Assignment 1 due 2/11, 11:59pm on Blackboard, demoed in class 2/12. Readings: SLP Chapter 3 through the end of Section 3.1; Section 3.8. Optional Readings: Christiansen & Arnon, More Than Words: The Role of Multiword Sequences in Language Learning and Use |
2 | Friday | 2/2 | Text Normalization. Reading: SLP 3rd ed, Ch 4 through end of 4.1 (don't worry too much about the math!). Optional Reading: M.F. Porter, An Algorithm for Suffix Stripping |
3 | Monday | 2/5 | Language Models; N-grams. Readings: SLP 3rd Ed, Chapter 8 (again, don't get bogged down by the math!). Optional Readings: Johns and Jamieson, A Large-scale Analysis of Variance in Written Language, 2018; Dye, M., et al. Alternative Solutions to a Language Design Problem: The Role of Adjectives and Gender Marking in Efficient Communication. Topics in Cognitive Science, 2017 (on Blackboard) |
3 | Wednesday | 2/7 | Snow Day! |
3 | Friday | 2/9 | Drop deadline. N-grams continued; Neural Networks |
4 | Monday | 2/12 | Assignment 1 in-class demos |
4 | Wednesday | 2/14 | Neural Language Models Concluded. Finite Automata Introduction. Assignment 2 due. Readings: Sections 2.2-2.4 |
4 | Friday | 2/16 | Finite State Transducers. Morphological Analysis. Readings: Section 3.2 through the end of 3.6 |
5 | Monday | 2/19 | Class Cancelled. Readings: Chapter 5 |
5 | Wednesday | 2/21 | Part of Speech Tagging. Hidden Markov Models |
5 | Friday | 2/23 | Class Cancelled |
6 | Monday | 2/26 | Guest lecture: Dr. Jonathan Bona |
6 | Wednesday | 2/28 | Exam 1. Readings: SLP 3rd Edition, Chapter 17 through the end of 17.4 |
6 | Friday | 3/2 | POS Tagging Concluded; Word Senses |
7 | Monday | 3/5 | Word Senses; WordNet. Readings: SLP 3rd Edition, Chapter 12 through end of 12.1; Chapter 14 through end of 14.3. Optional Readings: Chomsky, N. "Three Models for the Description of Language" 1956 (on Blackboard) |
7 | Wednesday | 3/7 | Phrase Structure and Parsing Text |
7 | Friday | 3/9 | Parsing and Semantic Role Labeling. Assignment 3 due |
8 | Monday | 3/12 | No Class - Spring Break |
8 | Wednesday | 3/14 | No Class - Spring Break |
8 | Friday | 3/16 | No Class - Spring Break |
9 | Monday | 3/19 | Tools for Natural Language Processing - GATE. Readings: Read and follow along with Chapter 1 of the NLTK Book using Thonny or repl.it. Final Project Description |
9 | Wednesday | 3/21 | Tools for Natural Language Processing - NLTK. Trace of repl.it from class |
9 | Friday | 3/23 | Training Machine Learning Models in NLTK. Sentiment Analysis |
10 | Monday | 3/26 | Logic Introduction. Readings: Peter Suber's Translation Tips (propositional logic section) |
10 | Wednesday | 3/28 | Logic, continued. Translation of English sentences to Logic. Project Proposals Due on Blackboard, 11:59pm |
10 | Friday | 3/30 | No Class - Easter Weekend |
11 | Monday | 4/2 | Model Finding. Readings: Peter Suber's Translation Tips (predicate logic sections). Withdraw Deadline |
11 | Wednesday | 4/4 | No Class - Quest Day |
11 | Friday | 4/6 | Exam 2 |
12 | Monday | 4/9 | Predicate Logic |
12 | Wednesday | 4/11 | Predicate Logic, concluded. Non-Classical Logics |
12 | Friday | 4/13 | Non-Classical Logics, concluded. Frames. Readings: Frames and Case Grammar sections from Shapiro, S.C., ed. Encyclopedia of Artificial Intelligence, 2nd ed. (on Blackboard) |
13 | Monday | 4/16 | Frames & Case Grammar |
13 | Wednesday | 4/18 | Project Progress Discussion |
13 | Friday | 4/20 | Graphs for Language Understanding. Readings: SLP 3rd Edition, Section 29.2, Chapter 30 |
14 | Monday | 4/23 | A Return to Dialog and Discourse. Extra Credit Assignment due 5/11, 11:59pm on Blackboard |
14 | Wednesday | 4/25 | No Class |
14 | Friday | 4/27 | Speech Recognition / Synthesis |
15 | Monday | 4/30 | Project Presentations |
15 | Wednesday | 5/2 | Project Presentations |
15 | Friday | 5/4 | Last day of class. Final Exam Study Guide. Project Presentations. Final Project Papers/Code Due |
Finals Week | Monday | 5/7 | Final Exam 2-4pm, 175 Shineman |
Academic Integrity:
While it is acceptable to discuss general approaches with your fellow students, the work you turn in must be your own. You may not turn in code found on the internet. If you have any problems doing the assignments, consult the instructor. Please be sure to read the webpage “Academic Integrity”, which spells out the details of this and related policies. See my page on plagiarism for an explanation of what I consider cheating.
Disability Statement:
If you have a disabling condition that may interfere with your ability to successfully complete this course, please contact the Office of Disability Services at dss@oswego.edu or x3358.