Hi There!

I'm Dan Schlegel, an Associate Professor in the Computer Science Department at SUNY Oswego.

CSC350 – Spring 2020

Computational Linguistics

Updates as of 3/14/20 are linked HERE and also noted (in less detail) below.

Lecturer:

Prof. Daniel R. Schlegel, 464 Shineman Center, daniel.schlegel@oswego.edu
Office/Lab hours: M: 3:00-4:00pm; T: 3:00-4:00pm; Th: 2:30-3:30pm; and by appointment on Blackboard Collaborate. Feel free to email any time to schedule a meeting!
Section 800: TTh 11:10am-12:30pm, on Blackboard Collaborate (previously Shineman 444)

Course Description:

This course provides an introduction to natural language processing techniques. Specification, implementation, and evaluation of machine learning techniques as applied to natural language will be discussed. We will examine relevant linguistic constructs as we build from the bag-of-words model of language to richer structural models representing the relationships between words and phrases to encode meaning.

Course Objectives:

Students who complete this course will be able to: 

  • Grasp fundamental concepts in linguistics relevant to natural language processing.
  • Understand and discuss the mathematical and theoretical underpinnings of algorithmic and statistical techniques applied to language processing.
  • Select approaches for solving language processing tasks and defend their decisions. 
  • Implement algorithms and techniques relevant to language processing applied to problems addressed by the field.
  • Evaluate performance of algorithms on linguistic tasks and improve upon results using knowledge of the algorithms, linguistics, and the problem being solved.

Prerequisites:

The course catalog prerequisite for this course is CSC241. I would add that a course in calculus and some programming experience beyond CSC241 are very highly recommended. Ideally, you will have taken CSC365. This course will make use of calculus, linear algebra, and probability and statistics. Almost everyone will have to pick up some additional mathematical skills along the way, but picking up all of them during the course is probably not possible.

Textbooks:

Required: Eisenstein, J. Introduction to Natural Language Processing. MIT Press, 2019.

Useful Resources:

Jurafsky and Martin – Speech and Language Processing 3rd Edition Draft
The Python Tutorial
PyTorch Tutorials
NumPy Documentation

Attendance and Participation:

As per college policy, attendance in all sessions is obligatory. If you cannot attend a class meeting due to religious, athletic, or health-related circumstances, or circumstances of particular hardship, please notify me in advance via email. Please be ready to present proof, if necessary. Each person is expected to engage actively in each class session. Should you miss a session, some notes and sample programs will be uploaded to Blackboard. To obtain more complete notes, please attend office hours or use the Course Room to discuss the material with other students.

This course includes a significant discussion component. Participation in discussions is mandatory and will be factored into the final grade. 

Classroom Etiquette:

A positive learning environment relies upon creating an atmosphere where all students feel welcome. Classroom discussion is meant to allow us to hear a variety of viewpoints. This can only happen if we respect each other and our differences. Hostility and disrespectful behavior are not acceptable.

Grading Summary:

Grades will be composed of participation, programming projects, written homework assignments, biweekly quizzes, and a final exam. A point-based system will be used: each graded artifact is assigned a point value, and you can simply sum your points to determine your grade.

Assessment        Points
Projects (3-4)    400
Homeworks (7)     70
Participation     130
Quizzes (5)       200
Final Exam        200
Total             1000
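As a quick illustration of "simply sum the points," the Python sketch below adds up one student's points per category and confirms the category maximums total 1000. The function name `total_points` and the sample scores are hypothetical, and the range check assumes no category can exceed its listed maximum.

```python
# Maximum points per category, from the grading table above.
MAX_POINTS = {
    "Projects": 400,
    "Homeworks": 70,
    "Participation": 130,
    "Quizzes": 200,
    "Final Exam": 200,
}


def total_points(earned: dict) -> float:
    """Sum earned points across categories (hypothetical helper).

    Assumption: no category can exceed its listed maximum.
    """
    total = 0
    for category, points in earned.items():
        if category not in MAX_POINTS:
            raise KeyError(f"unknown category: {category}")
        if not 0 <= points <= MAX_POINTS[category]:
            raise ValueError(f"{category}: {points} is outside 0-{MAX_POINTS[category]}")
        total += points
    return total


# Hypothetical scores for one student.
earned = {"Projects": 352, "Homeworks": 61, "Participation": 120,
          "Quizzes": 171, "Final Exam": 168}
print(total_points(earned))      # 872
print(sum(MAX_POINTS.values()))  # 1000
```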

The default grading for the course will be along the university’s standard grading curve:

Letter    Points
A         930-1000
A-        900-920
B+        870-890
B         830-860
B-        800-820
C+        770-790
C         730-760
C-        700-720
D+        670-690
D         600-660
E         0-590
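The scale can be read as a lookup over lower cutoffs. The Python sketch below uses a hypothetical `letter_grade` function. The listed ranges leave small gaps (for example, 921-929), so the sketch assumes a score in a gap receives the lower letter because it has not reached the next cutoff; that is an illustrative assumption, not stated course policy.

```python
# (lower cutoff, letter) pairs from the scale above, highest first.
SCALE = [
    (930, "A"), (900, "A-"), (870, "B+"), (830, "B"), (800, "B-"),
    (770, "C+"), (730, "C"), (700, "C-"), (670, "D+"), (600, "D"), (0, "E"),
]


def letter_grade(points: float) -> str:
    """Return the highest letter whose lower cutoff the score meets.

    Assumption: a score falling between two listed ranges (e.g. 925)
    receives the lower letter.
    """
    if not 0 <= points <= 1000:
        raise ValueError("points must be between 0 and 1000")
    for cutoff, letter in SCALE:
        if points >= cutoff:
            return letter
    raise AssertionError("unreachable: the E cutoff is 0")


print(letter_grade(872))  # B+
print(letter_grade(925))  # A- (falls in a gap under the stated assumption)
```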

Projects:

All projects are to be completed alone and submitted on Blackboard once complete. Be sure not to post solutions on the internet during or after the course, as we wish to reuse these problems in the future.

Projects will be graded based on completion and quality of submission (including quality of code). All projects have a competitive component in which points will be awarded for particularly good solutions, scored objectively on hidden data sets. Results will be presented in class, and particularly interesting solutions may be examined in detail.

Projects are considered on time if they are submitted on or before the due date, with an 11:59pm cutoff time for submission. Projects may still be submitted up to 72 hours after the deadline with a penalty of 1% per day (reduced from the original 10% per day). Assignments submitted more than 72 hours late will not be accepted.
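The late policy is simple arithmetic, sketched below in Python with a hypothetical `late_score` function. Two details are assumptions not stated above: every started 24-hour period counts as a full day late, and the 1% is taken off the score the project earned (rather than off its maximum). The dates are invented for illustration.

```python
import math
from datetime import datetime, timedelta
from typing import Optional


def late_score(score: float, deadline: datetime, submitted: datetime,
               rate_per_day: float = 0.01) -> Optional[float]:
    """Return the penalized project score, or None if it is too late to accept."""
    late = submitted - deadline
    if late <= timedelta(0):
        return score  # on or before the cutoff: no penalty
    if late > timedelta(hours=72):
        return None  # more than 72 hours late: not accepted
    # Assumption: every started 24-hour period counts as one full day late.
    days_late = math.ceil(late / timedelta(days=1))
    # Assumption: the penalty is a percentage of the score the project earned.
    return score * (1 - rate_per_day * days_late)


# Hypothetical 11:59pm deadline and a submission about 34 hours late (2 days).
deadline = datetime(2020, 3, 5, 23, 59)
submitted = datetime(2020, 3, 7, 10, 0)
print(round(late_score(90, deadline, submitted), 2))  # 88.2
```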

Note that no credit will be given for projects which fail to run, and partial credit will be given if only parts of the project work as described. 

Homework Assignments:

Homework assignments will give you additional practice with some of the more theoretical concepts discussed in class. Solutions are to be written on the provided homework sheets and submitted on the due date at the start of class. No late homework assignments will be accepted.

Homework due dates coincide with the quiz that will test (among other things) understanding of the concepts from those homework assignments.

Exams and Quizzes:

You may bring your book to quizzes and the final exam, but may not use any notes or electronic aids.

Quizzes will be given roughly every two weeks, and there will be a final exam during finals week. The lowest quiz grade will be dropped. Should the final exam need to be given in a socially distanced way, the rules and format will remain the same, but the mechanics may differ.
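Dropping the lowest quiz can be sketched as follows. This Python example uses a hypothetical `quiz_points` function and assumes that each quiz is recorded as a fraction of its own maximum (0.0 to 1.0) and that the remaining quizzes are weighted equally and scaled to the 200-point quiz category; the course may compute this differently.

```python
def quiz_points(quiz_fractions: list, category_points: float = 200) -> float:
    """Drop the lowest quiz and scale the average of the rest to the category total.

    Assumptions: each quiz is a fraction of its own maximum, and the kept
    quizzes are weighted equally.
    """
    if len(quiz_fractions) < 2:
        raise ValueError("need at least two quizzes to drop one")
    kept = sorted(quiz_fractions)[1:]  # discard the single lowest score
    return category_points * sum(kept) / len(kept)


# Hypothetical results for the five quizzes; the 0.5 is dropped.
print(quiz_points([0.75, 0.5, 1.0, 0.875, 0.625]))  # 162.5
```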

Each exam and quiz question will be assigned a point value, questionPoints, where the following general scheme will be used in grading it:

0 – Did not attempt / No serious attempt / Completely incorrect
1/3 * questionPoints – Mostly incorrect solution
2/3 * questionPoints – Somewhat incorrect solution
3/3 * questionPoints – Perfect solution

Intermediate scores will be given as appropriate. The total points received on all questions will then be summed. 
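The scheme above amounts to multiplying each question's questionPoints by the awarded fraction and summing. The Python sketch below uses exact fractions so that thirds do not produce rounding error; the rubric labels, the `question_score` helper, and the three-question quiz are hypothetical.

```python
from fractions import Fraction

# General rubric levels from the scheme above, as fractions of questionPoints.
RUBRIC = {
    "no serious attempt": Fraction(0, 3),
    "mostly incorrect": Fraction(1, 3),
    "somewhat incorrect": Fraction(2, 3),
    "perfect": Fraction(3, 3),
}


def question_score(question_points: int, fraction: Fraction) -> Fraction:
    """Score one question; intermediate fractions between levels are allowed."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return question_points * fraction


# Hypothetical three-question quiz: (questionPoints, awarded fraction).
graded = [
    (9, RUBRIC["perfect"]),
    (12, RUBRIC["somewhat incorrect"]),
    (6, Fraction(1, 2)),  # an intermediate score between 1/3 and 2/3
]
total = sum(question_score(points, frac) for points, frac in graded)
print(total)  # 20
```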

Schedule/Outline:

The course will generally be divided into four segments, during which we will build up our understanding of algorithms applied to the structure of language, from simple word-frequency-based models to semantics based on structure. This outline is detailed in the graphic syllabus.

This syllabus and the course schedule are subject to change by the instructor. All changes and related justifications will be announced in class, and updates will be reflected in this web version.

Week 1
  • Tuesday 1/28. Topic: First day of class; Course Overview. Assignments/Assessments: HW1 Assigned; Pretest. Remember to answer the office hours survey!
  • Thursday 1/30. Topic: Math "Review". Reading Due: Chapter 1, Appendix A. See (Goldwater, 2018) for more probability help.

Week 2
  • Tuesday 2/4. Topic: Bag of Words; Linear Classification. Reading Due: Chapter 2-2.1, Chapter 4-4.1 (Discussion Q's on Blackboard). Assignments/Assessments: Project 1 Assigned.
  • Thursday 2/6. Topic: Naive Bayes. Reading Due: 2.2 (Naive Bayes), 4.2 (Word Sense Disambiguation) (Discussion Q's on Blackboard). Assignments/Assessments: HW1 Due; Quiz 1.

Week 3
  • Tuesday 2/11. Topic: Naive Bayes example; Perceptron / Support Vector Machines. Reading Due: 2.3-2.4 (Discriminative Learning), 4.3 (Design Decisions for Classification).
  • Thursday 2/13. Topic: Logistic Regression. Reading Due: Appendix B (Optimization); 2.5-2.6 (Logistic Regression / Optimization).
  • Friday 2/14. Drop deadline.

Week 4
  • Tuesday 2/18. Dan Sick.
  • Thursday 2/20. Topic: Gradient Descent. Reading Due: 3-3.3 - Neural Networks. Assignments/Assessments: HW2 Due.

Week 5
  • Tuesday 2/25. Topic: Neural Networks. Assignments/Assessments: Quiz 2.
  • Thursday 2/27. Topic: Neural Networks, contd.

Week 6
  • Tuesday 3/3. Topic: n-gram Language Models. Reading Due: 6-6.2 - n-gram language models + smoothing.
  • Thursday 3/5. Topic: Neural Language Models; RNNs. Reading Due: 6.3-6.5 - neural language models + evaluation of language models; 8.1 - Part of Speech Tagging (for more on POS tagging, check out SLP3, 8-8.3). Assignments/Assessments: HW3 Due; Quiz 3; Project 1 Due; Project 2 Assigned.

Week 7
  • Tuesday 3/10. Topic: Evaluating Language Models; Part of Speech + Sequence Labeling; Universal Dependencies POS Tags. Reading Due: 7-7.1, 8 - Sequence Labeling + Applications.
  • Thursday 3/12. Topic: Hidden Markov Models. Reading Due: 7.2-7.4 - HMMs.

Week 8
  • Tuesday 3/17. Spring Break - No Class.
  • Thursday 3/19. Spring Break - No Class.

Week 9
  • Tuesday 3/24. Topic: Discriminative Models; RNNs, Revisited; Illustrated guide to LSTMs and GRUs. Reading Due: 7.5-7.6 - Discriminative Sequence Labeling + Neural Models.
  • Thursday 3/26. Topic: Work Day. Reading Due: 14-14.4 - Distributed Semantics. Assignments/Assessments: HW4 Due.

Week 10
  • Tuesday 3/31. Topic: Word Embeddings; Explore Embeddings! Reading Due: 14.5-14.8 - Neural Embeddings.
  • Thursday 4/2. Topic: Formal Languages (Review). Reading Due: Chapter 9 - Formal Language Theory.

Week 11
  • Tuesday 4/7. Topic: Progress Reports / Group Work.
  • Thursday 4/9. Topic: CKY Parsing. Reading Due: 10-10.3 - Bottom-Up Parsing.

Week 12
  • Tuesday 4/14. Topic: One more CKY parsing example; Progress Reports / Group Work. Assignments/Assessments: Project 2 Due; Quiz 4.
  • Thursday 4/16. Topic: Shift-Reduce Parsing / Dependency Parsing. Reading Due: 11-11.3 - Dependency Parsing.

Week 13
  • Tuesday 4/21. Topic: Transition Dependency Parsers. Assignments/Assessments: Project 3 Assigned (due last day of finals week).
  • Thursday 4/23. Dan Out.

Week 14
  • Tuesday 4/28. Topic: First Order Logic Semantics. Reading Due: 12-12.2 - Logical Semantics.
  • Thursday 4/30. Topic: Lambda Calculus Semantics. Reading Due: 12.3-12.4 - Lambda Calculus. Assignments/Assessments: HW5 Due; Quiz 5.

Week 15
  • Tuesday 5/5. Topic: Work Day!
  • Thursday 5/7. Topic: Last day of class (withdraw deadline Friday); Some thoughts on predicate-argument semantics. Reading Due: 13 - Predicate-Argument Semantics. Assignments/Assessments: Quiz 5 Due.

Finals Week
  • Take-Home Exam assigned Monday, due Friday 4:30pm. Assignments/Assessments: Final Exam.

Academic Integrity:

SUNY Oswego is committed to Intellectual Integrity. Any form of intellectual dishonesty is a serious concern and therefore prohibited. You can find the full policy online. While it is acceptable to discuss general approaches with your fellow students, the work you turn in must be your own. You may not turn in code found on the internet. If you have any problems doing the assignments, consult the instructor. See my page on plagiarism for an explanation of what I consider cheating.

Accessibility:

If you have a disabling condition that may interfere with your ability to successfully complete this course, please contact Accessibility Resources, located at 155 Marano Campus Center (phone 315.312.3358, email access@oswego.edu).

Clery Act/Title IX Reporting:

SUNY Oswego is committed to enhancing the safety and security of the campus for all its members. In support of this, faculty may be required to report their knowledge of certain crimes or harassment. Reportable incidents include harassment on the basis of sex or gender prohibited by Title IX and crimes covered by the Clery Act. For more information about Title IX protections, go to https://www.oswego.edu/title-ix/ or contact the Title IX Coordinator, 405 Culkin Hall, 315-312-5604, titleix@oswego.edu. For more information about the Clery Act and campus reporting, go to the University Police annual report: https://www.oswego.edu/police/annual-report.