Hi There!

I'm Dan Schlegel, an Associate Professor in the Computer Science Department at SUNY Oswego.

CSC350 – Spring 2024

Computational Linguistics

Lecturer:

Prof. Daniel R. Schlegel, 464 Shineman Center, daniel.schlegel@oswego.edu
Office/Lab hours: Monday 2-3, Tuesday 10-12, and by appointment. Email any time with questions!
Section 800: MWF 11:30am-12:25pm, Shineman 170

Course Description:

This course provides an introduction to natural language processing techniques. Specification, implementation, and evaluation of machine learning techniques as applied to natural language will be discussed. We will examine relevant linguistic constructs as we build from the bag of words model of language to richer structural models representing the relationships between words and phrases to encode meaning.

Course Objectives:

Students who complete this course will be able to: 

  • Grasp fundamental concepts in linguistics relevant to natural language processing.
  • Understand and discuss the mathematical and theoretical underpinnings of algorithmic and statistical techniques applied to language processing.
  • Select approaches for solving language processing tasks and defend their decisions. 
  • Implement algorithms and techniques relevant to language processing applied to problems addressed by the field.
  • Evaluate performance of algorithms on linguistic tasks and improve upon results using knowledge of the algorithms, linguistics, and the problem being solved.

Prerequisites:

The course catalog prerequisite for this course is CSC241. I would add that a course in calculus and some programming experience beyond CSC241 are very highly recommended. Ideally, you will have taken CSC365 (or at least one of the 300-level core CS courses). This course will make use of calculus, linear algebra, probability, and statistics. Almost everyone will have to pick up some additional mathematical skills along the way, but picking up all of them during the semester is probably not possible.

Textbooks:

Required: Eisenstein, J. Introduction to Natural Language Processing (Draft). MIT Press, 2019. [See Brightspace for pdf!]
Required: Jurafsky, D. and Martin, J. Speech and Language Processing, 3rd Edition (Draft). [See Brightspace for pdf!]

Useful Resources:

Jurafsky and Martin – Speech and Language Processing 3rd Edition Draft
The Python Tutorial
PyTorch Tutorials
NumPy Documentation

Attendance and Participation:

As per college policy, attendance in all sessions is obligatory. If you cannot attend a class meeting because of a religious, athletic, or health-related circumstance, or a circumstance of particular hardship, please notify me in advance via email. Please be ready to present proof, if necessary. Each person is expected to engage actively in each class session.

This course includes a participation component. It is expected that concepts will need clarification – ask questions! Reading guides will be provided for some assigned readings and may cause other questions to arise which we must discuss. Participation in class and discussions is mandatory and will be factored into the final grade. 

Classroom Etiquette:

A positive learning environment relies upon creating an atmosphere where all students feel welcome. Classroom discussion is meant to allow us to hear a variety of viewpoints. This can only happen if we respect each other and our differences. Hostility and disrespectful behavior are not acceptable.

Cell phones and headphones should not be out or used during lecture, and laptops should only be used for taking notes. If the use of any electronics becomes distracting to other students, I reserve the right to disallow them.

The current university policy will decide the class policy on masks and social distancing. Regardless of class policy, you can do more! If you wish to wear a mask, please do so! If you are sick with a potentially communicable disease, you should wear a mask and seriously consider staying home to prevent the spread of disease. 

Modality:

We meet in person three times a week. There will be no recording, and we won’t work through the content with you in a one-on-one fashion during office hours or an appointment – basically, if you miss class then you missed out on what you signed up for when registering for the class and you will have to work through the content independently. Of course, I will answer questions about the content, if you have any, and there are exceptions for excused absences due to illness etc.

“Go Remote” due to COVID-19: If we are forced to “go remote” for a prolonged period during the semester then we’ll hold class over Zoom and make a Zoom link available on the course webpage, as well as email it with the weekly content. This is definitely a sub-par, miserable excuse for a class meeting. We’ll do this only if circumstances force us.

“Go Remote” Days Due to Weather: In short, we value our snow days and will do what is reasonable to make sure that no commuter is in danger, while also maintaining academic integrity. In long, the concept of a “Go Remote” day due to weather is crap. We deserve our snow days. Yes, snow days are a serious inconvenience and create a content crunch, especially if classes are cancelled indiscriminately. But snow days are also a weather-gifted day of respite, and nearly everyone appreciates an occasional day of rest. I may choose to skip certain topics during the semester so that we can afford to cancel class on a “Go Remote” day. If it turns out that many classes are cancelled due to weather, then I will have to assign independent reading / activities to cover some of the content.

Grading Summary:

Grades will be composed of participation, programming projects, written homework assignments, biweekly quizzes, and a final exam. A point-based system will be used: each graded artifact is assigned a point value, and you can simply sum the points to determine your grade.

Assessment | Points
Projects (3-4) | 400
Homeworks (7) | 70
Participation | 130
Quizzes (5) | 200
Final Exam | 200
Total | 1000

The default grading for the course will be along the university’s standard grading curve:

Letter | Points
A | 930-1000
A- | 900-920
B+ | 870-890
B | 830-860
B- | 800-820
C+ | 770-790
C | 730-760
C- | 700-720
D+ | 670-690
D | 600-660
E | 0-590
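
As a rough illustration of the point-based system, here is a minimal Python sketch that sums points and looks up a letter grade. It assumes each range's lower bound acts as the cutoff, since the listed ranges leave small gaps (for example, 921-929). The `earned` values and the `letter_grade` name are made up for the example, and actual grades are determined by the instructor, not by this snippet.

```python
# Lower bound of each letter range listed above, highest first.
# Assumption: a total that falls in a gap (e.g. 925) gets the lower letter.
CUTOFFS = [
    (930, "A"), (900, "A-"),
    (870, "B+"), (830, "B"), (800, "B-"),
    (770, "C+"), (730, "C"), (700, "C-"),
    (670, "D+"), (600, "D"),
    (0, "E"),
]


def letter_grade(points):
    """Return the letter for a point total out of 1000."""
    for cutoff, letter in CUTOFFS:
        if points >= cutoff:
            return letter
    raise ValueError("points must be non-negative")


# Hypothetical point totals for one student, by assessment category.
earned = {
    "projects": 352,       # out of 400
    "homeworks": 63,       # out of 70
    "participation": 120,  # out of 130
    "quizzes": 170,        # out of 200
    "final_exam": 165,     # out of 200
}

total = sum(earned.values())
print(total, letter_grade(total))
```

With these made-up numbers the total is 870 points, which falls at the bottom of the B+ range.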

Projects:

All projects are to be completed alone and submitted on Brightspace once complete. Be sure not to post solutions on the internet during or after the course as we wish to use these problems in the future. 

Projects will be graded based on completion and quality of submission (including quality of code). All projects have a competitive component in which points will be assigned for particularly good solutions scored objectively on hidden data sets. Results will be presented in class and particularly interesting solutions may be examined in detail. 

Projects are considered on-time if they are submitted on or before the due date, with an 11:59pm cutoff time for submission. Projects may still be submitted after the deadline with a 5% per day penalty.
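
To make the late penalty concrete, here is a small Python sketch. It rests on several assumptions that the policy above does not spell out: the 5% is taken from the score the project would otherwise earn, any part of a day late counts as a full day, and the penalty is capped at 100%. The function names and the example deadline are hypothetical.

```python
import math
from datetime import datetime

PENALTY_PER_DAY = 5  # percent per day late, from the policy above


def penalty_percent(due, submitted):
    """Percent deducted for a submission (assumes partial days round up)."""
    if submitted <= due:
        return 0
    seconds_late = (submitted - due).total_seconds()
    days_late = math.ceil(seconds_late / 86400)
    return min(100, days_late * PENALTY_PER_DAY)


def late_score(raw_points, due, submitted):
    """Apply the penalty to the score the project would otherwise earn."""
    return raw_points * (100 - penalty_percent(due, submitted)) / 100


# Hypothetical deadline with the 11:59pm cutoff.
due = datetime(2024, 2, 9, 23, 59)
submitted = datetime(2024, 2, 11, 10, 0)  # about a day and a half late

print(penalty_percent(due, submitted))  # counted as 2 days late
print(late_score(80, due, submitted))
```

Under these assumptions, a project that would have earned 80 points and arrives a day and a half late loses 10% and receives 72 points.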

Note that no credit will be given for projects which fail to run, and partial credit will be given if only parts of the project work as described. 

Homework Assignments:

Homework assignments will give you additional practice with some of the more theoretical concepts discussed in class. Solutions are to be written on the provided homework sheets and submitted on the due date at the start of class. No late homework assignments will be accepted.

Each homework due date corresponds to the quiz that will test (among other things) your understanding of the concepts from that homework.

Exams and Quizzes:

Take-home quizzes will be given roughly every two weeks, and there will be a take-home final exam during finals week. The lowest quiz grade will be dropped. 

Each exam and quiz question will be assigned a point value, questionPoints, where the following general scheme will be used in grading it:

0 – Did not attempt / No serious attempt / Completely incorrect
1/3 * questionPoints – Mostly incorrect solution
2/3 * questionPoints – Somewhat incorrect solution
3/3 * questionPoints – Perfect solution

Intermediate scores will be given as appropriate. The total points received on all questions will then be summed. 
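
The scheme above, together with the dropped lowest quiz, can be sketched in a few lines of Python. This is only an illustration: the rubric labels, function names, and scores are invented for the example, and it covers just the four anchor levels, not the intermediate scores mentioned above.

```python
from fractions import Fraction

# Fraction of questionPoints awarded at each anchor level of the scheme.
# The level names are hypothetical labels chosen for this sketch.
RUBRIC = {
    "incorrect": Fraction(0),
    "mostly_incorrect": Fraction(1, 3),
    "somewhat_incorrect": Fraction(2, 3),
    "perfect": Fraction(1),
}


def score_question(question_points, level):
    """Points earned on one question at the given rubric level."""
    return question_points * RUBRIC[level]


def quiz_total(questions):
    """Sum the earned points over (question_points, level) pairs."""
    return sum(score_question(points, level) for points, level in questions)


def total_after_drop(quiz_scores):
    """Sum of quiz scores with the single lowest score dropped."""
    return sum(quiz_scores) - min(quiz_scores)


# A made-up quiz with three questions.
quiz = [(15, "perfect"), (15, "somewhat_incorrect"), (18, "mostly_incorrect")]
print(quiz_total(quiz))  # 15 + 10 + 6

# Made-up scores on the five quizzes; the lowest (30) is dropped.
print(total_after_drop([40, 35, 50, 45, 30]))
```

Using `Fraction` keeps the thirds exact, so a question worth a number of points not divisible by three does not pick up floating-point rounding before the totals are summed.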

Schedule/Outline:

The course will be generally divided into four segments, during which we will build up our understanding of algorithms applied to the structure of language, from simple word-frequency-based models to semantics based on structure. The outline is detailed in the graphic syllabus. The plan is highly optimistic, and we may not get through everything.

This syllabus and the course schedule are subject to change by the instructor. All changes and related justifications will be announced in class, and updates will be reflected in this web version.

Week | Day | Date | Topic | Assignment/Assessment
1 | Monday | 1/22 | First Day of Class; Syllabus & Overview; Pre-definitely-not-a-test | Reading: Eisenstein Chapter 1 + Appendix A
1 | Wednesday | 1/24 | Overview; Discuss Pre-definitely-not-a-test; Brush-Up Resources; Bag of Words Model | Reading: Eisenstein 2-2.1, 4-4.2; HW1 Due 1/29 (submit on paper at start of class)
1 | Friday | 1/26 | Bag of Words Model | Reading: Eisenstein 2.2; SLP3 4-4.7
2 | Monday | 1/29 | Modeling by Hand; Naive Bayes | Reading: Eisenstein 2.3-2.3.1, 4.3; SLP3 7-7.1
2 | Wednesday | 1/31 | Project 1 Discussion | Project 1 Assigned; HW2 Due 2/7 (submit on paper at start of class)
2 | Friday | 2/2 | Project Q&A; Perceptron Introduced | Quiz 1 Due 2/5 (submit on paper at start of class)
3 | Monday | 2/5 | Perceptron Concluded | Reading: Eisenstein 2.5-2.6; SLP3 5-5.6
3 | Wednesday | 2/7 | Logistic Regression | Reading: Eisenstein 3-3.3; SLP3 7.2-7.6
3 | Friday | 2/9 | Finish LR exercise; Work Day / Project Q&A | Project 1 Parts 1-3 Due
4 | Monday | 2/12 | Dan Sick – Work Day |
4 | Wednesday | 2/14 | NN Intro | Quiz 2 and Homework 3 due 2/19
4 | Friday | 2/16 | Backprop; Computation Graphs | Project 1 Parts 4-7 Due; Reading: Eisenstein 7-7.1, 8; SLP3 8-8.3
5 | Monday | 2/19 | Work Day! |
5 | Wednesday | 2/21 | Project 2 Discussion |
5 | Friday | 2/23 | POS Tagging | Reading: Eisenstein 7.2-7.4; SLP3 8.4
6 | Monday | 2/26 | Markov Chains; HMM Introduction | Quiz 3 due 3/1
6 | Wednesday | 2/28 | Viterbi Algorithm |
6 | Friday | 3/1 | HMM/Viterbi Example |
7 | Monday | 3/4 | Language Models | Reading: Eisenstein 6-6.5, 7.5-7.6, 14-14.4; SLP3 Chapter 6, 7.7
7 | Wednesday | 3/6 | |
7 | Friday | 3/8 | |
8 | Monday | 3/11 | No Class – Spring Break |
8 | Wednesday | 3/13 | No Class – Spring Break |
8 | Friday | 3/15 | No Class – Spring Break |
9 | Monday | 3/18 | Let’s talk about Project 2… |
9 | Wednesday | 3/20 | LSTMs and GRUs | HW4 due 3/25; Project 2 Parts 1-4 Due
9 | Friday | 3/22 | Formal Languages | Reading: Eisenstein 10-11; SLP3 17-18
10 | Monday | 3/25 | Constituency Parsing |
10 | Wednesday | 3/27 | Work Day |
10 | Friday | 3/29 | Work Day | Project 2 Parts 5-7 Due
11 | Monday | 4/1 | CKY Parser Practice |
11 | Wednesday | 4/3 | Head Words + Dependency Parsing |
11 | Friday | 4/5 | |
12 | Monday | 4/8 | Solar Eclipse – No Class |
12 | Wednesday | 4/10 | Motivational Example: Clinical Tractor | Reading: Eisenstein 12-12.4; SLP3 Chapter 20, Appendix F
12 | Friday | 4/12 | Logical Semantics |
13 | Monday | 4/15 | Guest Q&A | HW5, Quiz 4 Due
13 | Wednesday | 4/17 | Quest Day, No Class |
13 | Friday | 4/19 | Semantic Parsing |
14 | Monday | 4/22 | Semantic Parsing | Project 3 Due
14 | Wednesday | 4/24 | Predicate-Argument Semantics | Reading: Attention is All You Need
14 | Friday | 4/26 | Work Day |
15 | Monday | 4/29 | Transformers; Illustrated Transformer |
15 | Wednesday | 5/1 | Work Day |
15 | Friday | 5/3 | | HW6, Quiz 5 Due
Finals Week | Friday | 5/10 | Take-Home Final Exam Due | Take-Home Final Exam Due

Academic Integrity:

SUNY Oswego is committed to Intellectual Integrity. Any form of intellectual dishonesty is a serious concern and therefore prohibited. You can find the full policy online. While it is acceptable to discuss general approaches with your fellow students, the work you turn in must be your own. You may not turn in code found on the internet. If you have any problems doing the assignments, consult the instructor. See my page on plagiarism for an explanation of what I consider cheating.

Accessibility:

If you have a disabling condition which may interfere with your ability to successfully complete this course, please contact Accessibility Resources, located at 155 Marano Campus Center, phone 315.312.3358, access@oswego.edu.

Clery Act/Title IX Reporting:

SUNY Oswego is committed to enhancing the safety and security of the campus for all its members. In support of this, faculty may be required to report their knowledge of certain crimes or harassment. Reportable incidents include harassment on the basis of sex or gender prohibited by Title IX and crimes covered by the Clery Act. For more information about Title IX protections, go to https://www.oswego.edu/title-ix/ or contact the Title IX Coordinator, 405 Culkin Hall, 315-312-5604, titleix@oswego.edu. For more information about the Clery Act and campus reporting, go to the University Police annual report: https://www.oswego.edu/police/annual-report.