{"id":6006,"date":"2026-04-23T14:59:14","date_gmt":"2026-04-23T14:59:14","guid":{"rendered":"https:\/\/danielschlegel.org\/wp\/?page_id=6006"},"modified":"2026-04-23T15:04:47","modified_gmt":"2026-04-23T15:04:47","slug":"assignment-5-structured-summaries","status":"publish","type":"page","link":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/","title":{"rendered":"Assignment 5 &#8211; Structured Summaries"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Microproject<\/h2>\n\n\n\n<p>Write a Python program which takes a single argument \u2014 a URL. Your program will <strong>use the Unix command curl<\/strong> to download the file at that URL, remove all of the embedded JavaScript and CSS from the file, and write the resulting file to the screen. Be sure to test it on a variety of URLs. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Main Project<\/h2>\n\n\n\n<p>Write a Python program that does the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Asks the user to enter a URL;<\/li>\n\n\n\n<li>Downloads the contents of that URL using curl (or some other Unix application);<\/li>\n\n\n\n<li>Extracts features from the page, including at least:\n<ul class=\"wp-block-list\">\n<li>Headings<\/li>\n\n\n\n<li>Link text along with URLs<\/li>\n\n\n\n<li>Image URL and alt text<\/li>\n\n\n\n<li>Email addresses<\/li>\n\n\n\n<li>Phone numbers<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Uses Unix tools to determine:\n<ul class=\"wp-block-list\">\n<li>word count<\/li>\n\n\n\n<li>top N words, sorted alphabetically<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Builds a structured summary including the above items and shows it to the user;<\/li>\n\n\n\n<li>Sends the summary to an LLM using a prompt of your choice to analyze the structured summary. 
Some ideas: \n<ul class=\"wp-block-list\">\n<li>Ask the LLM to try to determine what the website is about.<\/li>\n\n\n\n<li>Modify the structured summary so that it is segmented by heading section and ask the LLM to verify that the contents of each section of the page match the heading.<\/li>\n\n\n\n<li>Ask the LLM to try to predict any bias on the page.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>For the LLM component, I recommend using the <a href=\"https:\/\/github.com\/ollama\/ollama-python\">Ollama Python library<\/a>. You can run Ollama on your own machine with some small-ish model if your hardware supports it, or get a free <a href=\"https:\/\/docs.ollama.com\/cloud\">Ollama Cloud<\/a> account and use an API key to access some models. The free tier should be far more than enough to do this project, even with lots of testing. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Demo<\/h2>\n\n\n\n<p>Here&#8217;s a demo of my version of the project&#8217;s output. <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">Enter a URL: http:\/\/pizzatoday.com<br><br>--- Summary ---<br>Words: 8161<br><br>Headings: <br>Pizza Today<br>Featured<br>Latest Posts<br>Latest Recipes<br>Latest Podcasts<br>General<br>Topics<br>About Us<br>Contact Us<br><br>Links: <br>Emerald Media Network | https:\/\/emeraldx.com<br>Advertise | https:\/\/pizzatoday.com\/pizza-today-media-kit-download\/<br>International Pizza Expo | https:\/\/pizzaexpo.pizzatoday.com\/<br>Pizza Expo Columbus | https:\/\/pizzaexpocolumbus.pizzatoday.com\/<br>Latest | https:\/\/pizzatoday.com\/latest\/<br>View All Posts \u00bb | https:\/\/pizzatoday.com\/latest\/<br>News | https:\/\/pizzatoday.com\/news\/<br>Press Releases | https:\/\/pizzatoday.com\/press-releases\/<br>Podcasts | https:\/\/pizzatoday.com\/podcasts\/<br>Recipes | https:\/\/pizzatoday.com\/recipes\/<br>Resources | https:\/\/pizzatoday.com\/resources\/<br>[... 
truncated for space ...]<br><br>Images:<br> | https:\/\/pizzatoday.com\/wp-content\/uploads\/2021\/12\/Pizza_Today_Logo.svg<br> | https:\/\/pizzatoday.com\/wp-content\/uploads\/2021\/12\/Pizza_Today_Logo.svg<br>pizzeria women of influence | https:\/\/pizzatoday.com\/wp-content\/uploads\/2026\/03\/April_WebImgs_-1-1.png<br>Image of Mirko D&amp;#039;Agata, Pizza Maker of the Year 2026. | https:\/\/pizzatoday.com\/wp-content\/uploads\/2026\/03\/Mirko-winner-150x150.jpg<br>image of World Pizza Games area at Pizza Expo 2026 | https:\/\/pizzatoday.com\/wp-content\/uploads\/2026\/03\/World-Pizza-Games-flag-150x150.jpg<br>2026 Pizza Industry Trends Report | https:\/\/pizzatoday.com\/wp-content\/uploads\/2025\/12\/Dec_WebImgs_-13-150x150.jpeg<br>Pizza Today Pizza Styles Guide Featured Image | https:\/\/pizzatoday.com\/wp-content\/uploads\/2025\/06\/Style_Promo_900x600-150x150.png<br>pizzas at a pizza festival | https:\/\/pizzatoday.com\/wp-content\/uploads\/2026\/04\/AdobeStock_646485542.jpeg<br>Image of a vegan ricotta and squash pizza. | https:\/\/pizzatoday.com\/wp-content\/uploads\/2026\/04\/AdobeStock_249637819-resize.jpg<br>[... truncated for space ...]<br><br>Emails:<br><br>Phone Numbers:<br><br>Top words: <br>  67 list<br>  50 pizza<br>  24 news<br>  14 with<br>  13 screen<br>  12 only<br>  10 dough<br>  10 april<br>   9 about<br>   8 view<br><br>--- LLM Summary --- <br>This website appears to be an industry-focused publication about pizza, featuring news, recipes, podcasts, and resources for pizza makers and pizzeria businesses. It also promotes industry events, trends reports, and professional resources related to the pizza and restaurant industry.<\/pre>\n","protected":false},"excerpt":{"rendered":"<p class=\"lead\">Microproject Write a Python program which takes a single argument \u2014 a URL. 
Your program will use the Unix command curl to download the file at that URL, remove all of the embedded javascript and css from the file, and write the resulting file to the screen. Be sure to test it on a variety of URLs. Main Project Write&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"btn btn-warning\" href=\"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":5834,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","footnotes":""},"class_list":["post-6006","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Assignment 5 - Structured Summaries - Daniel R. Schlegel<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Assignment 5 - Structured Summaries - Daniel R. Schlegel\" \/>\n<meta property=\"og:description\" content=\"Microproject Write a Python program which takes a single argument \u2014 a URL. Your program will use the Unix command curl to download the file at that URL, remove all of the embedded javascript and css from the file, and write the resulting file to the screen. Be sure to test it on a variety of URLs. 
Main Project Write&hellip;Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/\" \/>\n<meta property=\"og:site_name\" content=\"Daniel R. Schlegel\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-23T15:04:47+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/assignment-5-structured-summaries\\\/\",\"url\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/assignment-5-structured-summaries\\\/\",\"name\":\"Assignment 5 - Structured Summaries - Daniel R. Schlegel\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/#website\"},\"datePublished\":\"2026-04-23T14:59:14+00:00\",\"dateModified\":\"2026-04-23T15:04:47+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/assignment-5-structured-summaries\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/assignment-5-structured-summaries\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/assignment-5-structured-summaries\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Teaching\",\"item\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"n
ame\":\"CSC344 &#8211; Spring 2026\",\"item\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/teaching\\\/csc344-spring-2026\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Assignment 5 &#8211; Structured Summaries\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/#website\",\"url\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/\",\"name\":\"Daniel R. Schlegel\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/danielschlegel.org\\\/wp\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Assignment 5 - Structured Summaries - Daniel R. Schlegel","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/","og_locale":"en_US","og_type":"article","og_title":"Assignment 5 - Structured Summaries - Daniel R. Schlegel","og_description":"Microproject Write a Python program which takes a single argument \u2014 a URL. Your program will use the Unix command curl to download the file at that URL, remove all of the embedded javascript and css from the file, and write the resulting file to the screen. Be sure to test it on a variety of URLs. Main Project Write&hellip;Read more","og_url":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/","og_site_name":"Daniel R. Schlegel","article_modified_time":"2026-04-23T15:04:47+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/","url":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/","name":"Assignment 5 - Structured Summaries - Daniel R. Schlegel","isPartOf":{"@id":"https:\/\/danielschlegel.org\/wp\/#website"},"datePublished":"2026-04-23T14:59:14+00:00","dateModified":"2026-04-23T15:04:47+00:00","breadcrumb":{"@id":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/assignment-5-structured-summaries\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/danielschlegel.org\/wp\/"},{"@type":"ListItem","position":2,"name":"Teaching","item":"https:\/\/danielschlegel.org\/wp\/teaching\/"},{"@type":"ListItem","position":3,"name":"CSC344 &#8211; Spring 2026","item":"https:\/\/danielschlegel.org\/wp\/teaching\/csc344-spring-2026\/"},{"@type":"ListItem","position":4,"name":"Assignment 5 &#8211; Structured Summaries"}]},{"@type":"WebSite","@id":"https:\/\/danielschlegel.org\/wp\/#website","url":"https:\/\/danielschlegel.org\/wp\/","name":"Daniel R. 
Schlegel","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/danielschlegel.org\/wp\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P83Tb6-1yS","_links":{"self":[{"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/pages\/6006","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/comments?post=6006"}],"version-history":[{"count":9,"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/pages\/6006\/revisions"}],"predecessor-version":[{"id":6015,"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/pages\/6006\/revisions\/6015"}],"up":[{"embeddable":true,"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/pages\/5834"}],"wp:attachment":[{"href":"https:\/\/danielschlegel.org\/wp\/wp-json\/wp\/v2\/media?parent=6006"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}