<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>DATA 3464: Fundamentals of Data Processing on DATA 3464</title><link>https://mru-data3464.github.io/w26/</link><description>Recent content in DATA 3464: Fundamentals of Data Processing on DATA 3464</description><generator>Hugo</generator><language>en-ca</language><copyright>© Charlotte Curtis 2026 | Created using &lt;a href='https://github.com/alex-shpak/hugo-book'&gt;Hugo Book Theme&lt;/a&gt;</copyright><atom:link href="https://mru-data3464.github.io/w26/index.xml" rel="self" type="application/rss+xml"/><item><title>Lab 10: Image files and processing</title><link>https://mru-data3464.github.io/w26/lab/10-images/</link><pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/10-images/</guid><description>&lt;h1 id="hahahugoshortcode13s0hbhb"&gt;Lab 10: Image files and processing&lt;a class="anchor" href="#hahahugoshortcode13s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Images as 2D signals&lt;/li&gt;
&lt;li&gt;Impact of resampling approaches&lt;/li&gt;
&lt;li&gt;Bulk image processing&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The lab directory has some starter code and files in the &lt;code&gt;lab10&lt;/code&gt; subdirectory, so merge in the pull request and pull changes to your local computer.&lt;/p&gt;
&lt;h2 id="task-1-resizing"&gt;Task 1: Resizing&lt;a class="anchor" href="#task-1-resizing"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;One of the most common image processing operations is shrinking (or, less commonly, increasing) the number of pixels. Even with libraries that provide convenient &amp;ldquo;resize&amp;rdquo; functions, you need to choose the &lt;a href="https://en.wikipedia.org/wiki/Image_scaling#Algorithms"&gt;resampling algorithm&lt;/a&gt; and the target dimensions.&lt;/p&gt;</description></item><item><title>Lab 9: 1D Signals and Audio</title><link>https://mru-data3464.github.io/w26/lab/09-signals-audio/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/09-signals-audio/</guid><description>&lt;h1 id="hahahugoshortcode12s0hbhb"&gt;Lab 9: 1D Signals and Audio&lt;a class="anchor" href="#hahahugoshortcode12s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Audio as 1D signals&lt;/li&gt;
&lt;li&gt;Time and frequency representations&lt;/li&gt;
&lt;li&gt;Characteristics of audio files&lt;/li&gt;
&lt;li&gt;Basic audio processing&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The lab directory has some starter code and files in the &lt;code&gt;lab09&lt;/code&gt; subdirectory, so merge in the pull request and pull changes to your local computer.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve downloaded two files from freesound.org. &lt;strong&gt;Do not listen to them&lt;/strong&gt; - try to figure out which is which by looking at the signals! The files are:&lt;/p&gt;</description></item><item><title>Lab 8: Bash and Regular Expressions</title><link>https://mru-data3464.github.io/w26/lab/08-bash/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/08-bash/</guid><description>&lt;h1 id="hahahugoshortcode11s0hbhb"&gt;Lab 8: Bash and Regular Expressions&lt;a class="anchor" href="#hahahugoshortcode11s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Navigating a computer on the command line&lt;/li&gt;
&lt;li&gt;Basic bash syntax&lt;/li&gt;
&lt;li&gt;Extracting text features with regular expressions&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There is no starter code this week; instead, I&amp;rsquo;d like you to create a &lt;code&gt;lab08&lt;/code&gt; subdirectory and put your files in it.&lt;/p&gt;
&lt;p&gt;While I can&amp;rsquo;t tell &lt;em&gt;how&lt;/em&gt; you actually make a directory, this is a good excuse to practice with bash! Try doing the following:&lt;/p&gt;</description></item><item><title>Assignment 3: Dataset curation</title><link>https://mru-data3464.github.io/w26/assignment/assignment3/</link><pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/assignment/assignment3/</guid><description>&lt;h1 id="hahahugoshortcode2s0hbhb"&gt;Assignment 3: Dataset curation&lt;a class="anchor" href="#hahahugoshortcode2s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Due April 2, 2026 (end of day). This is the day before Easter break; presentations are tentatively scheduled for the following Thursday (April 9). As usual, reasonable requests for extensions will be granted.&lt;/p&gt;
&lt;p&gt;You may work in groups up to 3, and I strongly advise working in groups this time! Some of the work is just plain tedious. Click &lt;a href="https://classroom.github.com/a/4iTVhHea"&gt;here&lt;/a&gt; to create your groups on GitHub Classroom and clone your mostly empty repository.&lt;/p&gt;</description></item><item><title>Lab 7: Text Wrangling</title><link>https://mru-data3464.github.io/w26/lab/07-text-wrangling/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/07-text-wrangling/</guid><description>&lt;h1 id="hahahugoshortcode16s0hbhb"&gt;Lab 7: Text Wrangling&lt;a class="anchor" href="#hahahugoshortcode16s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn a bit about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Text files that don&amp;rsquo;t magically import&lt;/li&gt;
&lt;li&gt;Converting from text to datetime and numeric formats&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Update your labs repo through the usual method. There is one starter notebook with guiding &lt;code&gt;TODO&lt;/code&gt; comments and some data files.&lt;/p&gt;
&lt;h2 id="your-task"&gt;Your task&lt;a class="anchor" href="#your-task"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;This activity was inspired by a friend who complained that Canada Pension Plan (CPP) contribution maximums have been outpacing inflation (yes, this is very much a first world problem). The &lt;a href="https://www.canada.ca/en/revenue-agency/services/tax/businesses/topics/payroll/payroll-deductions-contributions/canada-pension-plan-cpp/cpp-contribution-rates-maximums-exemptions.html"&gt;CPP website&lt;/a&gt; provides information in tabular form, but it&amp;rsquo;s nice to interpret it visually.&lt;/p&gt;</description></item><item><title>Lab 6: Lab Exam Trial</title><link>https://mru-data3464.github.io/w26/lab/06-lab-exam-trial/</link><pubDate>Mon, 23 Feb 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/06-lab-exam-trial/</guid><description>&lt;h1 id="hahahugoshortcode14s0hbhb"&gt;Lab 6: Lab Exam Trial&lt;a class="anchor" href="#hahahugoshortcode14s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Next week (Monday, March 2nd) we&amp;rsquo;ll be doing the lab component of the midterm, worth 5-10% of your final grade (whichever gives you a higher overall score with the written midterm at 15-20%). To test out the setup in a low-stakes environment, this lab will be conducted in class with the following websites whitelisted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://numpy.org/"&gt;Numpy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pandas.pydata.org/"&gt;Pandas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/"&gt;Scikit-learn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matplotlib.org/"&gt;Matplotlib&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://seaborn.pydata.org/"&gt;Seaborn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The usual lab computer environments will be available - VS Code, Jupyter Notebooks, Spyder, etc. There is currently an issue with Python&amp;rsquo;s intellisense extension on VS Code (Pylance), but hopefully this will be resolved before the real thing. In the meantime, enjoy the challenge of writing code without autocomplete.&lt;/p&gt;</description></item><item><title>Assignment 2: Preprocessing Pipelines</title><link>https://mru-data3464.github.io/w26/assignment/assignment2/</link><pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/assignment/assignment2/</guid><description>&lt;h1 id="hahahugoshortcode3s0hbhb"&gt;Assignment 2: Preprocessing Pipelines&lt;a class="anchor" href="#hahahugoshortcode3s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Due February 24, 2026 (end of day). Reasonable requests for extensions will be granted. This is, awkwardly, only 2 days before the midterm. I will put a component of this assignment on the lab component of the midterm (March &lt;del&gt;9&lt;/del&gt; 2), so timely submissions will ensure I can give you feedback before then.&lt;/p&gt;
&lt;p&gt;You may work in groups up to 3. Click &lt;a href="https://classroom.github.com/a/w7SFuWc_"&gt;here&lt;/a&gt; to create your groups on GitHub Classroom and clone the starter repository, which should have a csv containing the data and the usual .gitignore file.&lt;/p&gt;</description></item><item><title>Lab 5: Dimensionality Reduction</title><link>https://mru-data3464.github.io/w26/lab/05-dimension-reduction/</link><pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/05-dimension-reduction/</guid><description>&lt;h1 id="hahahugoshortcode10s0hbhb"&gt;Lab 5: Dimensionality Reduction&lt;a class="anchor" href="#hahahugoshortcode10s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn a bit about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dimensionality reduction&lt;/li&gt;
&lt;li&gt;The differences between PCA and LDA&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Update your labs repo through the usual method. There is one starter notebook that just fetches the dataset and has some guiding &lt;code&gt;TODO&lt;/code&gt; comments.&lt;/p&gt;
&lt;h2 id="the-dataset"&gt;The dataset&lt;a class="anchor" href="#the-dataset"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;For the purposes of this exercise, we&amp;rsquo;ll use another well-behaved dataset: the &lt;a href="https://archive.ics.uci.edu/dataset/17/breast&amp;#43;cancer&amp;#43;wisconsin&amp;#43;diagnostic"&gt;Wisconsin Breast Cancer Diagnostics&lt;/a&gt; dataset. This has a set of 30 measurements taken from images of histology slides of breast mass biopsies (the features) that can be used to predict whether the mass is benign (0) or malignant (1). You can read the original paper &lt;a href="https://minds.wisconsin.edu/bitstream/handle/1793/59692/TR1131.pdf;jsessionid=9CE7F71509ADFACCE36BBE2F8E499E42?sequence=1"&gt;here&lt;/a&gt; if you&amp;rsquo;re into that sort of thing.&lt;/p&gt;</description></item><item><title>Lab 4: Numeric Data Transformations</title><link>https://mru-data3464.github.io/w26/lab/04-pipelines/</link><pubDate>Fri, 30 Jan 2026 18:03:30 -0700</pubDate><guid>https://mru-data3464.github.io/w26/lab/04-pipelines/</guid><description>&lt;h1 id="hahahugoshortcode9s0hbhb"&gt;Lab 4: Numeric Data Transformations&lt;a class="anchor" href="#hahahugoshortcode9s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn a bit about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transforming numeric data&lt;/li&gt;
&lt;li&gt;Building preprocessing pipelines&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since you&amp;rsquo;ve already done a lot of painful wrangling in your first assignment to combine data into a useful tabular form, I&amp;rsquo;ve done this bit for you in this lab. We&amp;rsquo;re also going to use the same (sort of) housing assessment data from &lt;a href="https://mru-data3464.github.io/w26/lab/02-fetch-and-explore"&gt;lab 2&lt;/a&gt;, so the dataset should be familiar.&lt;/p&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;If you are unable to fetch data from the City:
I&amp;rsquo;ve put a copy of a (200MB+) CSV version at &amp;ldquo;I:\Labs\CompSci\Resources\DATA 3464\housing_data_pre_split.csv&amp;rdquo;, accessible either through the lab computers or WebFiles at gp.mtroyal.ca&lt;/p&gt;</description></item><item><title>Assignment 1: Exploratory Data Analysis</title><link>https://mru-data3464.github.io/w26/assignment/assignment1/</link><pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/assignment/assignment1/</guid><description>&lt;h1 id="hahahugoshortcode4s0hbhb"&gt;Assignment 1: Exploratory Data Analysis&lt;a class="anchor" href="#hahahugoshortcode4s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Due January 30, 2026 at 5 pm. Reasonable requests for extensions will be granted.&lt;/p&gt;
&lt;p&gt;You may work in groups up to 3. Click &lt;a href="https://classroom.github.com/a/BC8a9Mj2"&gt;here&lt;/a&gt; to create your groups on GitHub Classroom and clone the starter repository, which is pretty much empty aside from a &lt;code&gt;.gitignore&lt;/code&gt; file. This should allow everyone in the group to make changes, but merge conflicts may occur, particularly if two people are editing the same file.&lt;/p&gt;</description></item><item><title>12: Data Labelling and Augmentation</title><link>https://mru-data3464.github.io/w26/lecture/12-augmentation/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/12-augmentation/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Tools for fancy labelling&lt;/li&gt;
&lt;li&gt;Annotation conventions&lt;/li&gt;
&lt;li&gt;Augmenting data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://labelstud.io/"&gt;Label Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://labelformat.com/formats/object-detection/"&gt;Labelformat docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cocodataset.org/#format-data"&gt;COCO Dataset Description&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="computer-vision-tasks"&gt;Computer vision tasks&lt;a class="anchor" href="#computer-vision-tasks"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src="../../static/img/12-coco_instance_segmentation.jpeg" alt="" /&gt;&lt;/p&gt;
&lt;footer&gt;Image from https://manipulation.csail.mit.edu/segmentation.html&lt;/footer&gt;
&lt;h2 id="how-are-annotations-stored"&gt;How are annotations stored?&lt;a class="anchor" href="#how-are-annotations-stored"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;!-- _class: code_reminder --&gt;
&lt;p&gt;Usually in plain text!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Classification: subdirectories, csv files&lt;/li&gt;
&lt;li&gt;Bounding boxes: text files, e.g. COCO, VOC&lt;/li&gt;
&lt;li&gt;Segmentations: Text and/or PNG, e.g. &lt;a href="https://cocodataset.org/#format-data"&gt;COCO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
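&lt;p&gt;To make the &amp;ldquo;plain text&amp;rdquo; point concrete, here is a minimal sketch of a COCO-style bounding box annotation built in Python. The key layout follows the COCO convention; the file name, sizes, category, and the &lt;code&gt;bbox_to_corners&lt;/code&gt; helper are made up for illustration.&lt;/p&gt;

```python
import json

# Hypothetical example values; only the key layout follows the COCO convention.
coco = {
    "images": [{"id": 1, "file_name": "cat_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "cat"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,       # links the box to an entry in "images"
            "category_id": 1,    # links the box to an entry in "categories"
            # COCO bounding boxes are [x_min, y_min, width, height] in pixels
            "bbox": [100, 120, 200, 150],
            "area": 200 * 150,
            "iscrowd": 0,
        }
    ],
}


def bbox_to_corners(bbox):
    """Convert [x, y, w, h] to [x_min, y_min, x_max, y_max] (the corner style used by VOC)."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# The annotation file itself is just JSON text
text = json.dumps(coco, indent=2)
corners = bbox_to_corners(coco["annotations"][0]["bbox"])
print(corners)
```

&lt;p&gt;The conversion helper is there because different formats disagree on how a box is written, so it is worth checking which convention a dataset uses before mixing annotations.&lt;/p&gt;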
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;Not a lot of formal process here - someone builds something for their purposes, others find it useful, variations abound.&lt;/p&gt;</description></item><item><title>11: Image Processing</title><link>https://mru-data3464.github.io/w26/lecture/11-images/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/11-images/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Image representation&lt;/li&gt;
&lt;li&gt;File formats and compression&lt;/li&gt;
&lt;li&gt;Preprocessing for image recognition tasks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://szeliski.org/Book/"&gt;Computer Vision: Algorithms and Applications (2nd edition)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pillow.readthedocs.io/en/stable/index.html"&gt;Pillow Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rosswoleben.com/projects/image-compression"&gt;A blog post on image compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cgjennings.ca/articles/jpeg-compression/"&gt;JPEG demo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="images-as-2d-signals"&gt;Images as 2D signals&lt;a class="anchor" href="#images-as-2d-signals"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div class="columns"&gt;
&lt;ul&gt;
&lt;li&gt;The light that enters a camera can be modelled as a continuous signal:
$$f(x, y),\space -\infty &amp;lt; x, y &amp;lt; \infty$$&lt;/li&gt;
&lt;li&gt;Digital images are sampled:
$$f[n, m] = f(x, y), \space x = n \Delta_x, y = m \Delta_y$$
where typically $\Delta_x = \Delta_y$&lt;/li&gt;
&lt;li&gt;The area $\Delta_x \times \Delta_y$ is called a &lt;strong&gt;pic&lt;/strong&gt;ture &lt;strong&gt;el&lt;/strong&gt;ement, or &lt;strong&gt;pixel&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
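&lt;p&gt;As a rough sketch of what sampling means, the snippet below evaluates a continuous function on a regular grid with spacing $\Delta_x$ and $\Delta_y$. The function &lt;code&gt;f&lt;/code&gt;, the grid size, and the spacing are all assumptions chosen for illustration; a real camera also integrates light over each pixel area rather than reading a single point.&lt;/p&gt;

```python
import math


def f(x, y):
    """Hypothetical continuous signal: a smooth 2D cosine pattern."""
    return math.cos(2 * math.pi * x) * math.cos(2 * math.pi * y)


def sample(func, rows, cols, delta_x, delta_y):
    """Return a rows-by-cols grid (list of lists) holding one value of func per pixel.

    Column index n maps to x = n * delta_x, row index m maps to y = m * delta_y.
    """
    return [[func(n * delta_x, m * delta_y) for n in range(cols)] for m in range(rows)]


# A tiny 4 x 4 "image" sampled with equal spacing in both directions
image = sample(f, rows=4, cols=4, delta_x=0.25, delta_y=0.25)
for row in image:
    print([round(value, 2) for value in row])
```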
&lt;p&gt;&lt;img src="../../static/img/11-image-formation.png" alt="" /&gt;&lt;/p&gt;</description></item><item><title>10: Signals and Audio</title><link>https://mru-data3464.github.io/w26/lecture/10-signals-audio/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/10-signals-audio/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to signals&lt;/li&gt;
&lt;li&gt;Audio as a 1D signal&lt;/li&gt;
&lt;li&gt;File formats&lt;/li&gt;
&lt;li&gt;A brief intro to signal processing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Various textbooks from my undergrad&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dspguide.com/"&gt;DSPguide.com&lt;/a&gt; seems like a pretty good resource&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-is-a-signal"&gt;What is a signal?&lt;a class="anchor" href="#what-is-a-signal"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;!-- _class: code_reminder --&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&amp;ldquo;A [continuous/discrete] signal is a function of independent variables that range over [a continuum/discrete] values&amp;rdquo; - Jerry L. Prince, Medical Imaging Signals and Systems&lt;/p&gt;</description></item><item><title>9: Bash and data cards</title><link>https://mru-data3464.github.io/w26/lecture/09-bash-cards/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/09-bash-cards/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Intro to bash and unix tools&lt;/li&gt;
&lt;li&gt;Data cards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jeroenjanssens.com/dsatcl/"&gt;Data Science at the Command Line&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sites.research.google/datacardsplaybook/"&gt;The Data Cards Playbook&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;Why these topics together? They both seem relevant to assignment 3&lt;/p&gt;&lt;/blockquote&gt;&lt;h2 id="why-bash"&gt;Why bash?&lt;a class="anchor" href="#why-bash"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Python is great, but it&amp;rsquo;s not the only tool. Command line is better to:
&lt;ul&gt;
&lt;li&gt;Move and rename files&lt;/li&gt;
&lt;li&gt;Take a peek at the first few lines of a giant csv&lt;/li&gt;
&lt;li&gt;Find and replace text in a bunch of files&lt;/li&gt;
&lt;li&gt;Fetch data from the web&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Bash and unix tools are more useful than their Windows counterparts&lt;/li&gt;
&lt;li&gt;&lt;a href="https://git-scm.com/downloads"&gt;Git bash&lt;/a&gt; lets us use bash on Windows&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="okay-but-what-is-it"&gt;Okay, but what is it?&lt;a class="anchor" href="#okay-but-what-is-it"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://www.gnu.org/software/bash/"&gt;Bourne Again SHell&lt;/a&gt; is a command line interface created in 1989&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;shell&lt;/strong&gt; is a program to execute commands from the user&lt;/li&gt;
&lt;li&gt;The &amp;ldquo;Bourne shell&amp;rdquo; (&lt;code&gt;sh&lt;/code&gt;) from the 1970s was standard on UNIX; bash adds to it&lt;/li&gt;
&lt;li&gt;Default shell on most Linux distributions; macOS used bash by default until it switched to zsh in 2019&lt;/li&gt;
&lt;li&gt;When you run commands like this in a Jupyter notebook, you&amp;rsquo;re using bash!
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;%pip install -q some_package&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="basic-bash"&gt;Basic bash&lt;a class="anchor" href="#basic-bash"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;command_name --long_flag -l -o arg1 arg2&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;Examples of common commands:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ls&lt;/code&gt; to list files in a directory&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cd&lt;/code&gt; to change directories&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mv&lt;/code&gt; to move or rename files&lt;/li&gt;
&lt;li&gt;&lt;code&gt;head&lt;/code&gt; to view the first few lines of a file&lt;/li&gt;
&lt;li&gt;&lt;code&gt;grep&lt;/code&gt; to search for text in files&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sed&lt;/code&gt; to find and replace text in files&lt;/li&gt;
&lt;li&gt;&lt;code&gt;curl&lt;/code&gt; to fetch data from the web&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="variables"&gt;Variables&lt;a class="anchor" href="#variables"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Variables are defined with &lt;code&gt;=&lt;/code&gt; and accessed with &lt;code&gt;$&lt;/code&gt;&lt;/p&gt;</description></item><item><title>8: Wrangling text</title><link>https://mru-data3464.github.io/w26/lecture/08-text/</link><pubDate>Thu, 05 Mar 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/08-text/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Text file formats and character encodings&lt;/li&gt;
&lt;li&gt;Extracting features from (un)structured text&lt;/li&gt;
&lt;li&gt;Regular expressions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/library/codecs.html#encodings-and-unicode"&gt;Python documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/"&gt;Joel on Software Blog Post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-is-a-text-file"&gt;What is a text file?&lt;a class="anchor" href="#what-is-a-text-file"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;!-- _class: code_reminder --&gt;
&lt;ul&gt;
&lt;li&gt;Any file is just a sequence of &lt;strong&gt;bytes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The file &lt;strong&gt;extension&lt;/strong&gt; is somewhat meaningless&lt;/li&gt;
&lt;li&gt;Text files contain only human-readable &lt;strong&gt;characters&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Binary files&lt;/strong&gt; are everything else, including:
&lt;ul&gt;
&lt;li&gt;Images&lt;/li&gt;
&lt;li&gt;PDFs&lt;/li&gt;
&lt;li&gt;Word documents*&lt;/li&gt;
&lt;li&gt;Executables&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;footer&gt;*Technically, Word docs are just zip files with XML (structured text) inside&lt;/footer&gt;
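&lt;p&gt;A small sketch of the first two points, assuming nothing beyond the Python standard library: the same bytes are written to a file with a deliberately misleading extension and read back unchanged. The file name &lt;code&gt;notes.jpg&lt;/code&gt; and the sample string are made up for illustration.&lt;/p&gt;

```python
import tempfile
from pathlib import Path

text = "café"
raw = text.encode("utf-8")  # 5 bytes: the é takes two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp:
    # Misleading extension, chosen on purpose: the contents are still text
    path = Path(tmp) / "notes.jpg"
    path.write_bytes(raw)

    stored = path.read_bytes()        # what is actually on disk: a sequence of bytes
    decoded = stored.decode("utf-8")  # the bytes only become characters once decoded

print(len(text), len(raw))
print(decoded)
```

&lt;p&gt;The extension did not change what was stored; only the choice of decoding decides whether the bytes read as sensible characters.&lt;/p&gt;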
&lt;h2 id="character-encodings"&gt;Character encodings&lt;a class="anchor" href="#character-encodings"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;!-- _class: code_reminder --&gt;
&lt;p&gt;&lt;img src="../../static/img/08-characters.png" alt="center" /&gt;&lt;/p&gt;</description></item><item><title>7: Interaction effects</title><link>https://mru-data3464.github.io/w26/lecture/07-interactions/</link><pubDate>Thu, 12 Feb 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/07-interactions/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Definitions and description of interaction effects&lt;/li&gt;
&lt;li&gt;Detecting interaction effects&lt;/li&gt;
&lt;li&gt;A brief discussion on feature selection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/detecting-interaction-effects"&gt;Feature Engineering Chapter 7&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/selection"&gt;Feature Engineering Chapter 10&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="definition"&gt;Definition&lt;a class="anchor" href="#definition"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;Two or more predictors are said to interact if their combined effect is different (less or greater) than what we would expect if we were to add the impact of each of their effects when considered alone. &amp;ndash; Feature Engineering, Ch 7&lt;/p&gt;</description></item><item><title>6: Missing and weird data</title><link>https://mru-data3464.github.io/w26/lecture/06-missing-and-weird-data/</link><pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/06-missing-and-weird-data/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;What to do with missing data&lt;/li&gt;
&lt;li&gt;Detecting and handling outliers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/handling-missing-data"&gt;Feature Engineering Chapter 8&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hands-On Machine Learning with Scikit-Learn and TensorFlow/PyTorch, Chapter 2. Available at &lt;a href="https://ebookcentral.proquest.com/lib/mtroyal-ebooks/detail.action?docID=30168989"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/user_guide.html"&gt;Scikit-learn user guide: Chapters 2 and 7&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-problem"&gt;The problem&lt;a class="anchor" href="#the-problem"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;As you&amp;rsquo;ve seen, real-world data is messy&lt;/li&gt;
&lt;li&gt;Missing values are common, and other values don&amp;rsquo;t make sense&lt;/li&gt;
&lt;li&gt;We need to decide how to deal with these problems&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;What examples have we seen so far?
Why might data be missing or weird*?&lt;/p&gt;</description></item><item><title>5: Numeric transformations</title><link>https://mru-data3464.github.io/w26/lecture/05-numeric-transformations/</link><pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/05-numeric-transformations/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;
&lt;link rel="stylesheet" href="https://mru-data3464.github.io/w26/katex/katex.min.css" /&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/katex.min.js"&gt;&lt;/script&gt;&lt;script defer src="https://mru-data3464.github.io/w26/katex/auto-render.min.js" onload="renderMathInElement(document.body, {&amp;#34;delimiters&amp;#34;:[{&amp;#34;left&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$$&amp;#34;,&amp;#34;display&amp;#34;:true},{&amp;#34;left&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;$&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\(&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\)&amp;#34;,&amp;#34;display&amp;#34;:false},{&amp;#34;left&amp;#34;:&amp;#34;\\[&amp;#34;,&amp;#34;right&amp;#34;:&amp;#34;\\]&amp;#34;,&amp;#34;display&amp;#34;:true}]});"&gt;&lt;/script&gt;
&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Why transformations are necessary&lt;/li&gt;
&lt;li&gt;Common transformations&lt;/li&gt;
&lt;li&gt;Dimensionality reduction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/engineering-numeric-predictors"&gt;Feature Engineering Chapter 6&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hands-On Machine Learning with Scikit-Learn and TensorFlow/PyTorch, Chapter 4. Available at &lt;a href="https://ebookcentral.proquest.com/lib/mtroyal-ebooks/detail.action?docID=30168989"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/data_transforms.html"&gt;Scikit-learn user guide: Chapter 7&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Introduction to Machine Learning with Python. Available at &lt;a href="https://librarysearch.mtroyal.ca/permalink/01MTROYAL_INST/1qa1aqk/cdi_overdrive_books_ODN0002976888"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="common-11-transformations"&gt;Common 1:1 transformations&lt;a class="anchor" href="#common-11-transformations"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;&amp;ldquo;Most models work best when each feature (and in regression also the target) is loosely Gaussian distributed&amp;rdquo; &amp;ndash; Introduction to Machine Learning with Python&lt;/p&gt;</description></item><item><title>Lab 3: Basic supervised models</title><link>https://mru-data3464.github.io/w26/lab/03-basic-models/</link><pubDate>Mon, 26 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/03-basic-models/</guid><description>&lt;h1 id="hahahugoshortcode8s0hbhb"&gt;Lab 3: Basic supervised models&lt;a class="anchor" href="#hahahugoshortcode8s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Learn a bit about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Different processing needs for different model types&lt;/li&gt;
&lt;li&gt;The need for validation data&lt;/li&gt;
&lt;li&gt;The overall iterative workflow of making predictions from data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As we haven&amp;rsquo;t yet talked about feature engineering and numeric data transformations, the focus will be on the categorical data.&lt;/p&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Eventually I&amp;rsquo;m going to stop posting this, but just like last week, update your labs repo by opening it on GitHub and following the instructions in the readme file. This should result in the &lt;code&gt;lab03&lt;/code&gt; folder with two csvs being added. To download to your local machine, you may need to &lt;strong&gt;commit all local changes&lt;/strong&gt; first and then run &lt;code&gt;git pull&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>4: Categorical data</title><link>https://mru-data3464.github.io/w26/lecture/04-categorical/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/04-categorical/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Exploring categorical data&lt;/li&gt;
&lt;li&gt;Categorical data encoding strategies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/encoding-categorical-predictors"&gt;Feature Engineering Chapter 5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features"&gt;Scikit-learn User Guide (7.3)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what-is-categorical-data"&gt;What is categorical data?&lt;a class="anchor" href="#what-is-categorical-data"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Samples can take on one of several discrete values or groups
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Nominal&lt;/strong&gt;: no particular order to the groups&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ordinal&lt;/strong&gt;: groups relate to each other in a specific order&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Categories can be represented as strings &lt;em&gt;or&lt;/em&gt; numeric types
&lt;ul&gt;
&lt;li&gt;Domain knowledge is necessary!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;Let&amp;rsquo;s take a few minutes to brainstorm some examples&lt;/p&gt;</description></item><item><title>Lab 2: Fetching and exploring data</title><link>https://mru-data3464.github.io/w26/lab/02-fetch-and-explore/</link><pubDate>Mon, 19 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/02-fetch-and-explore/</guid><description>&lt;h1 id="hahahugoshortcode7s0hbhb"&gt;Lab 2: Fetching and exploring data&lt;a class="anchor" href="#hahahugoshortcode7s0hbhb"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve prepared a small exercise to work with City of Calgary data. This is relevant to your first assignment, and is an example of the widely used &lt;a href="https://en.wikipedia.org/wiki/OAuth"&gt;OAuth&lt;/a&gt; protocol. Admittedly, I only understand enough of this protocol to be somewhat dangerous.&lt;/p&gt;
&lt;h2 id="setup"&gt;Setup&lt;a class="anchor" href="#setup"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, update your labs repo by opening it on GitHub and following the instructions in the readme file. This should result in the &lt;code&gt;lab02&lt;/code&gt; folder with starter code being added. To download to your local machine, you may need to &lt;strong&gt;commit all local changes&lt;/strong&gt; first and then run &lt;code&gt;git pull&lt;/code&gt;.&lt;/p&gt;</description></item><item><title>3. Exploratory Data Analysis</title><link>https://mru-data3464.github.io/w26/lecture/03-eda/</link><pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/03-eda/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Exploratory data analysis (EDA)&lt;/li&gt;
&lt;li&gt;Splitting your data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/review-predictive-modeling-process"&gt;Feature Engineering Chapter 3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://r4ds.hadley.nz/EDA.html"&gt;R for Data Science (2e), Chapter 10&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hands-On Machine Learning with Scikit-Learn and TensorFlow/PyTorch, Chapter 2. Available at &lt;a href="https://ebookcentral.proquest.com/lib/mtroyal-ebooks/detail.action?docID=30168989"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="exploratory-data-analysis"&gt;Exploratory data analysis&lt;a class="anchor" href="#exploratory-data-analysis"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The goal of EDA is to &lt;strong&gt;understand your data&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ask questions (e.g. are my data normally distributed?)&lt;/li&gt;
&lt;li&gt;Look for answers (e.g. by making histograms)&lt;/li&gt;
&lt;li&gt;Find more questions and return to step 1 (hmm, those are some weird numbers, what do these values represent?)&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote class='book-hint '&gt;
&lt;p&gt;EDA is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends. &amp;ndash; Hadley Wickham&lt;/p&gt;</description></item><item><title>2. Basic machine learning models</title><link>https://mru-data3464.github.io/w26/lecture/02-basic-models/</link><pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/02-basic-models/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="topic-overview"&gt;Topic overview&lt;a class="anchor" href="#topic-overview"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Some common machine learning tasks and models&lt;/li&gt;
&lt;li&gt;Evaluating model performance&lt;/li&gt;
&lt;li&gt;Limitations and assumptions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Resources used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.feat.engineering/important-concepts"&gt;Feature Engineering Chapter 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Introduction to Machine Learning with Python. Available at &lt;a href="https://librarysearch.mtroyal.ca/permalink/01MTROYAL_INST/1qa1aqk/cdi_overdrive_books_ODN0002976888"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scikit-learn.org/stable/user_guide.html"&gt;Scikit-learn User Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hands-On Machine Learning with Scikit-Learn and TensorFlow/PyTorch. Available at &lt;a href="https://ebookcentral.proquest.com/lib/mtroyal-ebooks/detail.action?docID=30168989"&gt;MRU Library&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="machine-learning"&gt;Machine learning&lt;a class="anchor" href="#machine-learning"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;To appropriately process the data, we need to know &lt;em&gt;why&lt;/em&gt; we are doing it and what assumptions we&amp;rsquo;re making&lt;/li&gt;
&lt;li&gt;Modern machine learning toolkits (such as &lt;a href="https://scikit-learn.org/stable/"&gt;scikit-learn&lt;/a&gt;) are so easy to use, they&amp;rsquo;re easy to use &lt;a href="https://www.cell.com/patterns/fulltext/S2666-3899%2823%2900159-9"&gt;inappropriately&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Goal: just enough understanding to use basic models &lt;strong&gt;responsibly&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-are-we-processing-data"&gt;Why are we processing data?&lt;a class="anchor" href="#why-are-we-processing-data"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src="../../static/img/02-model-selection.svg" alt="" /&gt;&lt;/p&gt;</description></item><item><title>Lab 1: Regression with Clean Data</title><link>https://mru-data3464.github.io/w26/lab/01-clean-regression/</link><pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lab/01-clean-regression/</guid><description>&lt;h1 id="lab-1-regression-with-clean-data"&gt;Lab 1: Regression with Clean Data&lt;a class="anchor" href="#lab-1-regression-with-clean-data"&gt;#&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Due date: January 19, 2026 (next lab session)&lt;/p&gt;
&lt;h2 id="objective"&gt;Objective&lt;a class="anchor" href="#objective"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The main goal of this exercise is for me to learn about your current habits and knowledge. It also serves as an example of why we need &amp;ldquo;clean&amp;rdquo; and easy-to-manage data.&lt;/p&gt;
&lt;h2 id="deliverables"&gt;Deliverables&lt;a class="anchor" href="#deliverables"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://classroom.github.com/a/nfqoN057"&gt;Click here&lt;/a&gt; to join the GitHub classroom and clone the &amp;ldquo;starter code&amp;rdquo;. There&amp;rsquo;s not much in here other than the data sets, so you&amp;rsquo;ll need to add:&lt;/p&gt;</description></item><item><title>1. Introduction</title><link>https://mru-data3464.github.io/w26/lecture/01-intro/</link><pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate><guid>https://mru-data3464.github.io/w26/lecture/01-intro/</guid><description>&lt;!-- 
_class: title_slide
_paginate: skip
--&gt;

&lt;h2 id="meet-your-instructor"&gt;Meet your instructor&lt;a class="anchor" href="#meet-your-instructor"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src="../../static/img/charlotte.jpg" alt="bg right flavour" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Name:&lt;/strong&gt; Charlotte Curtis&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pronouns:&lt;/strong&gt; She/her&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Office:&lt;/strong&gt; B102-4&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Email:&lt;/strong&gt; &lt;a href="mailto:ccurtis@mtroyal.ca"&gt;ccurtis@mtroyal.ca&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Office hours:&lt;/strong&gt; &lt;a href="https://calendar.google.com/calendar/u/0/appointments/AcZssZ1DErlRJ8cNGFn27y-fiFzPEXgDKu8r7LXkGOY="&gt;Book here&lt;/a&gt;&lt;/p&gt;
&lt;h2 id="my-background"&gt;My Background&lt;a class="anchor" href="#my-background"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;img src="../../static/img/charlotte-timeline.svg" alt="center w:900px flavour" /&gt;&lt;/p&gt;
&lt;h2 id="another-new-class"&gt;Another new class!&lt;a class="anchor" href="#another-new-class"&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;blockquote class='book-hint note'&gt;
&lt;p&gt;This course introduces techniques for ethically and responsibly &lt;strong&gt;wrangling&lt;/strong&gt; and manipulating datasets to make them appropriate for addressing the question at hand. Topics may include cleaning and transforming data, integrity and quality measures, common file formats, feature selection and engineering, and generating features from unstructured sources such as text and images.&lt;/p&gt;</description></item></channel></rss>