Lab 1: Regression with Clean Data
Due date: January 19, 2026 (next lab session)
Objective
The main goal of this exercise is for me to learn about your current habits and knowledge. It also serves as an example of why we need “clean”, easy-to-manage data.
Deliverables
Click here to join the GitHub classroom and clone the “starter code”. There’s not much in here other than the data sets, so you’ll need to add:
- Your code to load, process, and visualize the data
- A brief report in PDF or Markdown format describing your process and answering a few questions
- A list of references, such as:
  - Complete URLs, e.g. to library documentation or Stack Overflow answers
  - Search queries, particularly if you used the “AI overview” response
  - A complete transcript of AI chat history, if any
Instructions
In the programming language of your choice, write a program to:
- Load the data in simple_regression.csv
- Calculate the line of best fit using simple linear regression
- Create a scatter plot of the data with the best fit line
📝 Don’t forget to document your process as you do this!
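The three steps above can be sketched in Python, using only the standard library for loading and fitting. This is one possible approach rather than a required one. The column names x and y are assumptions (check the header row of simple_regression.csv), the helper names load_xy, fit_line, and plot_fit are hypothetical, and plot_fit assumes matplotlib is installed.

```python
import csv


def load_xy(path, x_col="x", y_col="y"):
    """Read two numeric columns from a CSV file that has a header row.

    The default column names are assumptions; pass the real ones.
    """
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def plot_fit(xs, ys, slope, intercept, out_path="scatter.png"):
    """Save a scatter plot of the data with the fitted line drawn over it."""
    # Imported here so the loading and fitting code works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label="data")
    x_line = [min(xs), max(xs)]
    y_line = [slope * x + intercept for x in x_line]
    ax.plot(x_line, y_line, color="red", label="best fit")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(out_path)
```

Calling load_xy("simple_regression.csv"), passing the result to fit_line, and then calling plot_fit with the data and the fitted coefficients would write the figure to scatter.png, provided the column names match your file.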
Copy the plot into your report. With reference to this plot, answer the following questions and justify your answers. You may include additional plots or calculations to support your justification.
- Is linear regression an appropriate model for these data?
- How well does your model fit the data?
- What additional information would help to create a more accurate model?
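One common way to put a number on the second question is the coefficient of determination, R², computed from the residuals. The sketch below assumes you already have a slope and intercept from your own fit; the function names are hypothetical. R² on its own does not show whether a linear model is appropriate, so looking at the residuals themselves (for example, plotted against x) is usually worthwhile too.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus predicted y for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) if every y value is the same.
    """
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys, slope, intercept))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the line accounts for most of the variation in y. A visible pattern in the residuals suggests the relationship may not be linear, even when R² is high.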
Repeat the first task (load, fit, and plot), but with the simple_regression_full.csv dataset. This contains extra columns; select only the ones that are also present in simple_regression.csv. In your report, answer the following:
- What additional steps did you need to take to recreate the plot?
- Do you have any guess what this dataset represents?
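The column selection could be sketched as follows: read the header of the smaller file and keep only those columns from the larger one. This assumes both files have a header row and that the shared columns have exactly the same names in both, which you should verify. The function names are hypothetical, values are returned as strings, and any further cleaning the full dataset may need is not covered here.

```python
import csv


def header_of(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def select_columns(full_path, wanted):
    """Return the rows of full_path restricted to the wanted columns.

    Each row is a dict of strings; convert to numbers before fitting.
    """
    with open(full_path, newline="") as f:
        reader = csv.DictReader(f)
        available = reader.fieldnames or []
        missing = [c for c in wanted if c not in available]
        if missing:
            raise KeyError(f"columns not found in {full_path}: {missing}")
        return [{c: row[c] for c in wanted} for row in reader]
```

With this, header_of("simple_regression.csv") gives the list of columns to keep, and select_columns("simple_regression_full.csv", ...) with that list returns the matching subset of the full dataset.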
Submission
Submit your work by adding new files with git add, committing with git commit, and pushing your changes to GitHub with git push. You may add/commit/push as many times as you like; only the last revision will be marked. Your repo should include:
- Your source code in the language of your choice, with a list of the package dependencies needed to reproduce your results (e.g. requirements.txt for Python)
- A PDF or Markdown report, including your scatter plots and responses to the questions
- Your list of references and (if applicable) AI chat histories
Labs are marked pass/fail on the basis of completion.