UIUC MCS – CS 513 Review – Theory and Practice of Data Cleaning


Overview

  • TLDR: 513 won’t teach you very much, and what you will learn is highly outdated, but it’s an easy 500-level course.

  • Difficulty: Very easy

  • Opinion: Disliked

  • Weekly workload: 2 hours

  • Semester: Summer 2023

Class Content

Lecture Content

Every week consisted of about an hour of lectures. The topics covered included data validation, profiling, relational models, Datalog, SQL, workflows, provenance, and YesWorkflow. I could not figure out exactly when these lectures were recorded, but I’m guessing they are close to a decade old. Ideally, they should all be rerecorded to incorporate newer material.

I’m not sure why the course title includes ‘theory’, as the lectures focused entirely on data-cleaning practices. Each week did link to data cleaning papers, though, and those were good resources. A diligent student who reads all of those papers could learn a lot and cover a good deal of theory, but the course has no mechanism for enforcing the reading.

Assignments

As with most MCS courses, there were weekly quizzes. The quizzes allowed for unlimited attempts and never took more than a few minutes to complete.

There were six homework assignments. In order, they were Regular Expressions, OpenRefine, Datalog, SQL, Provenance, and Python. None of these assignments took more than two to three hours to complete. They were all basic implementation and programming assignments with autograders.

The class did not have any exams. Instead, it concluded with a two-phase group project. Groups consisted of three people. The setup of the project did not require much collaboration, and my team corresponded entirely over Teams messages without any synchronous meetings.

The project required cleaning some given datasets. You then had to write a paper analyzing how dirty the dataset was beforehand and how much your process cleaned or improved it. You also had to submit documentation about your cleaning process and write up some potential benefits of the cleaning. The project was not difficult.
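To give a sense of what that before-and-after analysis can look like, here is a minimal Python sketch. It is not taken from the course materials. The `date` column, the sample rows, and the `profile` and `clean_date` helpers are all hypothetical, and a real project would profile many columns with rules suited to the dataset.

```python
import re

ISO_DATE = r"\d{4}-\d{2}-\d{2}"  # assumed target format for the hypothetical column


def profile(rows, column, pattern):
    """Return the fraction of values in `column` that are missing or fail `pattern`."""
    if not rows:
        return 0.0
    bad = 0
    for row in rows:
        value = row.get(column)
        if value is None or not re.fullmatch(pattern, value):
            bad += 1
    return bad / len(rows)


def clean_date(value):
    """Normalize 'M/D/YYYY' strings to 'YYYY-MM-DD'; otherwise just trim whitespace."""
    if value is None:
        return None  # missing values stay missing; cleaning cannot invent data
    match = re.fullmatch(r"\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*", value)
    if match:
        month, day, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return value.strip()


# Hypothetical sample data standing in for a dirty dataset.
raw = [
    {"date": "2023-06-01"},
    {"date": "6/2/2023"},
    {"date": " 2023-06-03 "},
    {"date": None},
]
cleaned = [{"date": clean_date(row["date"])} for row in raw]

before = profile(raw, "date", ISO_DATE)
after = profile(cleaned, "date", ISO_DATE)
print(f"invalid before: {before:.0%}, after: {after:.0%}")
```

The paper then amounts to reporting numbers like these per column, alongside a description of the cleaning steps that produced the change.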

My Takeaways

This class is ridiculously easy. It does not feel adequate at the graduate level and certainly should not be a 500-level course. I can see how many would be disappointed by the lack of rigor in what is an otherwise challenging program. If you are paying by the credit hour, it makes sense that you would want a considerable knowledge return on investment. I simply don’t think this class offers that.

I think the biggest disappointment is that data cleaning is a crucial skill for all data science and software engineering jobs. The content is so important that the class deserves to be good! If the content were updated and some of the assignments swapped, this class could be something special. Unfortunately, the execution is not there right now.

All that being said, there are not many 500-level options, so you will probably need to take this class. Additionally, the low difficulty did make for a very well-balanced semester when paired with CS 416. I would recommend pairing this class with something else, and you’ll still have a decently challenging semester.

The banner was generated using the UIUC LinkedIn Banner Generator. It is an awesome tool if you need an Illinois-themed banner for anything.

More Reviews

Check out uiucmcs.org for more reviews of MCS courses. I don’t know who maintains this site, but it’s a good review collection from many semesters.

I have also written up a CS 427 review, a CS 435 review, a CS 498 Cloud Computing review, and a CS 416 review.

Originally published at https://blog.seancoughlin.me.
