Critical Cultural Data Explorations

Over the last few weeks, we have read and discussed how power shapes the types of data we collect, in ways that are not always visible or beneficial to society. This week, we will delve into this issue further, exploring how power and perspective are embedded in the data we collect and use, as well as in what we choose not to collect. We will also explore how we can use cultural data to challenge and critique power structures.

In the last assignment, you worked together to identify a focus for your group project and to find digitized cultural objects and digital libraries relevant to that focus. This week, you will build on that work by exploring datasets that contain representations of these cultural objects or practices.

In doing so, you will examine the transformation from digital object to dataset, considering how much context, meaning, and nuance is lost or retained in this process. You will critically assess the datasets’ metadata, how transparent they are, and the biases embedded in them. Finally, you will use AI tools to analyze how datasets reflect or obscure power structures. Through this exploration, you will reflect on the ethics and implications of representing culture as data.

Part 1: From Digital Objects to Datasets: Exploring Culture as Data Over Time

Your group’s first task is to explore both historic and contemporary datasets related to your digital objects or cultural practices. Depending on your group’s progress from the last assignment, there are two approaches you can take:

1. If you haven’t yet identified a dataset:

Search for datasets that correspond to the cultural objects or practices your group has chosen. These datasets can be drawn from digital libraries, archives, or other publicly available data sources, and they might take the form of spreadsheets, databases, or APIs; part of your task is to decide what counts as a “dataset.”
If you are struggling to find directly relevant datasets, you can interpret relevance broadly and identify an exemplar dataset: something similar that could serve as a model. For example, if you’re studying a group or practice that has no dataset, what is the closest comparable group or practice that has been documented in one?

2. If you already have a dataset:

If you’ve already identified some datasets, then you should focus on exploring specific data points in the datasets that relate to the cultural objects or practices you are planning to focus on.
For example, if you have historic and contemporary datasets, try to find individual data points (that is, the records for the actual objects or practices) that were likely used to create the dataset. If you cannot find any identical data points, you can again define relevance broadly and look for data points about comparable cultural groups, objects, or practices.

Questions for Historic vs. Contemporary Datasets Reflection

Your group should aim to find both historic datasets (pre-1980s) and contemporary datasets (post-1980s) to compare (ideally at least one of each). Even if you can’t find perfect matches, try to locate datasets that address similar types of cultural data, which can still provide valuable insights.

Once you have identified both types of datasets, answer the following questions in your group’s GitHub repository:

1. How do these datasets differ in how they represent cultural objects or practices?

Compare how the digital object itself and the dataset convey the same cultural artifact. What details are maintained, and what are simplified or lost?

2. What kind of metadata or context accompanies the data?

Evaluate the transparency of the dataset about its origins, creation, and curation process. Is there sufficient context provided to fully understand the cultural object?

3. Does the dataset reflect any power structures or biases?

Discuss how the dataset’s structure, curation, or presentation reflects specific power structures. Consider who controls the dataset, what narratives it promotes, and whose perspectives are centered or marginalized.

4. Are there any notable gaps in the data?

Analyze what information from the digital object might be missing from the dataset. Why do you think this data is missing? Is it due to technical limitations, a lack of perceived value, or active suppression?
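If any of your datasets are in a tabular format such as CSV, a quick count of empty fields can be one starting point for this question. The following is a minimal Python sketch, not a required part of the assignment. The function name count_missing, the set of placeholder values treated as “missing,” and the sample rows are all hypothetical and invented for illustration.

```python
import csv
import io

# Hypothetical set of values treated as "missing"; adjust it to the
# placeholders your dataset actually uses (e.g. "unrecorded", "?").
MISSING_MARKERS = {"", "n/a", "na", "unknown", "none", "null"}


def count_missing(rows):
    """Count, per column, how many rows are blank or hold a missing-value placeholder.

    `rows` is any iterable of dicts, such as a csv.DictReader.
    Returns (counts, total_rows).
    """
    counts = {}
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if column is None:
                # DictReader stores extra, unnamed fields under the key None; skip them.
                continue
            counts.setdefault(column, 0)
            # DictReader fills fields missing from a short row with None.
            if value is None or value.strip().lower() in MISSING_MARKERS:
                counts[column] += 1
    return counts, total


# Invented toy data for illustration only, not drawn from any real collection.
sample = io.StringIO(
    "title,creator,date,place\n"
    "Basket,,1910,Unknown\n"
    "Textile,Unrecorded weaver,n/a,\n"
)

counts, total = count_missing(csv.DictReader(sample))
for column, missing in counts.items():
    print(f"{column}: {missing} of {total} rows missing")
```

Note that a value like “Unrecorded weaver” slips past this count even though it signals a gap, so the numbers are no substitute for reading the data itself. A count like this also only reveals gaps in fields the dataset already records; it cannot show what the dataset’s creators never chose to collect.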

By the end of this part, your group should have documented the differences and similarities between historic and contemporary datasets, focusing on how they represent cultural objects or practices, what is missing, and the power dynamics embedded in them. You are encouraged to make connections to our class readings so far, whether through direct citations or general connections.

Part 2: Perspective & Power: Critically Using AI to Contextualize Cultural Data

In the next part of the assignment, you will build on your assessments to see how AI tools can help analyze the transformation of cultural objects into datasets and assess what is potentially lost in this process. While AI is increasingly used to facilitate and accelerate this process, we can also use it to read against the grain: that is, to use AI tools to understand how dominant perspectives become baked into digital objects and datasets.

Your group is welcome to use any free AI tool, such as ChatGPT, to complete this assignment. The goal is to analyze how AI can help or hinder our understanding of cultural objects when transforming them into datasets.

AI Cultural Data Analysis Steps

1. Choose an AI tool:

Use an AI chatbot (such as ChatGPT) to help generate insights. You can also explore other AI tools if you prefer, but make sure to document your process and the tool used.

2. Generate descriptions or summaries:

Use the AI to generate descriptions or summaries of both the digital objects and the datasets you identified in Part 1.
Each group member should take responsibility for generating AI responses for at least one dataset and its corresponding digital object.

3. Compare AI outputs:

For each object and dataset pair, ask the AI to describe the cultural object or practice and the data representation. You should try prompts like:

“Describe this cultural object [input the specific object].”
“Summarize the dataset that contains information about this cultural object.”
Collect screenshots of the AI’s responses and compare them.

For more details on prompt engineering, be sure to return to our course resources (though these are primarily intended for GitHub Copilot, many of the same principles apply).

4. Analyze AI-generated responses:

As a group, discuss how the AI’s descriptions vary between the digital object and the dataset. Pay close attention to what details the AI preserves and what it omits.
Reflect on how the AI handles gaps in the data. Does it acknowledge missing information or make assumptions to fill in the gaps? How accurate or helpful are these assumptions?

Questions for AI Reflection

For each AI-generated description or summary, answer the following questions in your group’s GitHub repository:

1. How does the AI describe the cultural object versus the dataset?

Are there differences in how the AI represents the cultural artifact when describing it as a digital object compared to the dataset? What nuances or context are lost in the dataset?

2. What does the AI’s output reveal about the dataset’s embedded power structures?

Does the AI’s language reflect any inherent biases or reinforce specific power dynamics? How does the AI’s interpretation of the dataset compare to the object’s original context?

3. What is potentially missing in the AI-generated descriptions?

Consider whether the AI recognizes or highlights any missing information. Does it attempt to fill in these gaps, and if so, how accurate or reflective are these attempts of the original cultural object or practice?

4. Does the AI challenge or reinforce existing narratives?

Reflect on whether the AI critiques or reinforces dominant narratives about the cultural object or practice. Does it repeat common assumptions, or does it present new insights that challenge the status quo?

Assignment Logistics & Documenting Your Findings

For this assignment, effective group collaboration and task management are essential. In this section, you will assign roles, distribute tasks, and use Git and GitHub to manage your group’s work and document your findings.

1. Assigning a Project Manager

Each group must designate a Project Manager (PM) for the week. The PM also helps coordinate the final project, but for this assignment they are primarily responsible for:

Coordinating the group’s activities: Ensure that all group members are on track with their assigned tasks and that the group stays organized. You should document the plan in the PM Weekly Report Issue Template.
Managing communication: Keep communication clear and focused, whether on Discord or another platform.
Documenting progress: The PM will document the group’s overall progress and ensure that everything is uploaded to GitHub on time (that is, before the group presentation next week). This includes updating the report before the presentation, as well as updating the group’s GitHub project board.

The role of PM will rotate each week, so everyone will get a chance to take on this leadership role. The PM should check in with the instructors if there are any issues managing group dynamics or completing tasks. As a reminder, the PM is not evaluated on how well everyone completes their tasks but on how well they manage the group to the best of their abilities.

If no one volunteers, the instructors will select a PM randomly. You are also encouraged to develop a schedule so that everyone knows when they will be serving as PM.

2. Assigning Tasks

After the Project Manager has been assigned, divide the work for this assignment. Each group member should have clear responsibilities for completing the following tasks, though collaboration is encouraged and these are primarily suggestions of how to divide the work:

Dataset identification and comparison: One or more members should focus on finding relevant datasets (historic and contemporary) or data points from previously identified datasets.
AI analysis: One or more members should be responsible for generating AI outputs for at least one digital object and dataset pair and analyzing the results.
GitHub documentation: One or more members can take the lead on ensuring that all answers, reflections, and screenshots are documented properly in the group’s GitHub repository, though every member should write up and commit their sections of the reflections.

Work together to ensure that all the questions in Part 1 and Part 2 are answered thoroughly, and that AI-generated responses are compared and reflected upon.

3. Using Git & GitHub for Sharing Materials & Task Management

To manage your work and track progress effectively, your group will use Git and GitHub. Here are some additional suggestions for your workflow:

Create Issues: For each task, create a blank issue on your group’s GitHub repository. This will help track progress and ensure that tasks are clearly assigned to various group members.
Use Your Group Project: Add your issues to your GitHub project board to organize tasks, such as dataset collection, AI analysis, and final documentation. This allows everyone to see what stage the assignment is in and who is responsible for each part.
Finalize File & Folder Organization: Agree on a clear structure for your GitHub repository. For example, you might have one folder for group assignments and another for your final project. For this assignment, you might also create subfolders for datasets, images, and reflections, where you share the actual datasets, images and screenshots, and your written reflections. Remember that part of your grade will be based on how well you organize and document your repository; if you are unsure how best to achieve this, please consult our GitHub Style Guide.

By managing your project effectively with GitHub, your group will ensure smooth collaboration and documentation of all aspects of the assignment. The Project Manager will help facilitate this process, but everyone is expected to contribute equally.

All groups must submit their findings before class next week. In addition, the three groups who did not present last week will present their findings to the class. The presentation should be approximately 10 minutes long, with time for questions afterward.