Introduction to Markup and Styling Web Documents

From Markdown to the language of the web

From Markdown to HTML

Last week we learned about Markdown. This week we’re going to take a step further and introduce you to HTML, which is the language of the web.

Introducing Markup Languages

What is a Markup Language?

From Wikipedia:

“a markup language is a system for annotating a document in a way that is syntactically distinguishable from the text, meaning when the document is processed for display, the markup language is not shown, and is only used to format the text.”

TEI: Text Encoding Initiative

Another markup language with a long history in humanities computing is TEI, which stands for Text Encoding Initiative. TEI is used to encode texts in a way that makes them machine-readable. You can read more about TEI in “What Is TEI?” Text Encoding Initiative, 2022. https://tei-c.org/what-is-tei/.

Today we are used to having OCR and other tools that can help us digitize texts, and then programming languages like Python and R that let us manipulate text into data (a topic we’ll cover later in the semester). But in the early days of the web, this was not the case, which made it difficult for machines to know what was in a text. This is where TEI comes in: it allows us to encode texts in a way that makes them usable by machines. For example, TEI often involves marking what a date is, or what a person’s name is, or what a place is. This allows us to search and analyze texts in ways that would be impossible without that markup.
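
To make this concrete, here is a minimal Python sketch of how marked-up names, places, and dates become searchable data. The fragment is invented for illustration and is not taken from any real TEI file. Real TEI documents also declare an XML namespace and a header, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment, invented for illustration only.
# persName, placeName, and date are TEI element names; the sentence is made up.
SAMPLE = (
    '<p>On <date when="1785-04-06">6 April 1785</date>, '
    "<persName>Mary Smith</persName> was tried in "
    "<placeName>London</placeName>.</p>"
)


def tagged_text(xml_string, tag):
    """Return the text of every element with the given tag name."""
    root = ET.fromstring(xml_string)
    return ["".join(element.itertext()) for element in root.iter(tag)]


people = tagged_text(SAMPLE, "persName")
places = tagged_text(SAMPLE, "placeName")
# The "when" attribute stores the date in a machine-friendly form.
dates = [element.get("when") for element in ET.fromstring(SAMPLE).iter("date")]

print(people, places, dates)
```

Because the encoder marked which words are a person, a place, and a date, the program does not have to guess: it simply asks for every element with that name.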

TEI and HTML were created in roughly the same historical moment. TEI was launched in 1987, a few years before the web existed, and HTML was created by Tim Berners-Lee in 1991 for the early web.

TEI Example: The Proceedings of the Old Bailey

The Proceedings of the Old Bailey

The Old Bailey Online project (started in 1999) is one of the older Digital Humanities projects that’s still in operation. They’ve been digitizing the records of the Old Bailey, the central criminal court of England and Wales. They use TEI/XML to encode the text of the records, which allows them to be searched and analyzed in sophisticated ways.

What’s unique about the Old Bailey is that they manually marked up all the text using a “double rekeying” strategy—having each text transcribed twice and then automatically compared to identify errors. They did this because OCR software in the 1990s wasn’t good enough for eighteenth-century print, especially from microfilm.

This incredibly laborious process resulted in 99.9% accurate transcriptions. The authors note that “XML markup would not have worked with the kind of error-ridden text produced by OCR methodologies, so applying complex markup was dependent on adopting rekeying.”

Not only did this facilitate accurate searching, but it meant that the text could be reused in subsequent digital projects. This example shows us that rigorous markup and encoding require significant human effort, but that effort creates very high-quality data.
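
The automatic comparison step in double rekeying can be sketched in a few lines of Python. This is only an illustration of the idea, assuming two invented transcriptions and using Python's standard difflib module. It is not the Old Bailey project's actual tooling.

```python
import difflib

# Two hypothetical transcriptions of the same line, keyed independently.
typist_a = "The prisoner was indicted for stealing a silver watch".split()
typist_b = "The prisoner was indited for stealing a silver watch".split()


def disagreements(first, second):
    """Return (position, words_in_first, words_in_second) wherever the lists differ."""
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    flagged = []
    for op, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if op != "equal":
            flagged.append(
                (a_start, " ".join(first[a_start:a_end]), " ".join(second[b_start:b_end]))
            )
    return flagged


# Each flagged spot is a place where a human would check the original page.
print(disagreements(typist_a, typist_b))
```

Two typists are unlikely to make the same mistake in the same place, so any disagreement between the two versions points to a probable error.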

What is HTML?

HyperText Markup Language

HTML is not a programming language—it’s a markup language used to:

Tell your browser how to structure web pages
Create documents that are rendered in your browser
Make content appear or act a certain way

While TEI is still in use today, by far the most popular of all markup languages is HTML, which stands for HyperText Markup Language. HTML is the language of the web and is used to create web pages. It might be surprising to learn, but when you go to a website, your browser is just showing you a document (just like our .txt or .md files). The difference is that this document is written and marked up in HTML and is rendered by your browser.

This is an important distinction. HTML is purely about structure and presentation. It doesn’t perform logic or calculations. Unlike Markdown, which uses simple symbols, HTML uses more formal “tags” that describe the semantic meaning and structure of content.

So while in Google Docs you might use the GUI to format text, or in Markdown we use # symbols, in HTML we use a series of tags. Tags have a name, a series of key/value pairs called attributes, and some textual content.

When you visit a website, you’re viewing an HTML document rendered by your browser.

Creating an HTML File

Let’s try an example! To create an HTML file, just use the .html extension:

touch first_page.html

Then add some content in your IDE, like we did with our Markdown files.

My first page!

Then save it and open it in your browser!

Adding HTML Tags

Now try altering your file to include HTML tags:

<p>My first page!</p>

Save it and open the file in the browser again. What do you see?

Notice anything different?

Probably not!

Inspecting the Webpage

To see the HTML tags, use Developer Tools:

Right-click on your webpage
Select “Inspect” or “Inspect Element”

Inspect Page

This is one of the most important skills for web development. When you right-click and select “Inspect,” your browser opens a panel showing you the actual HTML source code of the page you’re viewing.

This works on any website—not just your own pages. You can inspect any website to see how it was built, what HTML structure it uses, what CSS is applied, etc. This is how you learn web development.

Different browsers have slightly different names:

Chrome/Edge: “Inspect” or “Inspect Element”
Firefox: “Inspect Element”
Safari: “Inspect Element”

You can read more about Chrome Developer Tools at https://developers.google.com/web/tools/chrome-devtools/console/ and about Firefox Developer Tools at https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Open_the_Inspector

The Source Code

What you see in the inspector is called the source code—the actual HTML that creates the page:

Your <p> tags are there!
The browser was interpreting them the whole time
They just don’t display on the page itself

Anatomy of an HTML Tag

<p>My first page!</p>

Opening tag: <p>
Content: “My first page!”
Closing tag: </p>
Element: all three together
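
To see these three parts from a program's point of view, here is a small sketch using Python's built-in html.parser module. The class name AnatomyParser is made up for this example. It records each piece of the element in the order the parser meets it.

```python
from html.parser import HTMLParser


class AnatomyParser(HTMLParser):
    """Record each piece of an element as the parser encounters it."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(("opening tag", tag))

    def handle_data(self, data):
        self.parts.append(("content", data))

    def handle_endtag(self, tag):
        self.parts.append(("closing tag", tag))


parser = AnatomyParser()
parser.feed("<p>My first page!</p>")
parser.close()

for label, value in parser.parts:
    print(label, "->", value)
```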

Common HTML Tags

How would we make it into an HTML heading? Let’s take a look at some of the more common HTML tags that we can use to create HTML elements: https://www.w3schools.com/tags/ref_byfunc.asp

Tag	Purpose
`<h1>` to `<h6>`	Headings (h1 is largest)
`<p>`	Paragraph
`<div>`	Container/division
`<a>`	Link (anchor)
`<ul>`	Unordered list
`<li>`	List item

HTML Attributes

Great! Now what if we wanted to add a link so that you could click on that heading and go to another page (say, the iSchool home page https://ischool.illinois.edu/)?

Well then we need to add an attribute.

HTML elements can have attributes—extra information about the element.

This diagram is also from the Mozilla docs and you can read more about how HTML elements can also have attributes here.

Let’s try using the anchor tag and href attribute to create an HTML element that links to https://ischool.illinois.edu/

You can find a list of HTML attributes here: https://www.w3schools.com/tags/ref_attributes.asp

<a href="https://ischool.illinois.edu/">iSchool</a>

How does this new tag change our HTML page?

Here’s another example that we should add to our HTML page, using the HTML <div> tag:

<div class="header" style="background: blue;">About Me</div>

In our example, the href attribute tells a link where to navigate to when clicked. The class attribute is another common one that helps you identify elements for styling with CSS.
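
Attributes are key/value pairs, and a program can read them as exactly that. The sketch below uses Python's built-in html.parser module on the two elements from this section. The class name AttributeCollector is made up for illustration.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect each opening tag together with its attributes as a dictionary."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed(
    '<a href="https://ischool.illinois.edu/">iSchool</a>'
    '<div class="header" style="background: blue;">About Me</div>'
)
collector.close()

for tag, attributes in collector.found:
    print(tag, attributes)
```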

Nesting Tags

Tags can contain other tags in a hierarchical structure:

<ul>
  <li>Likes Coding and History</li>
  <li>Likes "What We Do in the Shadows" TV show</li>
  <li>Dislikes Mint Chocolate</li>
</ul>

<ul> is the parent
<li> elements are children
All <li> elements are siblings
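
The parent and child relationship can also be checked in code. This sketch uses Python's xml.etree.ElementTree, which is an XML parser rather than an HTML one. It only works here because this particular fragment is well formed, with every tag properly closed.

```python
import xml.etree.ElementTree as ET

LIST_HTML = """<ul>
  <li>Likes Coding and History</li>
  <li>Likes "What We Do in the Shadows" TV show</li>
  <li>Dislikes Mint Chocolate</li>
</ul>"""

parent = ET.fromstring(LIST_HTML)  # the <ul> element
children = list(parent)            # its direct children, the <li> elements

print("parent:", parent.tag)
for child in children:
    # Every child listed here shares the same parent, which makes them siblings.
    print("  child:", child.tag, "->", child.text)
```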

HTML’s Limitations

HTML is a very powerful language and there are many more tags that we can use to create HTML elements. You can find a list of all the HTML tags here https://www.w3schools.com/tags/ref_byfunc.asp.

But HTML also has some limitations. Take a look at this helpful overview of HTML’s shortcomings by Allison Parrish (bold added for emphasis).

HTML’s shortcomings by Allison Parrish

HTML documents are intended to add markup to text to add information that allows browsers to display the text in different ways—e.g., HTML markup might tell the browser to make the font of the text a particular size, or to position it in a particular place on the screen.

Because the primary purpose of HTML is to change the appearance of text, HTML markup usually does not tell us anything useful about what the text means, or what kind of data it contains. When you look at a web page in the browser, it might appear to contain a list of newspaper articles, or a table with birth rates, or a series of names with associated biographies, or whatever. But that’s information that we get, as humans, from reading the page. There’s (usually) no easy way to extract this information with a computer program.

HTML is Forgiving (But Messy)

HTML is also notoriously messy—web browsers are very forgiving of syntax errors and other irregularities in HTML (like mismatched or unclosed tags). For this reason, we need special libraries to parse HTML into data structures that our Python programs can use, libraries that can make a “good guess” about what the structure of an HTML document is, even when that structure is written incorrectly or inconsistently.
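
The difference between a strict parser and a forgiving one can be shown with Python's standard library alone. This is a sketch under stated assumptions: the messy snippet is invented, and html.parser stands in for the lenient libraries mentioned above, since the passage does not name a specific one. Note that html.parser only reads the tags without complaint. It does not repair the structure.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented example: the <li> tags are never closed, which browsers tolerate.
MESSY = "<ul><li>First item<li>Second item</ul>"


class TextCollector(HTMLParser):
    """Collect the text content, ignoring whether tags are balanced."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        self.texts.append(data)


lenient = TextCollector()
lenient.feed(MESSY)
lenient.close()

try:
    ET.fromstring(MESSY)  # a strict XML parser rejects the unclosed tags
    strict_ok = True
except ET.ParseError:
    strict_ok = False

print("lenient parser found:", lenient.texts)
print("strict parser succeeded:", strict_ok)
```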

Learn More About HTML

Understanding these limitations is important as we start to work with HTML and other web technologies. For more detailed information, I recommend reading through this introduction from Mozilla on HTML.

Web Styling and Interaction

A Real Example: whatisdigitalhumanities.com

What do we see in the inspector?

Inspecting HTML in Your Browser

What is Digital Humanities Inspected

Basic HTML Document Structure

Every HTML page should have this basic structure:

<!DOCTYPE html>
<html>
  <head>
    <!-- Metadata about the page -->
  </head>
  <body>
    <!-- The actual content -->
  </body>
</html>

Selecting and Editing Elements

Selecting elements in the inspector

You can even modify CSS and HTML right in the inspector to experiment:

Editing in the inspector

Styling Websites

In this example, I altered two parts of the HTML document:

The span element with the class title:

<span class="title">When Is Digital Humanities?</span>

And then the styles that are applied to that class:

.title {
    font-family: Changa, var(--sans-font);
    background: #faf;
    color: #fff;
    padding: 3px;
}

CSS: Cascading Style Sheets

CSS Structure:

selector {
    property: value;
    property: value;
}

Selector (.title) - which elements to style
Properties and values - how to style them

CSS is what makes websites beautiful. The term “cascading” refers to how the browser resolves conflicts when several rules apply to the same element: more specific selectors and later rules take precedence. Alongside the cascade, many properties (such as color and font-family) are inherited, so styles set on a parent apply to its children unless overridden. Much like HTML, CSS has a defined structure that comprises a selector (in our case .title) and a declaration block (inside the curly brackets), where you write your style rules. The selector specifies which HTML elements the rules apply to, and the declaration block contains one or more declarations separated by semicolons. Each declaration includes a CSS property name and a value, separated by a colon.

In our example, the selector .title targets any element with the class “title”. The declaration block (inside the curly braces) contains the style rules. Each rule has a property (like font-family or background) and a value (like Changa or #faf).
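
The selector and declaration structure described above is regular enough that a few lines of Python can take a rule apart. This is a deliberately naive sketch that handles a single simple rule. The function name split_rule is made up, and real CSS needs a proper parser because it can contain comments, nested blocks, and semicolons inside strings.

```python
RULE = """.title {
    font-family: Changa, var(--sans-font);
    background: #faf;
    color: #fff;
    padding: 3px;
}"""


def split_rule(rule_text):
    """Split one simple CSS rule into its selector and a dict of declarations."""
    selector, _, rest = rule_text.partition("{")
    body = rest.rsplit("}", 1)[0]
    declarations = {}
    for declaration in body.split(";"):
        if ":" in declaration:
            # The first colon separates the property name from its value.
            prop, _, value = declaration.partition(":")
            declarations[prop.strip()] = value.strip()
    return selector.strip(), declarations


selector, declarations = split_rule(RULE)
print(selector)
for prop, value in declarations.items():
    print(f"  {prop} = {value}")
```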

This separation of content (HTML) from presentation (CSS) is crucial. It means you can completely change how a page looks by just changing the CSS, without touching the HTML.

The Source Code Behind the Styling

To get a better sense of what this code looks like, we can look directly at the index.html file in the GitHub repository https://github.com/hepplerj/whatisdigitalhumanities.

What is Digital Humanities

If we search for the <style> tags, we can see that they sit between lines 42 and 58 and contain the following code:

CSS in the actual website code

To learn more about this particular code, read the callout Deep Dive Into CSS on the course website.

When you look at the actual HTML/CSS source code of a website like whatisdigitalhumanities, you see complex CSS that includes:

Custom fonts (@font-face)
CSS variables (:root)
Dark mode support ([data-theme="dark"])
Different selectors for different elements
Media queries for responsive design
Detailed styling rules for every aspect

Overall, CSS is used to enhance the user experience by providing a visually appealing and functional interface. It allows web developers to:

Apply custom styles to their web pages, making them look unique.
Define responsive designs that adapt to different screen sizes and devices.
Implement theme toggling (like dark mode) to enhance accessibility and user preference.
Ensure that the presentation of the content is consistent across different browsers and platforms.

So here we can see that the .title code we altered is actually part of a larger set of code that is used to style the entire website. This example hopefully shows how powerful and complex web development can be.

Learn More about CSS

A great resource for learning more about CSS is the Mozilla docs https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Styling_basics/What_is_CSS.

Example of Advanced CSS

JavaScript in WhatIsDH

The other main tag that we should pay attention to is the <script> element, which is how interactivity happens on most websites. If we search for the <script> tags, we can see that they sit between lines 80 and 109 and contain the following code:

JavaScript code in whatisdigitalhumanities

Learn more in the callout on the Deep Dive Into JavaScript on our course website.

This code is written in a language called JavaScript, and it relies on JavaScript libraries, which are collections of prewritten code that make it easier to write JavaScript. Unlike HTML or CSS, JavaScript is a programming language (the distinction is not crucial to know but can be helpful when learning about different Digital Humanities methods). At this point, most of the web is powered by JavaScript, so it is incredibly powerful and ubiquitous.

When you look at the actual JavaScript in whatisdigitalhumanities, you see it using D3.js (a powerful data visualization library) and Lodash (a utility library) to:

1. Load quote data from a CSV file
2. Select a random quote from that data
3. Display it on the page using the DOM
4. Listen for click events on the “update quote” button
5. Change the quote when the user clicks
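
The first two steps above (load the data, pick a random row) follow the same logic in any language. Here is a rough Python analogue, not the site's actual code. The column names and the placeholder rows are assumptions made up for this sketch, and the browser-only steps (updating the page and listening for clicks) are left out.

```python
import csv
import io
import random

# Placeholder data standing in for the site's CSV file.
# The column names "name" and "quote" are assumptions for this sketch.
QUOTES_CSV = """name,quote
Respondent A,"Placeholder quote one."
Respondent B,"Placeholder quote two."
"""


def load_quotes(csv_text):
    """Step 1: read the CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def pick_quote(quotes):
    """Step 2: select one row at random."""
    return random.choice(quotes)


quotes = load_quotes(QUOTES_CSV)
current = pick_quote(quotes)
print(current["name"], "said:", current["quote"])
```

In the browser, steps 3 to 5 would put the chosen quote into the page and call the picking function again each time the button is clicked.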

If you want to learn more about what exactly the site’s JavaScript is doing, toggle the following section. Full disclosure: some of the language and concepts are fairly advanced, so feel free to use Copilot or ask the instructors for help.

JavaScript: Adding Interactivity

JavaScript is a programming language that adds behavior to websites:

document.getElementById("button").addEventListener("click", () => {
  // Do something when button is clicked
});

JavaScript can:

Respond to user actions (clicks, scrolling, typing)
Update page content dynamically
Fetch data from servers
Create animations and interactive features

The DOM: Document Object Model

JavaScript sees a webpage as a tree structure:

html
├── head
│   └── title
└── body
    ├── h1
    ├── p
    └── ul
        ├── li
        └── li
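
A tree like this can be produced from HTML text by tracking how deeply each tag is nested. The sketch below uses Python's built-in html.parser on a small invented page. The class name OutlineParser is made up, and the approach assumes every tag is explicitly closed, so self-closing tags such as <br> would throw off the indentation.

```python
from html.parser import HTMLParser

# Invented page whose structure matches the tree shown above.
PAGE = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Hi</h1><p>Text</p><ul><li>a</li><li>b</li></ul></body></html>"
)


class OutlineParser(HTMLParser):
    """Build an indented outline of tag names, one level per nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        self.depth += 1  # anything that follows is inside this element

    def handle_endtag(self, tag):
        self.depth -= 1  # step back out to the parent level


outline = OutlineParser()
outline.feed(PAGE)
outline.close()
print("\n".join(outline.lines))
```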

The Three Pillars of the Web

Technology	Purpose
HTML	Structure - what things are
CSS	Presentation - how things look
JavaScript	Behavior - what things do

HTML alone creates a functional webpage, but it’s not very beautiful or interactive. To create modern websites, we need three technologies working together.

HTML provides the semantic structure and content. CSS controls how everything looks. JavaScript makes things interactive.

The combination of HTML, CSS, and JavaScript is what makes the modern web possible. Yes, it can be complex, but this complexity is necessary to create the rich, interactive experiences we expect from websites today.

For example, think about Gmail or Google Maps. These are incredibly interactive applications running in your browser. They’re possible because of the combination of these three technologies.

Also note that these technologies can make websites slow to load (large JavaScript files can take a long time) and difficult to maintain. This is an ongoing challenge for web developers.

Looking at projects like whatisdigitalhumanities, you can even see the effort required by exploring the commit history on GitHub—showing many contributions over time as people continuously update, fix, and improve the site.

The Effort Behind Web Projects

GitHub commit history for whatisdigitalhumanities

Homework: Source and Style

✨ Time for you to create some HTML ✨