What's in a Web Page?

If you've ever wondered how Web pages work, here's your answer.

As an electronic musician, you may consider the Web to be part of your tool kit. Perhaps you have surfed the Internet countless times, but what exactly is a Web page, anyway? Where did the technology come from, how does it work, and where is it going?

What became the World Wide Web was originally an experiment to find ways for researchers to communicate and share data remotely. It worked. Being creative types who didn't know when to stop, those researchers decided to keep development going and make their creation more interesting than the mostly text-based prototype.

ENTER HTML
The early Web's plain text pages were hardly sexy or feature-laden. A method was needed to control the appearance and layout of a page (font sizes, italics, boldface, headings, paragraphs, and so on) while retaining as much of a text file's simplicity and compactness as possible. The developers decided to use a set of embedded, text-based commands called markup to control these properties. For that they could draw on an existing standard, designed so that one programmer's software could understand another's markup: Standard Generalized Markup Language (SGML), published as an international standard in 1986.

However, SGML had more power and flexibility than was needed for the Web, and it was difficult to use. So a relatively small portion of SGML's potential was tapped to add simple markup to text, and the result was HyperText Markup Language (HTML). Early versions of HTML offered little more than rudimentary text formatting. But HTML added one really powerful feature: hyperlinking, the ability to jump from one page to another with just a click. Hyperlinking and ease of use made HTML very popular very fast.

Of course, the coolest markup in the world is useless if you can't display it. Some of the first software for viewing HTML pages came from, you guessed it, researchers. Browsing HTML-based Web pages with the new software quickly became the norm, so the word "browser" came to mean the software used to do that. Browser development was soon big business, and features such as font colors and the ability to display images and play sounds were added along with new markup to support them.

Unfortunately, features developed faster than standards, and manufacturers frequently disagreed on implementation. Markup that worked well with one browser crashed another. That remains a problem, a trade-off for the rapid development of browser software, which is now so advanced that it resembles a multimedia presentation environment. On the positive side, the most popular browsers, Microsoft's Internet Explorer and Netscape's Navigator, are distributed for free.

HTML STREET SMARTS
Basic HTML is not rocket science. Its markup consists of simple tags, which are really just abbreviations placed between angle brackets (< and >) and usually used in pairs, with a slash marking the closing tag of each pair. The simplest possible HTML document is:

<HTML>Hello</HTML>

That's it. The <HTML> tag identifies the start of the HTML document, the text "Hello" is the body of the document, and the </HTML> tag identifies the end of the document. A minimally useful HTML document would probably have at least the following tags:

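<HTML>
<HEAD>
<TITLE>No-Frills Test Page</TITLE>
</HEAD>
<BODY BGCOLOR="white" TEXT="black">
<H1>Welcome</H1>
<P>Welcome to my no-frills test page. Coming next week: <I>War and Peace</I>.</P>
</BODY>
</HTML>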

The <HEAD> tags let you delineate information that you don't want to appear in the document body itself; in this case, that's the page title, set off by <TITLE> tags, which will display in the browser's title bar. Including a <BODY> tag gives you additional control over the page presentation. Notice that I've added some instructions (attributes, in techspeak) inside the <BODY> tag that tell the browser to display a white background with black text. The <H1> tags display the text "Welcome" as a heading, typically in boldface. (There are six heading sizes; H1 is the largest and H6 the smallest.) The <P> tags delineate a paragraph, and the <I> tags cause the title War and Peace to be displayed in italics.
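Because the HEAD and BODY sections serve different purposes, any software that reads a page has to track which section each piece of text belongs to. The sketch below uses Python's standard html.parser module to pull the title and the visible body text out of the test page. The class name PageSplitter and the markup string are our own, and this illustrates the idea rather than how any browser is actually written.

```python
from html.parser import HTMLParser

# Our own rendering of the No-Frills Test Page markup (an assumption for illustration).
PAGE = """<HTML>
<HEAD>
<TITLE>No-Frills Test Page</TITLE>
</HEAD>
<BODY BGCOLOR="white" TEXT="black">
<H1>Welcome</H1>
<P>Welcome to my no-frills test page. Coming next week: <I>War and Peace</I>.</P>
</BODY>
</HTML>"""


class PageSplitter(HTMLParser):
    """Separates the title (inside HEAD) from the text shown in BODY."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open elements
        self.title = ""
        self.body_parts = []
        self.body_attrs = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so case doesn't matter.
        self.open_tags.append(tag)
        if tag == "body":
            self.body_attrs = dict(attrs)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title += data
        elif "body" in self.open_tags:
            self.body_parts.append(data)

    def body_text(self):
        # Collapse runs of whitespace, much as a browser does when it renders text.
        return " ".join("".join(self.body_parts).split())


splitter = PageSplitter()
splitter.feed(PAGE)
splitter.close()
print(splitter.title)        # the text destined for the title bar
print(splitter.body_attrs)   # {'bgcolor': 'white', 'text': 'black'}
print(splitter.body_text())
```

Note that the uppercase tags in the markup come back lowercased, which is the parser's way of honoring the rule that HTML tags are not case sensitive.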

You can create any HTML document in a text editor such as Notepad or SimpleText. Simply save the file as plain ASCII text with a file name that ends with a .htm or .html extension, then use the browser's File Open command to view your file. HTML tags are not case sensitive, and you can use tabs, blank lines, and extra spaces to make your markup easier to proof and edit. Just make sure the tags have correct syntax and are always nested, not overlapping. (Overlapping tags are an error in HTML and might not be interpreted correctly by some browsers.) For example, use

<B><I>text</I></B>

rather than

<B><I>text</B></I>
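To see why overlap is a problem, think of open tags as a stack: each end tag should close the most recently opened element. A checker along those lines, built on Python's standard html.parser module, is sketched below. The names check_nesting and VOID are our own, and real browsers try to recover from overlapping tags rather than merely reporting them.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # properly nested
        elif tag in self.stack:
            # Closing an element while a later-opened one is still open means overlap.
            self.errors.append(f"</{tag}> overlaps <{self.stack[-1]}>")
            # Remove the innermost matching entry and leave the others open.
            index = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[index]
        else:
            self.errors.append(f"</{tag}> has no matching start tag")


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


print(check_nesting("<B><I>text</I></B>"))  # []
print(check_nesting("<B><I>text</B></I>"))  # ['</b> overlaps <i>']
```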

FIG. 1: MP3.com's Featured Alternative page is a content-rich site with several different types of data. It is shown here as displayed by Netscape's Navigator 4.

INSIDE THE BROWSER
At this point, you have an idea of what basic HTML is and how a browser displays it. But a typical commercial page's markup is much more complex than that of my No-Frills Test Page, and current browsers are pretty complex specimens of software. (See the sidebar "You Can Get There from Here" to learn how a browser finds the pages you request.) For example, Fig. 1 shows the MP3.com Featured Alternative page (the top part of it, anyway), and Fig. 2 shows the first 70 lines of the approximately 1,200 markup lines used. (You can reveal any page's HTML using the browser's View Source function, which is usually found under the View menu.)

As you can see, in addition to the straightforward tags described earlier, there are many others in use, and a Web browser must understand how to display them or at least how to ignore them without problems. Consider something as fundamental as the <IMG> tag, which places an image on a page:

<IMG SRC="logo.gif">

Say this is the first image on the page. The browser attempts to display as much of the HTML as it can before it loads and displays the image. Then it tries to find the image. In the example, the browser searches for the image (logo.gif) in the directory (folder) where it found the HTML file. It checks the file extension and other information to determine the file type. Then the browser decompresses the file (all common Web image formats - GIF, JPEG, and PNG - are compressed to save download time) and determines the color depth and size in pixels. If the color depth is greater than what the computer's video card can handle, it must determine whether to dither or substitute the missing colors.
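File extensions can be wrong, so a program can also look at the first few bytes of a file, where GIF and PNG files record their type and pixel dimensions. The Python sketch below reads those headers. sniff_image is a hypothetical helper; it covers only what the GIF and PNG formats put at the very start of a file and stops short of JPEG, whose dimensions sit in a later segment. Decompression and color handling are left to a real image library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_image(data):
    """Return (format, width, height) from the first bytes of an image file.

    Width and height are None when this sketch does not parse them.
    """
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # GIF: logical screen width and height, 16-bit little-endian, right after the signature.
        width, height = struct.unpack("<HH", data[6:10])
        return "GIF", width, height
    if data[:8] == PNG_SIGNATURE and len(data) >= 24 and data[12:16] == b"IHDR":
        # PNG: the IHDR chunk comes first; its data starts with 32-bit big-endian width and height.
        width, height = struct.unpack(">II", data[16:24])
        return "PNG", width, height
    if data[:3] == b"\xff\xd8\xff":
        # JPEG: recognized by its start-of-image marker; the size lives in a later frame header.
        return "JPEG", None, None
    return None, None, None


# A fabricated PNG header for a 120 x 60 image (8-bit RGB), not a real file.
fake_png = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">IIBBBBB", 120, 60, 8, 2, 0, 0, 0)
print(sniff_image(fake_png))  # ('PNG', 120, 60)
```

With a real file, reading the first 32 bytes (for instance of a local logo.gif) and passing them to sniff_image is enough for the GIF and PNG cases.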

FIG. 2: The markup for the MP3.com page shown in Fig. 1 is extremely detailed; here are just the first 70 lines of the approximately 1,200 lines used to create the page.

If height and width attributes are provided, the Web browser compares them with the actual image size to determine whether it needs to internally resize the image before it's displayed. If those attributes are not provided, the browser attempts to display the image at full size, and it may have to lay out and display the rest of the page again in the process. (That's the reason some pages redraw or stutter while loading - the author neglected to specify the image size attributes.) If the file is damaged or in an unrecognized format, the browser must determine whether to display an error icon, and it may have to lay out and display the page again around that. That's just for one static image.
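The browser's decision can be boiled down to a few cases. In the Python sketch below, plan_image_layout is a hypothetical helper. If both attributes are present, the browser can reserve a box of the right size before the image arrives, so nothing needs to be laid out again; if either is missing, the final size is known only after the download, which forces another layout pass. Scaling proportionally when only one attribute is given reflects what current browsers generally do and is an assumption as far as older browsers are concerned.

```python
from typing import NamedTuple, Optional


class ImagePlan(NamedTuple):
    width: int
    height: int
    needs_scaling: bool   # must the decoded image be resized before display?
    needs_reflow: bool    # must the page be laid out again once the image arrives?


def plan_image_layout(declared_w: Optional[int], declared_h: Optional[int],
                      actual_w: int, actual_h: int) -> ImagePlan:
    if declared_w is not None and declared_h is not None:
        # Space was reserved during the first layout pass; only scaling may be needed.
        scaled = (declared_w, declared_h) != (actual_w, actual_h)
        return ImagePlan(declared_w, declared_h, scaled, False)
    if declared_w is not None:
        # Width known in advance; derive the height from the image's aspect ratio.
        height = round(actual_h * declared_w / actual_w)
        return ImagePlan(declared_w, height, declared_w != actual_w, True)
    if declared_h is not None:
        width = round(actual_w * declared_h / actual_h)
        return ImagePlan(width, declared_h, declared_h != actual_h, True)
    # No size attributes: display at full size and redo the layout around it.
    return ImagePlan(actual_w, actual_h, False, True)


print(plan_image_layout(100, 50, 100, 50))
print(plan_image_layout(None, None, 200, 80))
```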

If the image is a hyperlink, the browser must determine whether to put a border around the image and what color and size the border should be. If the image is the basis for an image map, the browser must also keep track of the mouse position over the image and the link to which each image area points. But those are just the minimum requirements. A Web author may add additional attributes to control layout more precisely or run small programs called scripts inside the browser. For example, a script may change the image when the mouse moves over it or display a text message in the browser's status bar.
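In a client-side image map, each clickable region is described by an <AREA> tag with a SHAPE (rect, circle, or poly) and a list of COORDS in pixels, and when regions overlap the one listed first wins. Finding the link under the mouse is then a geometry test. The Python sketch below assumes the regions have already been parsed into tuples; the file names are made up.

```python
from typing import List, Optional, Tuple

# (shape, coords, href): one entry per hypothetical <AREA> tag, in document order.
Area = Tuple[str, List[int], str]


def in_polygon(x: float, y: float, coords: List[int]) -> bool:
    """Even-odd (ray casting) test; coords is x1, y1, x2, y2, ... as in COORDS."""
    points = list(zip(coords[0::2], coords[1::2]))
    inside = False
    j = len(points) - 1
    for i, (xi, yi) in enumerate(points):
        xj, yj = points[j]
        # Does a ray running right from (x, y) cross the edge between points j and i?
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def hit_test(areas: List[Area], x: int, y: int) -> Optional[str]:
    """Return the link for the first region containing (x, y), or None."""
    for shape, coords, href in areas:
        if shape == "rect":
            left, top, right, bottom = coords
            if left <= x <= right and top <= y <= bottom:
                return href
        elif shape == "circle":
            cx, cy, radius = coords
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                return href
        elif shape == "poly" and in_polygon(x, y, coords):
            return href
    return None


AREAS = [
    ("rect", [0, 0, 99, 49], "news.html"),
    ("circle", [150, 25, 20], "mp3.html"),
    ("poly", [200, 0, 250, 0, 225, 49], "tour.html"),
]
print(hit_test(AREAS, 10, 10))    # news.html
print(hit_test(AREAS, 225, 20))   # tour.html
print(hit_test(AREAS, 300, 300))  # None
```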

In addition, the browser must remember not only the image's functions and position, but also the function and position of every element on the page, even while you are clicking on, scrolling, and resizing the browser window. When you consider that a page may encompass not just images and text but forms, audio and MIDI files, and even embedded programs in Web-friendly languages such as Java and JavaScript, the functionality of modern browsers is pretty impressive.

MY BROWSER NEEDS HELP
What if you want to play an MP3 or RealAudio file or view a PDF document or Flash animation, and your browser doesn't understand that format? You'll probably be confronted with a pop-up message that asks if you want to "Pick an Application," "Save File to Disk," or something equally unhelpful. You could upgrade to a newer browser that might support the file type in question, but if you have an older computer or operating system, that may be impractical if not impossible. In that case, you'll have to deal with helper apps (short for applications) and plug-ins.

Configuring your browser to use a helper app really just means telling the browser that when it encounters a certain file type, it should open another program to play or view that file. For example, when you click on a hyperlink to a RealAudio file, RealPlayer should pop up and play the file.

Configuring the browser to use a plug-in usually requires downloading that plug-in and going through an installation process. Then, when the browser encounters the relevant file type(s), the plug-in runs inside the browser, often transparently. A Shockwave Flash animation running as part of a Web page serves as a good example. Unfortunately, configuring helper apps and plug-ins can be a notoriously difficult process, even in fairly recent browsers.
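Whether the browser shows a file itself, hands it to a plug-in, launches a helper app, or falls back to asking you comes down to a lookup table. Web servers normally label each file with a MIME type (such as audio/mpeg for MP3) in the Content-Type header, and browsers fall back on the file extension when that label is missing. The registries in the Python sketch below are hypothetical stand-ins for a browser's preferences, and choose_handler is our own name.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical registries standing in for a browser's preferences.
BUILT_IN = {"text/html", "image/gif", "image/jpeg", "image/png"}
PLUGINS = {"application/x-shockwave-flash": "Shockwave Flash plug-in"}
HELPER_APPS = {"audio/x-pn-realaudio": "RealPlayer", "application/pdf": "Acrobat Reader"}
EXTENSION_TYPES = {
    ".html": "text/html", ".htm": "text/html", ".gif": "image/gif",
    ".swf": "application/x-shockwave-flash", ".ra": "audio/x-pn-realaudio",
    ".ram": "audio/x-pn-realaudio", ".pdf": "application/pdf", ".mp3": "audio/mpeg",
}


def choose_handler(url, content_type=None):
    """Return (who handles it, what) for a downloaded file."""
    if content_type is None:
        # No label from the server: guess from the extension of the URL's path.
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        content_type = EXTENSION_TYPES.get(extension, "application/octet-stream")
    if content_type in BUILT_IN:
        return "browser", content_type
    if content_type in PLUGINS:
        return "plug-in", PLUGINS[content_type]        # runs inside the browser window
    if content_type in HELPER_APPS:
        return "helper app", HELPER_APPS[content_type]  # launched as a separate program
    return "ask user", "Pick an Application / Save File to Disk"


print(choose_handler("http://www.example.com/interview.ra"))
print(choose_handler("http://www.example.com/track.mp3"))
```

With no entry for audio/mpeg, the MP3 link falls through to the unhelpful "Pick an Application" choice; adding one line to HELPER_APPS is the sketch's equivalent of configuring a helper app.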

WEB OF THE FUTURE
Web development's current focus is on convergence and standardization. It's not unreasonable to expect that the very next generation of browsers will offer unprecedented support for a broad range of tags, media types, and programming functions - just in time for HTML to become obsolete. Extensible Markup Language (XML) and XHTML, an XML/HTML stopgap, are already being deployed to solve some of the problems of separating Web content from appearance. (For more on XML, see "Web Page" in this issue of EM and "Tech Page: XML Marks the Spot" in the November 1999 issue of EM.)

Do you need to understand these developments right now? Probably not, but the new markup languages will be part of the electronic musician's world in the near future.

YOU CAN GET THERE FROM HERE
So you painstakingly typed in every character of the mile-long Web address and . . . eureka! You actually found a page; maybe even the one you wanted. Web domains are out there just waiting to send you content. But how does your browser know where to find www.emusician.com?

Computers are good at numbers, so it makes sense for the location of each computer on the Internet (and there are millions of them) to be referenced as a series of numbers, not unlike a postal ZIP code. Those numbers, separated by periods, constitute a computer's Internet Protocol (IP) address. For example, www.emusician.com's IP address is 208.242.199.55; if you type that series of numbers and periods in your browser's Location field and hit Return, you'll get EM's home page.
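Each of the four numbers in an IP address is one byte (0 to 255), so the whole address is really a single 32-bit number. Python's standard ipaddress module makes that visible. describe_ipv4 is our own helper name, and the address used is simply the one quoted above for EM's site, which may no longer be current.

```python
import ipaddress


def describe_ipv4(dotted):
    """Split a dotted IPv4 address into its four bytes and its 32-bit value."""
    addr = ipaddress.IPv4Address(dotted)   # raises ValueError if any part is out of range
    return list(addr.packed), int(addr)    # .packed is 4 bytes, most significant first


octets, number = describe_ipv4("208.242.199.55")
print(octets)                                                   # [208, 242, 199, 55]
print(number == ((208 * 256 + 242) * 256 + 199) * 256 + 55)     # True
```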

Of course, for humans, remembering www.emusician.com is a lot easier than remembering a string of numbers. Fortunately, there's a hierarchy of computers connected to the Internet that do nothing but look up domain names and tell browsers and other Web software where to find them. This hierarchy is called the Domain Name System (DNS). At the top of the DNS are a handful of root servers (several of them operated by U.S. government agencies) that point other computers toward the ones holding the official IP address record for the domain name in question.

When you dial up your provider and then use your browser to request www.emusician.com, the browser asks the provider's DNS computer for the address. If that computer doesn't already have it on file, it starts at the top: a root server refers it to the computers responsible for .com addresses, and those refer it to the computer that holds the official address record for emusician.com. The beauty of this system is that the provider's DNS computer stores (caches) the answer for some time, so subsequent lookups, by you or by any other customer, are much faster.
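The lookup-and-remember idea can be modeled in a few lines. In the Python sketch below, the server names and tables are hypothetical and hard-coded, and a real resolver speaks the DNS protocol over the network instead of reading dictionaries. What the sketch does show is a walk down from the root, through the .com servers, to the domain's own server, with the answer cached for a fixed time-to-live (TTL) so that a repeat request needs no queries at all.

```python
import time

# Hypothetical stand-ins for real DNS servers and the records they hold.
ROOT_SERVER = {"com": "com-server"}
SERVERS = {
    "com-server": {"emusician.com": "emusician-server"},
    "emusician-server": {"www.emusician.com": "208.242.199.55"},
}


class CachingResolver:
    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}     # name -> (address, expiry time)
        self.queries = 0    # how many servers we had to ask

    def _ask(self, table, key):
        self.queries += 1
        return table[key]

    def lookup(self, name):
        cached = self.cache.get(name)
        if cached and cached[1] > self.clock():
            return cached[0]  # answered from the cache, no queries needed
        labels = name.split(".")
        # Root: who handles .com?
        tld_server = self._ask(ROOT_SERVER, labels[-1])
        # .com servers: who handles emusician.com?
        domain_server = self._ask(SERVERS[tld_server], ".".join(labels[-2:]))
        # The domain's own server holds the official record.
        address = self._ask(SERVERS[domain_server], name)
        self.cache[name] = (address, self.clock() + self.ttl)
        return address


resolver = CachingResolver()
print(resolver.lookup("www.emusician.com"), resolver.queries)  # 208.242.199.55 3
print(resolver.lookup("www.emusician.com"), resolver.queries)  # 208.242.199.55 3
```

On a real system, Python's socket.gethostbyname("www.emusician.com") hands the same question to the operating system's resolver, which in turn asks your provider's DNS computer.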