Homework 4: HTML indentation.
Due Thursday, May 6.
The HTML markup language consists of text interspersed with "tags" that
provide information about how the text is to be displayed. Each tag
is enclosed between `<' and `>'. For instance, `<p>' indicates the
start of a new paragraph and `</p>' indicates the end of a paragraph.
`<ol>' starts a numbered list, `<li>' starts an item of the list, and `</ol>'
indicates the end of the list.
When editing HTML source text containing all these tags, it is very
helpful if the source is indented in a systematic way, very similar to
the indentation we use with C++ programs.
The rules of the indentation we want are very simple. Divide the text
into segments called "tokens". A token is either an HTML tag or
a run of tag-free text that extends from the end of the previous
token to the next tag or to the end of the line, whichever comes first.
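The token definition above can be sketched in C++ roughly as follows. This is only an illustration, not the required design: the names `trim' and `tokenizeLine' are hypothetical, and the sketch assumes that surrounding whitespace is trimmed from text tokens, that whitespace-only text produces no token, and that an unterminated tag runs to the end of the line. The assignment does not state these details, so check them against `mystery'.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Trim leading and trailing whitespace (assumed behavior, not from the handout).
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split one input line into tokens: each token is either a tag ("<...>")
// or a run of tag-free text ending at the next tag or at the end of the line.
std::vector<std::string> tokenizeLine(const std::string& line) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t end;
        if (line[pos] == '<') {
            // Tag token: runs through the closing '>'.
            // Assumption: a tag with no '>' runs to the end of the line.
            std::size_t close = line.find('>', pos);
            end = (close == std::string::npos) ? line.size() : close + 1;
            tokens.push_back(line.substr(pos, end - pos));
        } else {
            // Text token: runs up to the next '<' or the end of the line.
            std::size_t next = line.find('<', pos);
            end = (next == std::string::npos) ? line.size() : next;
            std::string text = trim(line.substr(pos, end - pos));
            if (!text.empty()) tokens.push_back(text);
        }
        assert(end > pos);  // every iteration consumes at least one character
        pos = end;
    }
    return tokens;
}
```

Calling tokenizeLine on each line read with std::getline and appending the results gives the token sequence for a whole file.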
- Call a section between a start tag and its matching end tag a "block".
Indent the block two spaces more than the start tag and end tag themselves.
- There may be several start tags with the same end tag. Indent them equally.
See the example below.
- After a start tag and until the matching end tag, indent everything more
than the start tag is indented.
- Indent a start tag just as much as the previous text, but more than the previous
start tag (or, if the previous start tag is the same as this one, use the same
indentation).
- Indent a text line more than the previous start tag; all text lines within
the same tagged region are indented the same.
- Indent an end tag (with the slash, such as `</p>') as much as the
corresponding previous start tag. If there is no corresponding start tag,
revert to zero indentation.
- As an operational definition of what all this means, your indentation
pattern on any input text should be the same as that produced by
the program `mystery' in the project directory.
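One possible reading of the rules above uses a stack of open start tags, with each token indented two spaces per open tag. The sketch below is an interpretation, not the reference behavior: `tagName' and `indentTokens' are hypothetical names, the input is assumed to be an already tokenized sequence, tag names are compared case-insensitively, and every start tag is assumed to open a block (so tags that never get an end tag, other than repeated siblings such as `<li>', keep deepening the indentation). Where this disagrees with `mystery', `mystery' wins.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Extract the lowercase tag name from "<name ...>" or "</name>".
static std::string tagName(const std::string& tag) {
    std::size_t i = 1;
    if (i < tag.size() && tag[i] == '/') ++i;
    std::string name;
    while (i < tag.size() && tag[i] != '>' &&
           !std::isspace(static_cast<unsigned char>(tag[i]))) {
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(tag[i])));
        ++i;
    }
    return name;
}

// Return one indented output line per token.
std::vector<std::string> indentTokens(const std::vector<std::string>& tokens) {
    std::vector<std::string> open;   // names of currently open start tags
    std::vector<std::string> lines;
    for (const std::string& tok : tokens) {
        assert(!tok.empty());
        std::size_t depth = open.size();  // text: one level inside the open tags
        if (tok[0] == '<' && tok.size() > 1 && tok[1] == '/') {
            // End tag: find the nearest matching start tag, searching from the top.
            std::string name = tagName(tok);
            std::size_t i = open.size();
            while (i > 0 && open[i - 1] != name) --i;
            if (i == 0) {
                open.clear();        // no matching start tag: revert to zero
                depth = 0;
            } else {
                open.resize(i - 1);  // close the matching tag and everything inside it
                depth = i - 1;
            }
        } else if (tok[0] == '<') {
            // Start tag: a repeat of the previous start tag is a sibling,
            // so it replaces that tag at the same depth.
            std::string name = tagName(tok);
            if (!open.empty() && open.back() == name) open.pop_back();
            depth = open.size();
            open.push_back(name);
        }
        lines.push_back(std::string(2 * depth, ' ') + tok);
    }
    return lines;
}
```

Printing the returned lines one per line produces the indented text. Comparing that output with the output of `mystery' on the test*.in files (for instance with diff) is the real check of whether this interpretation is right.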
For example, non-indented:
<ol> <li> first item <li> second item </ol>
and properly indented:
<ol>
  <li>
    first item
  <li>
    second item
</ol>
Incidentally, either would be displayed by a browser like this:
1. first item
2. second item
The project directory is
~saunders/220/html/.
It contains a Makefile, tools.h, and test*.in files. The Makefile and tools.h are
for your convenience and can be used as you wish. The Makefile assumes you
will create an indent.cc file containing the program (and probably #including tools.h).
The test*.in files are little test cases. You will be asked to run
your program on a larger test case near the due date.