Homework #1 Part A [Due Feb 26]
[Thanks to James Allen and Jimmy Lin]
Problem 1 (20 points)
Run the following two queries on Google (http://google.com) and on MSN Live
(http://www.live.com/). Enter only the query itself, not the explanation that
follows it. For each query on each search engine, you will judge 10 documents
for relevance to the query. The explanation will help you decide whether or
not something is relevant.
- gardening wet soil conditions
What special considerations must be made when planting a garden in very wet
soil conditions? Are there plants that will not work and/or that will work
particularly well? Are there any special techniques that will help reduce
the amount of moisture? Only pages that deal with gardening are relevant.
- oil vs. propane furnace
Looking for pages that list the tradeoffs for using a propane furnace rather
than a fuel oil furnace. The pages should provide a comparison and are only
relevant if they talk about both. They can talk about other types of heating
(e.g., electric), provided they talk about oil and propane. Propane is also
called LP or LPG (liquid propane [gas]).
When judging pages for relevance, please note the following:
- A page is relevant if it is on the appropriate topic. It is not relevant
only because it has a link to a page that has relevant content.
- Google indents some of the items on its ranked list to indicate that they
come from the same domain (roughly) as the preceding entry. Include those
entries. There should always be 10 entries per page.
- If a page is not found when you follow its link, use the cached version. If
there is no cached version, make your judgment from the title and snippet. If
you cannot make a call with that information, mark it as non-relevant.
To decide which 10 documents to judge, use your last name:
- Chiu to Malaviya – do the first page (results 1-10)
- Moyer to Wang – do the second page (results 11-20)
Email your results to me in the following format, one line per judged document:
G-or-M query-number doc’s-rank R-or-N doc’s-title
where G-or-M indicates whether you ran this on Google or MSN, query-number
is 1 or 2 from above, doc’s-rank is this document’s rank in the result list
(a number from 1 to 20), R-or-N is “R” if the page is relevant and “N”
otherwise, and doc’s-title is the title of the document (to help us verify
that results make sense).
As an example, evaluating the first document retrieved by Google on the query
"gardening wet soil conditions" might result in the following (fictitious)
judgment:
G 1 1 N Gardening for dummies
Since you are judging 10 pages for each of two queries on each of two search
engines, the file you email should have 40 non-blank lines.
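
If you want to sanity-check your file before emailing it, the format described
above can be checked mechanically. The following is only a minimal sketch, not
an official grading script: the function names parse_judgment and
check_submission are hypothetical, and it assumes fields are separated by
whitespace, with everything after the R-or-N field treated as the title.

```python
import re

# Assumed layout of one judgment line:
# engine (G or M), query number (1 or 2), rank, judgment (R or N), title.
LINE_RE = re.compile(r"^([GM])\s+([12])\s+(\d+)\s+([RN])\s+(.+)$")


def parse_judgment(line):
    """Return (engine, query, rank, relevant, title) or raise ValueError."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed judgment line: {line!r}")
    engine, query, rank, label, title = match.groups()
    rank = int(rank)
    if not 1 <= rank <= 20:
        raise ValueError(f"rank must be between 1 and 20: {line!r}")
    return engine, int(query), rank, label == "R", title


def check_submission(text):
    """Parse every non-blank line and confirm there are exactly 40 of them."""
    judgments = [parse_judgment(ln) for ln in text.splitlines() if ln.strip()]
    if len(judgments) != 40:
        raise ValueError(f"expected 40 non-blank lines, found {len(judgments)}")
    return judgments
```

For instance, calling parse_judgment on the fictitious example line above
yields the engine "G", query 1, rank 1, a relevance flag of False, and the
title "Gardening for dummies".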