More Data, More Better

10 Nov

The Google Developer Day 2007 talk with Peter Norvig raised some interesting points.  I think there was some very useful information in there, but there was an issue early on with the use of jargon.  “Asymptote off” means that as the lines get further away from the starting point they get closer together.  Translation for the data: as we go further down the horizontal axis, the starting point matters less and less.  And, quick, who got the “more, cat” joke?  Me neither.  (All translations and explanations are thanks to DH, who laughed out loud at “more, cat”.  It’s a joke about UNIX.  More and cat are commands in UNIX, so the results were other UNIX commands.)  Toward the end of his talk, Norvig put up a slide showing the “Big Picture” – the goal of a happy user.  However, DH, who works for another California-based Internet search company that we’ll call Hooray!, explained to me the first two rules of an Internet company.  Rule #1: it’s all about making money, not the user.  Rule #2: ALWAYS say it’s about the user.  This, of course, made Norvig’s answers to the questions about user satisfaction far more amusing.

Now on to what I found useful for my purposes.  Irritation with jargon and techno-babble aside, the initial description of the use of data was important and bears repeating:  the more data, the better.  I think the point that the more data we have, and the more effectively it is used, the better the information we get is quite useful in a historical setting.  This touches on Leary’s points about “remember the grotto”.  Leary also points out that if he had only entered “grotto” he would have gotten too many results, and if he had entered “please to remember the grotto” he probably would not have gotten any results.  This relates to what Norvig was talking about with the “learning” machine.

Taking these two together, I have decided that more data is not always better.  There is such a thing as too much information and information overload.  This has implications for my own site, because I want to be able to entice users to look deeper without scaring them with an avalanche of (to them) random information about metallurgy or postal regulations of 1916.  I believe that in order to get both the academic rigor and the user-friendliness I want, I will have to put in a few more layers of intervening pages between the initial collections page and the full data on the object.

I will have an initial welcome page, explaining the purpose of my portion of the site.  The user will then be able to click on a section of the collection, sorted by time period (Mexican War, World War I), by type (headgear, weaponry), or by topic (African-Americans, domestic deployments).  Most of our collection falls under multiple headings, so items will have to be tagged for all groups.  For example, a helmet worn by a member of the 93rd Division during World War I would fall under “World War I”, “Headgear”, and “African-Americans”.

The user selects the section of the collection they’re interested in, say World War I.  Now, here’s where I have some conflict – do I sort out the collection further (uniforms, headgear, weaponry) or do I have it all appear?  My instinct is to divide it further, but I have been on sites like that and they can get frustrating very quickly.  If I have all items tagged as “World War I” appear, how do I sort them?  The most logical way on the museum side is to sort them according to accession number, but since items are acquired piecemeal, the resulting display will make no sense – a rifle, next to a homefront window banner, next to Army-issued long underwear.  To my mind, displaying them any other way requires more selections from the user.  Thoughts?
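To make the multiple-headings idea concrete, here is a minimal sketch in Python of items that carry every tag they fall under, plus a lookup that narrows by one or more tags. The catalog entries and the `items_with_tags` helper are invented for illustration; only the helmet example and its three headings come from the description above.

```python
# Hypothetical catalog: each item carries every heading it falls under.
# Only the helmet and its three tags come from the post; the rest is made up.
ITEMS = [
    {"name": "93rd Division helmet",
     "tags": {"World War I", "Headgear", "African-Americans"}},
    {"name": "Homefront window banner",
     "tags": {"World War I"}},
    {"name": "Musket",
     "tags": {"Mexican War", "Weaponry"}},
]


def items_with_tags(items, *wanted):
    """Return the names of items tagged with every heading in `wanted`."""
    wanted = set(wanted)
    # `wanted <= tags` is a subset test: the item must carry all wanted tags.
    return [item["name"] for item in items if wanted <= item["tags"]]


print(items_with_tags(ITEMS, "World War I"))
print(items_with_tags(ITEMS, "World War I", "Headgear"))
```

Because the tags are a set on each item, the helmet shows up under any of its three headings without being stored three times.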

(And when Norvig was talking about learning machines, did anyone else start hearing the “Terminator” music in their head?)


Posted by on November 10, 2009 in Project, Readings


5 responses to “More Data, More Better”

  1. Mark Stoneman

    November 10, 2009 at 12:50 pm

    I’m coming at this from the outside, with no knowledge of what is going on in your class. That caveat aside, I should think you need as many tags as are necessary to produce meaningful search results. When blogging I try (and frequently fail) to reduce the number of categories, but I’m not sure I see the downside of more tags for what you are talking about, unless you are creating menus with these tags as opposed to making the tags searchable.

  2. lprice3

    November 10, 2009 at 1:17 pm

    “Now, here’s where I have some conflict – do I sort out the collection further (uniforms, headgear, weaponry) or do I have it all appear? My instinct is to divide it further, but I have been on sites like that and they can get frustrating very quickly. If I have all items tagged as “World War I” appear, how do I sort them?”

    Can you do both? For example, I search for WWI and I get a long list (or thumbnails) of all of the great items. My reaction may be – great! This will give me a lot to explore! But, let’s say I’m looking only for weaponry. This long list may annoy me because I am endlessly impatient (for this example only). But then I see that on the side of the screen, there is an itemized list of categories WITHIN the WWI category. I am happy.

    So that’s my 1 1/2 cents.
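A rough sketch of that "do both" layout, in Python: show every item in the chosen section, and build the side list by counting the other tags those items carry. The sample items and the `facet_counts` helper are hypothetical, not part of any existing site.

```python
from collections import Counter

# Hypothetical sample data; tag names follow the ones used in the post.
ITEMS = [
    {"name": "Rifle", "tags": {"World War I", "Weaponry"}},
    {"name": "Trench knife", "tags": {"World War I", "Weaponry"}},
    {"name": "Helmet", "tags": {"World War I", "Headgear"}},
    {"name": "Musket", "tags": {"Mexican War", "Weaponry"}},
]


def facet_counts(items, selected):
    """Count the other tags carried by items that have the `selected` tag."""
    counts = Counter()
    for item in items:
        if selected in item["tags"]:
            # Leave out the selected tag itself so the side list
            # only shows ways to narrow the current view further.
            counts.update(item["tags"] - {selected})
    return counts


# Side list for the World War I page: each remaining tag with its item count.
for tag, n in sorted(facet_counts(ITEMS, "World War I").items()):
    print(f"{tag} ({n})")
```

The full list stays on the page, and the counts tell the impatient visitor how many items sit behind each category before they click.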

  3. colamaria

    November 10, 2009 at 1:32 pm

    Can’t you sort using the tag hierarchy? For example, the helmet would be:

    Period: WWI
    Category: Uniforms
    Name: Helmet

    That way, the user would see items on the page that are all WWI-uniform-helmet grouped together. If there is only 1 helmet, then the user would see WWI-uniform items together. If you only have 1 uniform item, then all the WWI items would be grouped together.

    (did that even make sense?)
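One possible reading of this grouping rule, sketched in Python: place each item at the deepest level of Period, Category, Name where it has company, and fall back one level at a time when it would otherwise sit alone. The field names, sample records, and the `grouped` helper are assumptions made for the example.

```python
from collections import Counter

# Hypothetical records using the Period / Category / Name hierarchy.
ITEMS = [
    {"id": 1, "period": "WWI", "category": "Uniforms", "name": "Helmet"},
    {"id": 2, "period": "WWI", "category": "Uniforms", "name": "Helmet"},
    {"id": 3, "period": "WWI", "category": "Uniforms", "name": "Tunic"},
    {"id": 4, "period": "WWI", "category": "Weaponry", "name": "Rifle"},
]


def path_of(item):
    return (item["period"], item["category"], item["name"])


def grouped(items):
    """Group item ids at the deepest hierarchy level shared with another item."""
    # Count how many items sit under every prefix of every path.
    counts = Counter()
    for item in items:
        path = path_of(item)
        for depth in (1, 2, 3):
            counts[path[:depth]] += 1

    groups = {}
    for item in items:
        path = path_of(item)
        key = path[:1]  # fallback: group by period only
        for depth in (3, 2):
            if counts[path[:depth]] > 1:
                key = path[:depth]
                break
        groups.setdefault(key, []).append(item["id"])
    return groups


for key, ids in grouped(ITEMS).items():
    print("-".join(key), ids)
```

With this sample, the two helmets group under WWI-Uniforms-Helmet, the lone tunic falls back to WWI-Uniforms, and the lone rifle falls back to WWI.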

  4. theoldscholar

    November 10, 2009 at 2:09 pm

    Mark brought up one aspect of the need for identifying data – searching. Dave brought up another reason for putting identification on data – grouping things together. In reality we have two different properties of data: classes that things belong to and attributes that describe what a thing has. Biology has the best known example of a well defined set of classes – the taxonomy that puts us into a hierarchy of Kingdom, Phylum, Class, Order, Family, Genus, Species. Biologists work within this framework of order, to put everything in its rightful place. However, each thing not only belongs to a class but also has attributes that describe it. For instance, an attribute can be height. You could get all Animals that are greater than 5 ft high or you could get all homo sapiens that are 5 ft high. I think Zayna has developed an initial set of classes she wants to present to the user: Time Period, like WWI. She may want to set up a hierarchy of things like Weapons, Clothing, Printed Materials. Once she allows people to browse those classes she can also allow people to use tags to search within and between classes, e.g. all weapons that use gunpowder, all weapons with effective range greater than 10 yards, etc., whatever attributes she wants to collect for each item. Most people would be interested in something that is grouped together but some visitors may be interested in a certain attribute. For instance someone may be interested in putting together a display and they only have room for something that is smaller than 2 ft long. It wouldn’t make sense to have a display room of things 2 feet long but it would make sense to be able to query along that attribute.

    Just as everyone develops their own way of filing information, your site can develop its own way – but if people can’t decipher your logic they will say it is not user friendly. Picking your classes and your attributes is arbitrary but needs to be based on things that make sense to your user – not to you as a historian. Data tagged and classed wrongly doesn’t get used and falls into disrepair.

    I hope my rambling makes some sense, if not go ahead and ask me to explain some more – or catch me in class.
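The class-versus-attribute distinction can be sketched in a few lines of Python. Classes are what a visitor browses by, attributes are what they query along. The `Item` type, the `query` helper, the sample objects, and their lengths are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    classes: tuple                                   # browsing hierarchy
    attributes: dict = field(default_factory=dict)   # measurable properties


# Hypothetical objects; the lengths are made-up values for the example.
ITEMS = [
    Item("Trench knife", ("World War I", "Weapons"), {"length_ft": 1.0}),
    Item("Rifle", ("World War I", "Weapons"), {"length_ft": 3.6}),
    Item("Recruiting poster", ("World War I", "Printed Materials"), {"length_ft": 1.5}),
]


def query(items, in_class=None, max_length_ft=None):
    """Filter by class membership and/or by the length attribute."""
    names = []
    for item in items:
        if in_class is not None and in_class not in item.classes:
            continue
        if max_length_ft is not None:
            length = item.attributes.get("length_ft")
            # Items without a recorded length cannot satisfy the limit.
            if length is None or length >= max_length_ft:
                continue
        names.append(item.name)
    return names


print(query(ITEMS, in_class="Weapons"))                   # browse a class
print(query(ITEMS, max_length_ft=2))                      # query across classes
print(query(ITEMS, in_class="Weapons", max_length_ft=2))  # both at once
```

The display-case visitor uses the second kind of call: nothing on the site is organized by length, but the attribute is still there to query.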

  5. colamaria

    November 19, 2009 at 6:07 pm

    there’s a tool on this site called “zoomify” that might be handy for your object visualization. it isn’t 3D, but I wonder if maybe they have something similar for 3D? this is running off Omeka so you could use that to build your object database, which would be handy. I hadn’t seen it before this site, thought you might like it.

