Off the Top: Searching Entries
As If Had Read
The idea of a tag "As If Had Read" started as a riff on riffs with David Weinberger at Reboot 2008 regarding the "to read" tag that is prevalent in many social bookmarking sites. But "as if had read" is not so tongue-in-cheek at the moment; it is a moment of ah-ha!
I have been using DevonThink on my Mac for 5 or more years. It is a document, note, web page, and general content catch-all that is easily searched. But it also surfaces other items it sees as relevant, and the connections it makes are often quite impressive.
My Info Churning Patterns
I have promised for quite a few years that I would write up how I work through my inbound content. This process changes a lot, but it is back to a settled state again (mostly). Going back 10 years or more I would go through my links page and check all of the links on it (75 to 100 links at that point) to see if there was something new or of interest.
But that changed to using a feedreader (I used, and am back to using, NetNewsWire on the Mac as it has the features I love, it is fast, and I can skim 4x to 5x the content I can in Google Reader (interface and design matter)) to pull in 400 or more RSS feeds that I would triage. I would skim the new (bold) titles and skim the content in the reader; if an item was of potential interest I would open the link into a browser tab in the background and just churn through skimming the 1,000 to 1,400 new items each night. Then I would open the browser to read the tabs. At this stage I actually read the content, and if part way through it I decide it has no current or future value I close the tab. In about 90 minutes I could triage 1,200 to 1,400 new RSS feed items, get 30 to 70 potential items of value open in browser tabs, and get this down to a usual 5 to 12 items of current or future value. Yes, in 90 minutes (keeping focus to sort out the chaff is essential). From this point I would blog, or at least put these items into Delicious and/or Ma.gnolia or Yahoo MyWeb 2.0 (that service was insanely amazing and years ahead of its time, and I will write up its value).
The volume and tools have changed over time. Today the same number of feeds (approximately 400) turns out 500 to 800 new items each day. I now post less to Delicious and opt for DevonThink for 25 to 40 items each day. I had stopped using DevonThink (DT) for a while and opted for Yojimbo and then Together.app, as they had tagging and I could add my own context (I found my own context had more value than DevonThink's contextual relevance engine). But when DevonThink added tagging it became an optimal service, so I added my archives from Together and now use DT a lot.
Relevance of As if Had Read
But one of the things I have been finding is I can not only search within the content of items in DT, but I can quickly aggregate related items by tag (work projects, long writing projects, etc.). Its incredible value, though, is how it has changed my information triage and process. I am now taking those 30 to 40 tabs and doing a more in-depth read, but only rarely reading the full content, unless its current value is high or the content is compelling. I am acting on the content more quickly and putting it into DT. When I need to recall information I use the search to find content and then pull related content closer. I not only have the item I was seeking, but other related content that adds depth and breadth to a subject. My own personal recall of the content is enough to start a search that finds what I was seeking with relative ease. But where I did a deeper skim read in the past, I now do a deeper read of the prime focus. My augmented recall with the brilliance of DevonThink works just as well as if I had read the content deeply the first time.
Social Computing Summit in Miami, Florida in April, 2008
ASIS&T has a new event they are putting on this year, the Social Computing Summit in Miami, Florida on April 10-11, 2008 (a reminder page is up at Yahoo's Upcoming - Social Computing Summit). The event is a single-track event on both days with keynote presentations, panels, and discussion.
The opening keynote is by Nancy Baym. I have been helping organize the Social Computing Summit and was asked by the other organizers to speak, which I am doing on the second day. The conference is a mix of academic, consumer, and business perspectives across social networking, politics, mobile, the developing world, research, enterprise, and open social networks (the social graph and portable social networks), as well as other subjects. The Summit will be a broad view of the digital social world and the current state of understanding from various leaders in social computing.
There is an open call for posters for the event that closes on February 25, 2008. Please submit, as this is looking to be a great event and more perspectives and expertise will only make it more fantastic.
Does IBM Get Folksonomy?
While I do not aim to be snarky, I often come off that way, as I tend to critique and provide criticism in the hope of getting the bumps in the road of life (mostly digital life) smoothed out. That said...
Please Understand What You Are Saying
I read an article this morning about IBM bringing clients to Second Life, which is rather interesting. There are two statements made by Lee Dierdorff and Jean-Paul Jacob; one is valuable and the other sinks their credibility, as I am not sure they grasp what they are actually talking about.
The good comment is the "5D" approach, which combines the 2D world of the web and the 3D world of Second Life to get improved search and relevance. This is worth some thinking about, though not a whole lot, as the solution as mentioned can have severe problems scaling. The virtual-world solution is lacking where it does not augment our understanding much beyond 2D: it leaves out four of the six senses (it has visual and audio) and injects more noise into a pure conversation than a video chat does, without the sensory benefits of video chat. The added value of augmented intelligence via text interaction is of interest.
I am not really sure that Lee Dierdorff actually gets what he is saying, as he shows a complete lack of even partial understanding of what folksonomy is. Jacob states, "The Internet knows almost everything, but tells us almost nothing. When you want to find a Redbook, for instance, it can be very hard to do that search. But the only real way to search in 5D is to put a question to others who can ask others and the answer may or may not come back to you. It's part of social search. Getting information from colleagues (online) -- that's folksonomy." Um, no, that is not folksonomy and not remotely close. It is something that stands apart: socially augmented search, which can viably use the diverse structures of a folksonomy to find relevant information, but asking people in a digital world for advice is not folksonomy. It has value, and it is how many of us have used tools like Twitter and other social software that helps us keep those near in thought close (see Local InfoCloud). There could be a need for a term for what Jacob is talking about, but social search seems quite relevant as a term.
Related, I do have a really large stack of criticism for the IBM DogEar product that would improve it greatly. It needs a lot of improvement as a social bookmarking and folksonomy tool, but also, on the social software interaction side, there are things that really must get fixed for privacy interests in the enterprise before it could be a viable solution. There are much better alternatives for social bookmarking inside an enterprise than DogEar, which benefits from being part of the IBM social software stack Lotus Connections, as the whole stack is decent together, but none of the parts are great, or even better than good, by themselves. DogEar really needs to become a much more solid product quickly, as there is a lot of interest now for this type of product, but it is only a viable solution if one is only looking at IBM products for solutions.
Understanding Taxonomy and Folksonomy Together
I deeply appreciate Joshua Porter's link from his Taxonomies and Tags blog post. This is a discussion I have quite regularly about the relation between the two; it is in my presentations and workshops, and much of my tagging (and social web) training, consulting, and advising focusses on getting smart on understanding the value and downfalls of folksonomy tagging (as well as traditional tagging - remember tagging has been around in commercial products since at least the 1980s). The following is my response in the comments to Josh's post...
Response to Taxonomy and Tags
Josh, thanks for the link. If only the world of language were simple enough for this to work consistently. The folksonomy is a killer resource, but it lacks structure, which is crucial to disambiguating terms. There are algorithmic ways of getting close to this end, but they are insanely processor intensive (think days or weeks to churn out this structure). Working from a simple flat taxonomy or faceted structure gives the folksonomy something to adhere to.
This approach can help augment tags to objects, but it is not great at finding objects by tags, as a tag like "apple" would surface thousands of results that would need to be narrowed greatly to find what one is seeking.
There was an insanely brilliant tool, RawSugar (now gone thanks to venture capitalists pulling the plug on a one-of-a-kind product that would be killer in the enterprise market), that married taxonomy and folksonomy to help derive disambiguation (take appleseed as a tag: do you mean Johnny Appleseed, appleseed as it relates to gardening/farming, cooking, or the anime movie?). The folksonomy can help decipher this through co-occurrence of terms, but a smart interface and system is needed to do this. Fortunately the type of system that is needed to do this is something we have: it is a taxonomy. Using a taxonomy will save processor time, and human time, through creating an efficient structure.
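A minimal sketch of the co-occurrence idea (not RawSugar's actual approach), with hypothetical bookmark data and a hand-built facet map: count which tags show up alongside "appleseed" and roll those companion tags up to broader taxonomy facets to guess which sense is meant.

```python
from collections import Counter

# Hypothetical bookmarks, each holding the tags one person applied.
bookmarks = [
    {"appleseed", "anime", "movie", "scifi"},
    {"appleseed", "johnny", "folklore", "history"},
    {"appleseed", "gardening", "orchard", "planting"},
    {"appleseed", "anime", "manga"},
]

# A small hand-built taxonomy facet map: tag -> broader category.
facets = {
    "anime": "film/animation", "movie": "film/animation", "manga": "film/animation",
    "johnny": "people/folklore", "folklore": "people/folklore", "history": "people/folklore",
    "gardening": "agriculture", "orchard": "agriculture", "planting": "agriculture",
}

def senses_for(tag, bookmarks, facets):
    """Count the taxonomy facets that co-occur with `tag` across all bookmarks."""
    counts = Counter()
    for tags in bookmarks:
        if tag in tags:
            for other in tags - {tag}:
                if other in facets:
                    counts[facets[other]] += 1
    return counts

print(senses_for("appleseed", bookmarks, facets).most_common())
# [('film/animation', 4), ('people/folklore', 3), ('agriculture', 3)]
# The anime/film sense co-occurs most often, so it is the best first guess here.
```

The taxonomy does the cheap part (grouping tags into stable facets); the folksonomy supplies the co-occurrence evidence.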
Recently I have been approached by a small number of companies who implemented social bookmarking tools to develop a folksonomy and found the folksonomy was (initially) far more helpful than they had ever imagined and outpaced their taxonomy-based tools by leaps and bounds (mostly because they did not have time or resources to implement an exhaustive taxonomy; I have yet to find an organization that has an exhaustive and emergent taxonomy). The organizations either let their taxonomist go or did not replace them when they left, as they seemed to think they did not need one with the folksonomy running. All was well and good for a while, but as the folksonomy grew the ability to find specific items decreased (it still worked fantastically for people refinding information they had personally tagged). These companies asked what tools they would need to start clearing this up. The answer: a person who understands information structure for ease of finding, which is often a taxonomist, and a tool that can aid in information structure, which is often a taxonomy tool.
The folksonomy does many things that are difficult and very costly to do in taxonomies. But taxonomies do things that folksonomies are rather poor at doing. Both need each other.
Complexity Increases as Folksonomies Grow
I am continually finding organizations thinking that social bookmarking tools and folksonomy are going to be a simple cure-all, but it is much more complicated than that. The social bookmarking tools will really sing for a while, but then things need help, and most of the tools out there are not to the point of providing that assistance yet. There are whole toolsets missing for monitoring and analyzing the collective folksonomy. There is also a need for a really good disambiguation tool and approach (particularly now that RawSugar is gone as a viable approach).
Folksonomy Provides 70 Percent More Terms Than Taxonomy
While at the WWW Conference in Banff for the Tagging and Metadata for Social Information Organization Workshop, I was chatting with Jennifer Trant about folksonomies validating and identifying gaps in taxonomy. She pointed out that at least 70% of the tag terms people submitted in Steve Museum were not in the taxonomy, after cleaning up the contributions for misspellings and errant terms. The formal paper (linked to in her blog post on the research, more steve ... tagger prototype preliminary analysis) indicates the percentage may even be higher, but 70% is a comfortable and conservative number.
Is 70% New Terms from Folksonomy Tagging Normal?
In my discussions with enterprise organizations and other clients that are looking to evaluate their existing tagging services, I have been finding 30 percent to nearly 70 percent of the terms used in tagging are not in their taxonomy. One chat with a firm that had just completed updating their taxonomy (second round) for their intranet found the social bookmarking tool on their intranet turned up nearly 45 percent new or unaccounted-for terms. This firm knew they were not capturing all possibilities with their taxonomy update, but did not realize there was that large a gap. In building their taxonomy they had harvested the search terms and had used tools that analyzed all the content on their intranet and offered the terms up. What they found in the folksonomy were common synonyms that were not used in search nor were in their content. They found vernacular, terms that were not official for their organization (sometimes competitors' trademarked brand names), emergent terms, and some misunderstandings of what the documents actually were.
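A back-of-the-envelope sketch of how that gap number can be computed, assuming you can export the folksonomy tags and the taxonomy terms as plain lists (the normalization here is deliberately crude; the real clean-up of misspellings and variants is the hard part, and the sample terms are made up):

```python
def taxonomy_gap(folksonomy_tags, taxonomy_terms):
    """Return the share of distinct tags that do not appear in the taxonomy, plus the missing tags."""
    normalize = lambda t: t.strip().lower()
    tags = {normalize(t) for t in folksonomy_tags}
    taxonomy = {normalize(t) for t in taxonomy_terms}
    missing = tags - taxonomy
    return len(missing) / len(tags), sorted(missing)

# Hypothetical exports from a social bookmarking tool and a taxonomy tool.
tags = ["seatbelt", "safety-restraint", "lap belt", "buckle up", "seat belts"]
taxonomy = ["seat belts", "safety restraint devices"]

share, missing = taxonomy_gap(tags, taxonomy)
print(f"{share:.0%} of tags are not in the taxonomy: {missing}")
# 80% of tags are not in the taxonomy: ['buckle up', 'lap belt', 'safety-restraint', 'seatbelt']
```

The interesting work starts after this number is in hand: deciding which of the missing terms are synonyms, vernacular, or genuinely emergent concepts.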
In other informal talks these stories are not uncommon. It is not that the taxonomies are poorly done, but vast resources are needed to capture all the variants in traditional ways. A line needs to be drawn somewhere.
Comfort in Not Finding Information
The difference between the taxonomy, or other formal categorization structure, and what people actually call things (as expressed in bookmarking an item to make it easy to refind) is normally above 30 percent. What organization is comfortable with that level of inefficiency at the low end? What about 70 percent of an organization's information, documents, and media not being easily found by how people think of it?
I have yet to find any organization, be it enterprise or non-profit, that is comfortable with that type of inefficiency on their intranet or internet. The good part is the cost is relatively low for capturing what people actually call things by using a social bookmarking tool or other folksonomy-related tool. The analysis and making use of what is found in a folksonomy costs about the same as building a taxonomy, but a large part of the resource-intensive work is done in the folksonomy through data capture. The skills needed to build understanding from a folksonomy will lean a little more on the analytical and quantitative side than traditional taxonomy development does. This is because the volume of information supplied can be orders of magnitude higher than the volume gathered through traditional research methods.
Ma.Del Tagging Bookmarklet
Yahoo MyWeb no longer surfaces results in Yahoo Search, which was what made Yahoo Search much better for me than any other web search. Now that this is no longer functioning, and there is no response as to if or when it will return, I am back to Google and Microsoft Live Search out of frustration (the relevancy is better for me on many things). But this change also removes the value of MyWeb, and it has me looking back to Ma.gnolia (I am also a huge fan of RawSugar and their facets, but that is another, longer post).
New Tagging Combo Bookmarklet
When I became a fan of MyWeb I used some glue to make a Del.icio.us and MyWeb combo bookmarklet.
So now I have done the same for Ma.gnolia and del.icio.us with the Ma.Del Marklet (drag it to the bookmark bar; Firefox and Safari only). This was built using the Ma.gnolia bookmarklet Ma.rker Mini as its base.
Cuban Clocks and Music Long Tail Discovery
On my last two trips to San Francisco I heard a Latin version of Coldplay's Clocks on KFOG and it really intrigued me. This last trip I was in the car for four songs and one of them was Coldplay's Clocks by the Cuban All Stars. I have been trying to track this track down since first hearing it, but am not having great luck. This continually happens when I listen to KFOG, which is about the only regular radio station I will listen to (I much prefer XM Radio for its lack of advertising and blathering idiots spouting off while playing overplayed songs that have little merit).
I really like this version of Clocks by the Cuban All Stars (I have seen the dashboard metadata list it as Ibrahim Ferrer, but it has not been described as such by the DJs on KFOG). This is where my music recommendations break down. Some digging on the KFOG website points me to Rhythms Del Mundo as the source (but their Flash site seems horribly broken in all browsers, as none of the links work). I have found the album on iTunes, but only a partial listing, and none of the physical music store options have it in stock as it is not mainstream enough (how I miss Tower).
This all seems like far more work than should be needed. But not if one has even slightly long tail musical interests. I had a wonderful discussion along these lines with Cory from IODA about this and the lack of really good long tail discovery systems.
I use Last.fm to discover new things from friends' lists, but the Last.fm neighbor recommendations seem to only work on more mainstream interests (Pandora really falls off on the long tail for me). Now if KFOG put their playlist into Last.fm, it would help greatly and I would add them to my friend list (or I could move back home to the San Francisco Bay Area).
System One Takes Information Workflow to a New Level
While at Microlearning Conference 2006 Bruno and Tom demonstrated their System One product. This has to be one of the best knowledge/information tools that I have seen in years. They completely understand simplicity and interaction design and have used it to create an information capture and social software tool for the enterprise. Bruno pointed me to a System One overview screen capture (you do not have to log in to get started) that features some of the great elements in System One.
One of the brilliant aspects of System One is their marketing of the product. While it has easily usable wiki elements, heavy AJAX, live search, etc., they do not market these buzzwords; they market the ease of capturing information (which can become knowledge) and the ease of finding information. The simplicity of the interface and interaction makes it one of the best knowledge management tools available. Most knowledge management tools fall down on the information entry side. Building tools that are part of your workflow, including information from those who do not feed the KM tool, is essential, and System One is the first tool I have seen that understands this and delivers a product that proves they get it.
The enterprise social software market is one that is waiting to take off, as there is a very large latent need (one that has been repressed by poor tools in the past). The System One tool is quite smart, as they have built in e-mail search, file access, and live Google search (you type in the wiki (you do not need to know it is a wiki) and the terms used are searched in Google) to deliver a rather nice contextual search. This built-in search solves the Google complexity of building solid narrow search queries; the person using the system just needs to be able to enter information into the screen.
Those of us who are geeks find Google queries a breeze, but regular people do not find it easy to tease out the deeply buried gems of information hidden in Google. Surfacing people who are considered experts, or at least connectors to experts on subjects, is part of the System One tool as well, and this is an insanely difficult task in an enterprise.
My only wish is that I worked in an organization large enough to use this tool, or that there were a personal version I could use to capture and surface my own information when I am working.
You may recognize System One as the developer of retrievr, the interactive Flickr tool that lets you draw a simple picture and finds related photos in Flickr based on the drawing's pattern and colors. It is a brilliant tool, but not as smart as their main product.
Microsoft Live Image Search
I have been rather quiet about my trip to Microsoft as part of their Search Champs v.4. The trip was mid-January and I was rather impressed with what Microsoft showed. The focus was late-stage beta for MS Live products and things that were a little more rough. Last week Expo launched, which is a rather cool classifieds site along the lines of edgeio and Craigslist. Expo did not launch with anything ground-breaking, but that could be coming. Nonetheless it is refreshing to see this kind of effort and interest coming out of Microsoft.
Live Image Search is a Great Web Interface
One of the products we saw that was stellar and near launch was Live Image Search (shown with vanderwal - what else). Image search is quite similar to Apple iPhoto in its interface, but built for the web. Take Live Image Search for a spin. No really, scroll, mouse over, change the thumbnail size on the fly. It is fast and responsive. I am quite impressed.
Oh, since I am on a Mac, I have been using Firefox/Camino to view Live Image Search and it works just as wonderfully as it did in the demos on Windows with IE. I think Microsoft understands that the web is a platform, just like Windows and Mac. Microsoft gets that the web as a platform must work on top of other OS platforms. The web browser is an OS-agnostic application and must remain so. Microsoft seems to understand that when building for the web it should work across browsers and OS platforms, otherwise it is just developing for an OS, and that is not the web. The proof will be when Microsoft releases a Live toolbar for Firefox that has all of the access and functionality of the IE toolbar.
More to Come
I am really waiting for another product to get launched or closer to launch, as I think Microsoft will have a good product there too. It is something that is of real interest to me. It seemed like the Microsoft people we worked with were genuinely listening to our feedback.
Color my opinion changed toward Microsoft. Not only are they doing things of interest, but they are shipping. They are not only trying to get the web, but they have brought in people who understand it and know what direction to head. I went to Microsoft out of curiosity and found something that went against my notions of what they were doing. Microsoft gets the web in a manner similar to the way Yahoo does: it is about people with real problems.
Where is my Mac?
Am I giving up my Mac? No. Hell no. My OS works the way that I work and does not get in my way. I don't spend time swearing at it or messing with it. I do the things I need to do for my job and life using technology to augment that effort. Apple has been doing this for years and I don't want to mess up a very good thing.
Microsoft and the DOJ Data Search Request
Yesterday at the Microsoft Search Champs v4, Microsoft peeled back the layers around their dealings with providing the U.S. Government with data around search. Joshua Porter writes up the U.S. Government request and Microsoft response. The Microsoft discussion was very open, but was initially restricted to those of us in the room. Late in the day we were told we could openly blog the information and discuss it.
A few of us got together last night to discuss the information and recorded the discussion in a podcast on privacy and the Microsoft response to the DOJ (MP3, 10 MB, 42 minutes, hosted on Alex Barnett's server). The podcast is a discussion between:
- Joshua Porter (Search Champs Attendee)
- Chris Pirillo (Search Champs Attendee)
- Dion Hinchcliffe (Search Champs Attendee)
- Fred Oliviera (Search Champs Attendee)
- Alex Barnett (Microsoft)
- Brady Forrest (MSN Search Team)
- Myself, Thomas Vander Wal (Search Champs Attendee)
Robert Scoble was the first to break the news in his blog.
From my personal perspective it was very refreshing to hear Microsoft be open with their thoughts and openly admit they may have dropped the ball, though not in the data they gave (because the data given was not personal data in any shape or form). They openly admitted they need to be a more open citizen of the internet. They have a responsibility to be open about the personal information and data that we, as citizens of the web, trust them with as our digital tracks. There is a compact between the people using tools and the providers of internet tools that our digital rights are protected.
I have a very strong belief that Microsoft is a good citizen that looks out for my privacy. This was a trust I did not think I would have at any point in my life. It is a trust I have with them today, but it will be a trust they must continue to foster. Many in the Search Champs strongly believe all of the search and portal companies must work together to ensure they are consistent in protecting the privacy of the digital citizens that interact with them. A lot of Google love was lost with their public spin to try to drive a wedge between themselves and the other search engines and portals. Google was very good in publicly pointing out the DOJ request and getting public attention on the request. But Google must work together with Yahoo!, Microsoft, and AOL to protect not only digital citizens but their whole industry.
Off to Seattle This Week
I am off to Seattle for much of this week to be part of Microsoft Search Champs v4. It is a rather impressive group of people invited to Search Champs and I am humbled to have been included. I will have just a wee bit of free time there to see family and friends in the area. I have only let a couple people know I am heading that way, as I really do not know a lot about my schedule, other than all of my nights are booked while I am there. I should have Thursday afternoon free and Friday morning.
Seattle is one of the places I grew up (until early grade school). I have not been back since 1991 or 1992. It is a place I still miss, although not as much as the Bay Area.
Mobile Search is Not Interested in Mobile
One of the things that has been bugging me for a while is mobile search. I mostly use Google mobile search on my mobiles for search. It is not the interface, but the results that get me.
Mobile search should return mobile site results. I gave Google a huge clue as to my usage, "I am on a mobile device", which they have yet to treat as a helpful part of their algorithm. If I search for information on my mobile I should be able to get search results pointing to mobile-ready content. If not by default, let me set this as a preference (not that I want to with Google, as they have this wonderful way of poorly allowing me to manage my identity; there is no way to manage your own identity on Google).
I would love to have a mobile search engine give me mobile sites. Why? Many sites have moved to flooding their pages with rich interfaces (AJAX and Flash) that add no value for the customer. This turns a 25 KB page, or even a (formerly large) 60 KB page, into a 200 KB or even 450 KB page. Much of this added interface is of little value other than being cute or cool on a desktop, but on a mobile device it makes the page and the information on it inaccessible.
Many people I talk with who use mobile search, myself included, often have not tucked the information we want into our bookmarks or sent it to ourselves for easy access. I know what site had the information I am seeking, or what site I would like to have inform me, while I have a little downtime away from home or the office.
MyWeb 2 Grows Up Quickly into a Usable Tool
Earlier this week I chose to use Yahoo! search rather than the default Google that I usually use. The search page on Yahoo! had sponsored links at the top of the page, then a few other offerings, followed by the usual offerings. The second set was dead on what I was seeking. What was this second set of links? They were the results of those in "My Community" in MyWeb 2 Search, which is similar to del.icio.us in that it is a social bookmarking tool with tagging.
This discovery from a community of fewer than 40 people really surprised me. Of those 40 people, fewer than 15 have more than 5 pages bookmarked, but this community is one with which I share interests and vocabulary. I was partly shocked with amazement, as when MyWeb 2 launched in beta a few weeks ago (or a few months at this point) I was completely underwhelmed: most of the links in MyWeb 2 were for things I not only had no interest in, but did not care to have recommended.
As the net effect of more people adding their bookmarks to this socially shared tool grew, the value of the tool increased. As it grows I am positive the aspects of my community will need to get more fine-grained so I can say I like the tags from person X (similar to the granular social network, which would make better use of the social network for recommender systems that could actually be used and trusted). One of the benefits of MyWeb 2 is that it gets layered on top of Yahoo's search results, which is a great place for this information.
I would love to replicate my del.icio.us bookmarks and tags into MyWeb 2 at Yahoo. The next step would be to feed both systems at the same time from one central interface. There are things in del.icio.us that I really like, but layering social bookmarking and tagging on top of other tools adds greater value for the user.
Amazon and A9 Provide Yellow Pages with Photos
Everybody is talking about Amazon's (A9) Yellow Pages today. Amazon has done a decent job bringing photos into their Yellow Pages for city blocks. This is a nice touch, but it is missing some interaction and interconnections between the photos and the addresses, which I hope will come. I really would like to be able to click on a photo and have the Yellow Pages information show up; everything I tried on Clement Street in San Francisco, California did not work that way.
One of the things that really hit me in playing with the tool today at lunch was how the Yellow Pages still suck. I have had problems with the Yellow Pages for..., well, ever. I grew up in cross-cultural environments, with British and French influences among my daytime caregivers. I moved around a fair amount (up and down the West Coast growing up, plus Europe and the U.S. East Coast). Cultures have their own vocabulary (let alone language) for the same items. What I call things depends on context, but no matter what, the Yellow Pages do not match what I wish to call what I want (or sometimes need).
In today's search I used one of the Amazon search samples, "Optica", which had some nice references. Knowing how I usually approach the Yellow Pages, I searched for glasses (as that is what I need to get or have repaired) or contacts. Doing this in a paper Yellow Pages usually returned nothing, or pointers to a couple other places. One would think online Yellow Pages would be different; well, they are: they returned nothing related. Glasses returns restaurant supply and automotive window repairs, with not one link to eyeglasses, nor a reference to "you may be looking for...".
A9 is a great search tool and Amazon.com has great product tools and incredible predictive algorithms, which will be very helpful down the road for the Personal InfoCloud, but the current implementation is still a little rough. I can see where they are heading with this. And I can dream that I would have this available for a mobile device at some point in the next two or three years.
One very nice piece that was integrated is reviews and ratings of Yellow Pages entries. This is great for the future, once they get filled out. It will also be great once it is available from a mobile device (an open API so we can start building a useful tool now?). But it brings my scenario of the future to light rather quickly, where I am standing in front of a restaurant looking at over 100 restaurant reviews on my mobile device. There is no way that I can get through all of these reviews. Our full complement of supporting context tools will need to be pulled into play to get me a couple, or four, good reviews that will mean something to me.
This is but a small slice of the Personal InfoCloud, which is much broader and focusses on enabling the person to leverage the information they have and the information they find, pairing the two and enabling easy access to that information when it is needed.
Fix Your Titles for Better Search and Use
Lose the ego already! Since I have been using del.icio.us I have been noticing how backwards so many sites' header titles are these days. The header title should run from specific to general information.
You are saying "huh?" Okay, take CNN, which uses a header title like <title>CNN.com - Dog Bites Man</title>. The better way is <title>Dog Bites Man - CNN.com</title>.
Why? Search engines, browser headers, and bookmarks are why. Search engines use the words to give preference, and words closer to the beginning have higher preference. A browser header will only show the first so many letters (depending on the browser and how wide the browser window is open). Lastly, the title is used in browser bookmarks. If a person has four bookmarks to items in a site, they would see the site name repeated first and only then the bit that is important to them.
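A tiny illustration of the bookmark problem; the 20-character cut-off is an arbitrary stand-in for whatever a given browser actually shows, and the page titles are made up:

```python
# Four bookmarked pages from the same site, titled site-first vs. specific-first.
pages = ["Dog Bites Man", "Man Bites Dog", "Cat Rescued From Tree", "Weather Warning Issued"]

WIDTH = 20  # hypothetical number of characters a browser shows for a title

for page in pages:
    site_first = f"CNN.com - {page}"[:WIDTH]
    specific_first = f"{page} - CNN.com"[:WIDTH]
    print(f"{site_first!r:25} vs {specific_first!r}")

# Site-first titles all start with the same ten characters, so the part the
# reader actually cares about is what gets pushed off the truncated end.
```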
Now look at the pages you build: are they built for search engines and for people to actually use and come back to? It may be your site management tools that have mangled your titles and they need to be fixed, but they will not be fixed if you do not ask. The other reason titles are broken is that somebody who does not understand the web wants only to have their ego stroked, and they have made their information less valuable by doing so.
Flickr and the Future of the Internet
Peter's post on Flickr Wondering triggers some thoughts that have been gelling for a while, not only about what is good about Flickr, but about what is missing on the internet as we try to move forward to mobile use, building for the Personal InfoCloud (allowing the user to better keep information they like attracted to them and find related information), and embracing Ubicomp. What follows is my response to Peter's posting, which I posted here so I could keep better track of it. E-mail feedback is welcome. Enjoy...
You seem to have hit on the right blend of ideas to bring together. It is Lane's picture component and it is Nadav's integration of play. Flickr is a wonderfully written interactive tool that adds to photo managing and photo sharing in ways that are very easy and seemingly intuitive. The navigation is wonderful (although there are a few tweaks that could put it over the top) and the integration of presentational elements (HTML and Flash) is probably the best on the web, as they really seem to be the first to understand how to use each tool for what it does best. This leads to an interface that seems quick and responsive and works wonderfully in the hands of many. It does not function perfectly across platforms, yet, but using the open API it is completely possible that it can and will be done in short order. Imagine pulling your favorites or your own gallery onto your mobile device to show to others or just entertain yourself.
Flickr not only has done this phenomenally well, but may have tipped the scales in a couple of areas that are important for the web to move forward. One area is an easy tool to extract a person's vocabulary for what they call things. The other is a social network that makes sense.
First, the easy tool for people to add metadata in their own vocabulary for objects. One of the hindrances of digital environments is the lack of tools to find objects that do not contain the words the people seeking them need to make the connection to the object they desire. Photos, movies, and audio files have no, or limited, inherent properties for text searching and no associated metadata. Flickr provides a tool that does this easily, but more importantly it shows the importance of the addition of metadata as part of the benefit of the product, which seems to provide incentive to add metadata. Flickr is not the first to go down this path, but it does it in a manner that is light years ahead of nearly all that came before it. The only tools that have come close are HTML and hyperlinks pointing to these objects, which are not as easy or intuitive for normal folks as Flickr is. The web moving forward needs to leverage metadata tools that add text-addressable means of finding objects.
Second is the social network. This is a secondary draw to Flickr for many, but it is one that really seems to keep people coming back. It has a high level of attraction for people. Part of this is that Flickr actually has a stated reason for being (a web-based photo sharing and photo organizing tool), which few of the other social network tools really have (other than Amazon's shared Wish Lists and LinkedIn). Flickr has a modern life need solved with the ability to store, manage, access, and selectively share one's digital assets (there are many life needs, and very few products aim to provide a solution for these life needs or aim to provide such ease of use). The social network component is extremely valuable. I am not sure that Flickr is the best, nor are they the first, but they have made it an easy added value.
Why is the social network important? It helps reduce the coming stench of information that results from the overabundance of information in our digital flow. Sifting through the voluminous seas of bytes needs tools that provide some sorting using predictive methods. Amazon's ratings, and the matching of them to others' similar patterns as well as to those we claim as our friends, family, mentors, etc., will be very important in helping tools predict which information gets our initial attention.
As physical space gets annotated with digital layers we will need some means of quickly sorting through the pile of bytes at a location to get a handful we can skim through. What better tool than one that leverages our social networks. These networks must get much better than they are currently, possibly using broader categories or tags for our personal relationships, as well as means of better ranking the extended relationships of others: with some people we consider friends, we do not have to go far in their group of friends before we run into those we really do not want to consider relevant in our life structures.
Flickr is showing itself to be a popular tool that has the right elements in place and the right elements done well (or at least well enough) to begin to show the way through the next steps of the web. Flickr is well designed on many levels and hopefully will not only reap the rewards, but also provide inspiration to guide more web-based tools to start getting things right.
Fixing Permalink to Mean Something
This has been a very busy week and this weekend it continues with the same. But I took two minutes to see if I could solve a tiny problem bugging me. I get links to the main blog, Off the Top, from outside search engines and aggregators (Technorati, etc.) that are referencing content in specific entries, but not all of those entries live on the ever-changing blog home page. All of the entries had the same link to their permanent location. The dumb thing was every link to their permanent home was named the same damn thing, "permalink". Google and other search engines use the information in the link name to give value to the page being linked to. Did I help the cause? No.
So now every permanent link states "permalink for: [insert entry title]". I am hoping this will help solve the problem. I will modify the other pages most likely next week sometime (it is only a two minute fix) as I am toast.
Gmail Simplifies Email
Since I have been playing with Gmail I have been greatly enjoying the improved means of labeling and archiving e-mail, as opposed to throwing messages in folders. Many e-mails are hard to singularly classify with the one label that folders force us to use. The ability to sort e-mail by label, which lets a message sit accessibly under a filter named for each of its labels, makes things much easier. An e-mail discussing CSS, XHTML, and IA for two different projects can now be easily accessed under a filter for each of these five attributes.
Dan Brown has written a wonderful article, The Information Architecture of Email, that digs a little deeper. Dan ponders whether users will adopt the changed interface. Hearing many user frustrations with e-mail buried in their Outlook or other e-mail application, I think the improved interface may draw quite a bit of interest. As Apple is going this way for its file structure in Tiger (the next OS upgrade) with Spotlight, it seems Gmail is a peek at the future and a good means to start thinking about easier-to-find information that the user can actually manage.
Future of Local Search on Mac
One of the best things I found to come out of the Apple WWDC keynote preview of the next update of the OS X line, Tiger, is Spotlight. Spotlight is the OS file search application. Spotlight searches not only the file name and file contents (in applications where applicable), but also the metadata. This really is going to be wonderful for me. I, as a user, can set a project name in the metadata and then group files from that point. I can also set a term, like "synch", and use AppleScript and search to batch the files together for synching with mobile devices, easily. Another nice feature is that searches can be saved and stored as a dynamic folder. This provides better control of my Personal InfoCloud.
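For the scripting-inclined, a rough sketch of how this kind of metadata query can be driven from a script once Spotlight is available. The `mdfind` command is Spotlight's query tool, but the exact attribute your keywords land in (kMDItemKeywords below) depends on how and where the metadata was set, so treat the attribute name and query as assumptions:

```python
import subprocess

def spotlight_search(query, folder=None):
    """Run a Spotlight metadata query via mdfind and return matching file paths."""
    cmd = ["mdfind"]
    if folder:
        cmd += ["-onlyin", folder]  # limit the search to one folder tree
    cmd.append(query)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]

# Gather everything tagged for synching, assuming the keyword lives in
# kMDItemKeywords (an assumption; other setups use different attributes).
for path in spotlight_search("kMDItemKeywords == 'synch'"):
    print(path)
```

A saved search in the Finder is essentially this same query stored as a dynamic folder.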
Steven Johnson provides the history of search at Apple, which had nearly the same technology in Cosmo, slated for release in 1996.
Amazon Offers Alexa Augmented Search
Adam pointed out that Amazon is offering a Web search engine, A9, which uses ancillary information from Alexa. I offer vanderwal.net/random/ as your jumping-off point to explore (leave a review if you wish). I am pleased with the related sites that are offered as similar sites, not that I am trying for anything in particular.
I agree with Adam that Amazon is offering intriguing integration of information and services, which is the position Google is working to fill. Some of the personal portal sites, like Yahoo, more so than MSN or AOL, have done a good job at innovating in this space.
Google is not my only search engine
Google has been letting me down lately. The past two months I have had too many irrelevant links or only a handful (when I narrow the terms) that do not have what I am looking for. Oddly I have Googled only my site and found the results where I mentioned what I was seeking.
I have been turning more and more to Vivisimo and DogPile for search instead. Why? Well, they are both metasearch tools (Vivisimo includes Google in what it searches) that search across multiple search engines and return the results in one interface. These two services also have faceted filtering and/or categorical filters for the results. These facets greatly help filter out the junk. In short, it solves the Paris Hilton site problem when you want a hotel room, not a bimbo.
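A minimal sketch of the categorical-filter idea with made-up result data. The category labels and results are hypothetical, and the hard part these tools actually do, deriving the clusters automatically, is skipped here:

```python
from collections import defaultdict

# Hypothetical merged results from several engines, each already tagged with a category.
results = [
    {"title": "Hilton Paris - Book a Room", "category": "Travel > Hotels"},
    {"title": "Paris Hilton Photo Gallery", "category": "Entertainment > Celebrities"},
    {"title": "Paris Hotels from $89", "category": "Travel > Hotels"},
    {"title": "Paris Hilton News and Gossip", "category": "Entertainment > Celebrities"},
]

def by_category(results):
    """Group merged search results under facet/category labels."""
    groups = defaultdict(list)
    for r in results:
        groups[r["category"]].append(r["title"])
    return groups

for category, titles in by_category(results).items():
    print(category)
    for title in titles:
        print("  -", title)
# Picking the "Travel > Hotels" facet drops the celebrity pages from view.
```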
In the past I have tried Vivisimo, but it did not seem to have enough depth, which has now been solved. DogPile now offers a good breadth of search engines that seems to improve on the limited results I had been getting in the past. It is good to have options.
Keeping the Found Things Found
This week's New York Times Circuits article, Now Where Was I? New Ways to Revisit Web Sites, covers the Keep the Found Things Found research project at the University of Washington. The program is summarized:
The classic problem of information retrieval, simply put, is to help people find the relatively small number of things they are looking for (books, articles, web pages, CDs, etc.) from a very large set of possibilities. This classic problem has been studied in many variations and has been addressed through a rich diversity of information retrieval tools and techniques.
This topic is at the heart of the Personal Information Cloud. How does a person keep the information they found attracted to themselves once they have found it? Keeping found information at hand to use when the occasion to use it arises is a regular struggle. The Personal Information Cloud is the rough cloud of information that follows the user. Users have spent much time and effort to draw information they desire close to themselves (Model of Attraction). Once they have the information, is it in a format that is easy for the user or consumer of the information to use, or even reuse?
iPIM and Chandler have a chair at the Personal Info Cloud
There are two articles that are direct hits on managing information for the individual and allowing the individual to use the information when they need it and share it as needed. Yes, this is in line with the Personal Information Cloud.
The first article, The inter-personal information manager (iPim) by Mark Sigal, is about the problem of users finding information and how they can, or should be able to, then manage that information. There are many problems with applications (as well as with the information format itself) that inhibit users' reuse of information. In the comments of the article there is a link to products that are moving forward with information clients, which also fit into the Personal Information Cloud or iPIM concept. (The Personal Information Cloud tools should be easily portable or mobile-device enabled, or have the ability to be retrieved from anywhere and sent to any device.)
The second article is from the MIT Technology Review (registration required) titled Trash Your Desktop about Mitch Kapor (of founding Lotus Development fame) and his Open Source project to build Chandler. Chandler is not only a personal information manager (PIM), but the tool is a general information manager that is contextually aware. The article not only focusses on Mitch and the product (due late 2004), but the open and honest development practices of those that are building Chandler at the Open Source Application Foundation for Windows, Mac, Linux, etc. distribution.
Blogs get higher Google rankings thanks to proper HTML
Matt points out Google ranks blogs highly. This seems to be the result of Google giving strong preference to titles and other HTML elements. Tools like TypePad help the user properly develop their pages, which Google deems highly credible.
Matt's complaint is that his very helpful PVR blog is turning up top results in searches for TiVo information and other recorder info. Matt's site is relatively new and is outranking the information he is discussing.
This is something I personally run into as things I write about here often get higher Google ranking than the information I am pointing to and is the source and focus of the information. I have often had top Google ranks for items that are big news on CNN or the New York Times, which I am pointing to in my posts.
Much of the reason for this seems to be understanding proper HTML use and not putting my branding at the forefront of the message. CNN puts their name first in the title of their pages (not in the headers, which also have benefit if they are in heading (h1, etc.) tags). The tools and people building Web pages with attention to proper naming and labeling will get rewarded for their good work (if a top Google rank is a reward).
I have written on this in the past, in Using HTML tags properly to help external search results from April, which mostly focussed on search ignoring Flash, except for the few HTML elements on a page wrapping the Flash. Fortunately there have been enough links pointing to the site that was lacking the top rank to raise that site to the top Google rank.
Some of the corrected Google ranking will come over time as more sites begin to properly mark up their content. The Google ranks will also shift as more links are processed by Google and their external linking weighting helps correct the rankings.
Using HTML tags properly to help external search results
There are some essentials to building Web pages that get found with external search engines. Understanding the tags in HTML and how they are (or rather should be) used is important. The main tags for most popular search engines are the title, heading (h1, h2, etc.), paragraph (p), and anchor (a). Different search engines have given some weight in their ranking to metatags, but most do not use them or have decreased their value.
Google gives a lot of weight to the title tag, which is often what shows in the link Google gives its user to click for the entry. The wording in the title tag is important too, as the most specific information should be toward the front. A user searching for news may find a weblog toward the top of the search, ahead of CNN, as CNN puts its name ahead of the title of the article. A title should echo the contents of the page, as that will help the ranking of the page; titles whose terms are not repeated in the page can get flagged for removal from search engines.
The headings help echo what is in the title and provide breaking points in the document. Headings not only help the user scan the page easily, but are also used by search engines to ensure the page is what it states it is. The echoing of terms is used to move an entry to the top of the rankings, as the mechanical search engines get reinforcement that the information is on target for what their users may be seeking.
The paragraph tags also are used to help reinforce the text within them.
The anchor tags are used for links, and this is what the search engines use to scrape and find other Web pages. The text used for the links is also used by the search engines to weight their rankings. If you want users to find information deep in your site, put a short, clear description between the anchor tags. The W3C standards include the ability to use a title attribute, which some search tools also use. The title attribute is also used by some screen readers (used by those with visual difficulties and those who want their information read aloud to them, because they may be driving or have their hands otherwise occupied) to replace the information between the anchor tags or to augment that information.
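A rough sketch of the kind of term-echo check described above, assuming the title, headings, and body text have already been extracted as plain strings (a real engine weights and stems terms far more carefully than this):

```python
import re

def terms(text):
    """Lowercase word set, ignoring very short stop-ish words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}

def echo_score(title, headings, body):
    """Share of title terms that are echoed in the headings or body text."""
    title_terms = terms(title)
    if not title_terms:
        return 0.0
    page_terms = terms(" ".join(headings)) | terms(body)
    return len(title_terms & page_terms) / len(title_terms)

print(echo_score(
    title="Dog Bites Man - CNN.com",
    headings=["Dog bites man in downtown park"],
    body="A man was bitten by a dog on Tuesday...",
))  # 0.6 -- 'dog', 'bites', 'man' are echoed; 'cnn' and 'com' are not
```

A page whose title terms never reappear in its headings or body would score near zero, which is the kind of mismatch the paragraph above says can get a page flagged.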
Example
The application I built to manage this weblog section is built to use each of these elements. This often results in high rankings in Google (and relatedly Yahoo), but this is not the intent; I am just a little fussy in that area. It gets to be very odd when my weblog posting reviewing a meal at Ten Penh is at the top, or near the top, of a Google Ten Penh search. The link for the Ten Penh restaurant is near the bottom of the first page.
Why is the restaurant not the top link? There are a few possible reasons. The restaurant page has its name as "tenpenh" in the title tag, which is very odd or sloppy. The page does not contain a heading tag nor a paragraph tag, as the site is built with Flash, so there is little semantic structure even for those search engines that scrape Flash. Equally, the internal page links are not read by a search engine, as they are in Flash also. A norm for many sites is having the logo of the site in the upper left corner clickable to the home page of the site, which, with the use of the alt attribute in an image tag within an anchor link, allows each page to add value to the home page rank (if the alt attribute were "Ten Penh Home", for example).
Not only does Flash hinder the scraping of information, the use of JavaScript links wipes out those links as a means to increase search rankings. Pages with dynamic links that are often believed to ease browsing (which may or may not prove the case, depending on the site's users and the site goals in actual user testing) hurt the site's information being found by external search engines. Search engines do not scrape links or text written out by JavaScript.
Go back
I had an early preview of a site this past week so as to add comments. It is odd to me that sites are still being built with the frame of reference that the user will come through the "front door". If you read your log files, users come in at every opening. It is about even odds that a new user to the site will come there from a search engine, an external link, or another pointer (e-mail or article). The frame of reference should always try to provide some orientation to the user, such as breadcrumbs or some other link out to related or parent information.
The item that I found a little jarring was a "Go back to the previous page" link, and it was not JavaScript, but a link to what the developer thought was the next level up page. Pure linear navigation is a practice that is no longer a practice, if it ever was. Somebody last night at the DC-IA book club asked whether we navigate or search; as always, it seems to depend. With sites like Amazon we mostly searched, while on some smaller sites we would click around. It seemed the greater the volume of information, the greater the instance of searching.
We did not talk about this for long, but it has been resonating all day. One of the things that Amazon does extremely well is post-search navigation. Most folks seem to search Amazon to find a particular item, but then Amazon's navigation and related offerings attract the user to the item they were searching for or to a similar item. The search result pages offer links to narrow the results or to clarify whether the user is looking for the musician Paul Young or the author Paul Young. A user arriving at an Amazon book page would have all the options and information they needed to find related information and to know where they are in the Amazon site.
Google for your enterprise
I went to a small meeting at work today with some folks from Google who were showing their Google Appliance, which was very impressive. Having Google generate the search for your enterprise/organization's site would be great, but it got much better than just that bonus. The Google Appliance has the ability to augment the search with a thesaurus to offer the user the option of adding "personal safety restraint devices" when they searched for "seatbelt". This functionality works similarly to Google's spelling corrections.
The advantages did not stop with Google's great search engine; it also comes with Google's hardware, which they have specified and built with failover (if buying more than one rackmounted piece). This just rocks: a software company that is responsible not only for their software, but for the hardware it runs on. Apple has had success with this combination, and Google's systems are renowned for their great uptime and their ability to return results very quickly. Google boasts having the hardware and software up and configured in one day (when is the last time you have seen this happen? nearly all other search engines are in the 10 to 15 day range). Color me impressed with this demo and a seemingly end-to-end search hardware and software package. Google search that can be augmented to provide additional assistance to users could let IAs focus on providing great navigational structures for the folks that do not always search to find their information.
Improving information retrieval
Lou points to Improving Web Retrieval After the Dot-Bomb, then provides a guide to information retrieval that augments the Marcia Bates article. This provides a very good combination for understanding classification systems.
Findability explained
Peter Morville finally puts his findability explanation in writing for all to see (in the wonderful site called Boxes and Arrows). The idea of the term and meaning of findability is growing on me. Findability is a solid lead into the problems of information structure. The explanation of how to start fixing the problems and the actions needed to help eradicate them can reside in the method/model of attraction (an update to the MOA should be available in two or three weeks; extenuating circumstances have slowed the updates and progress).
The Hoopla saga has me trying to move from NetSol. Moving the contact information two years ago was a pain in the butt. My favorite part of this thread is MS and VeriSign (parent of NS) joining to provide better security, what a crock.
USC Annenberg School offers a light personal review of the WSJ redesign. Those of us that use the online version of the Journal on a daily basis have noticed a great jump since the redesign began implementation over a month ago. The site is much quicker and the interface is cleaner. The queries now are very quick again and there is a deep pile of data/information to search through.
Snippets: I have noted the redesign more than once... Nihal ElRayess has shared part of the IA perspective on the main WSJ redesign and the WSJ Company Research redesign parts of the project... The Guardian provided its insight in February (a good piece of researched journalism)... It looks like the WSJ redesign began in at least March 2000... The $28 million spent on the Web reworking (hardware, software, visual, and information architecture) is much less than the $232 million spent on a new printer for the WSJ print version or the $21 million for an advertising campaign to tout the new WSJ... The previous version of the WSJ site was a hand-rolled CMS and it has now been moved into Vignette... Those interested in the overall WSJ plan will like what is inside the presentation of Richard Zannino, Executive Vice President and CFO of Dow Jones & Company.
Internet Archive an information mess
The Chron focusses on the lack of organization of the Internet Archive. This would be a dream to organize for some folks I know (or at least I think it would be). The problems at hand for this project rule out a library science approach (too much human touch needed) and search engines, as their design is not conducive. A great read to get the wheels turning.
The U.S. Government relaunched their central information source today. FirstGov is a much better resource this go-around. The site is very quick and serves information based on queries very quickly. Now I really want to know more about how it was put together. The site seems to quickly and easily get the user to information. Parsing through the amount of information that is available on government sites is a daunting task. Now I am very curious.
Google shares the 10 things it has found to be true, which starts off with "focus on the user and all else will follow". There are many other truths in this list. [hat tip eleganthack and Digital-Web New]
Web designers should stop relying on search to cover for poor IA and design, to paraphrase PC World's presentation of User Interface Engineering's (UIE) latest research, which states that 77 percent of users do not find what they are looking for through search. The article does list some pitfalls that the user can fall into (poor spelling on the site, etc.), but with a great depth of information, and users often looking for specific information, search could be a solid option; it just takes some work.
One navigation method that I find less and less is offering similar links based on what the user has clicked through to. Often I would like to read the archives of a regular columnist in a magazine. I should not have to search to find the archives, as that method often provides chaff along with the goal of my search. Storage and metadata can greatly assist the navigation approach.
I personally find navigation and search combinations on a site create a higher probability that I will find the information that I am searching for.