A Proposal on the Organization of Information Systems (see also: Chinese version)
Authored by Hui Zheng on May 18, 2008
1. Introduction
We are living in an age of information, but sometimes information imposes more
burden than benefit. From a user’s view, most information systems, including
file systems, mail systems and various menu-based systems, are essentially
organized in a hierarchical structure. As the amount of information increases,
the usability of these systems decreases. The major flaw of this kind of
structure is that it provides only a single path to the target information. If
a user takes one wrong turn, he may well lose his way. This situation could be
improved if an information system supported multipath routing. With this aim,
this article proposes a practical solution that borrows some ideas from the
Gmail system.
2. Information retrieval problem
Information itself is great, but storing and retrieving information sucks. Day
by day, a typical computer user browses and saves web pages, collects bookmarks
and RSS feeds, downloads files from BT or eMule, composes and receives email,
writes documents or programs, and so on. He enjoys all of this until, someday,
he finds himself gradually suffering from information overload. As evidence, he
now and then feels a little dizzy: his desktop is terribly messy with
miscellaneous icons packed like sardines, his bookmark menu pulls all the way
down like a huge blanket, and his inbox is cluttered with mail like a bulging
bag. He comes to realize that if this situation does not change, his brain will
explode before his hard disk or mailbox does. From then on, he cultivates the
habit of organizing files, bookmarks and mail into hierarchical folders. As a
result, things improve a lot. Unfortunately, good times don't last long. He
finds that as his documents grow rapidly, his folders become more numerous and
more deeply nested. It takes some time for him to save a document to the
appropriate position, and it takes even more time to find a document he
downloaded or composed. He tends to get tired, vexed and somewhat lost when he
navigates the hierarchical trees. He knows he possesses a gigantic Christmas
tree with a tremendous number of gifts hanging on it, but few of them are
within reach. Time and time again, he fails to find an important document after
an exhaustive search. He doubts his memory, and occasionally he doubts the
machine's memory. Although he knows the missing items will never jump out at
him by themselves, he still cannot help yelling at the machine: where the hell
are the damn documents hiding? From time to time, he slips back into the old
habit: saving all recent files to the desktop, just for convenience and
confidence. So, what is the root of the plague?
3. Gmail's solution
It turns out that the root of the evil is the traditional manner of information
organization, namely the tree (or forest) structure. This hierarchical
structure is reasonable for large but not huge quantities of information items.
As the information volume swells, the tree structure gradually becomes
unmanageable. The item lists in many folders are inevitably long, and some
folders are deeply nested. In file systems, this symptom can be alleviated to
some degree by creating shortcuts in Windows or symbolic links in Unix-family
operating systems. But that is not a final cure. As an interesting alternative,
Google's Gmail presents what it calls a "label". A label is basically a tag
that can be applied to a message, together with any number of other labels.
Many users complained about it at the beginning because they were used to the
old folder fashion, but the complaints have waned as time passes. Users find
that their messages are no longer like guerrillas hiding in a deep forest;
instead, they are like a regular army lined up in a single row waiting for
inspection. The most recent messages are at the top and easy to access, which
is impossible in a well-organized folder system, where messages are scattered
across folders. Users are not left wondering where to file a message, since
they can apply as many labels to it as they like. Finding a specific message is
easy too: users can filter by user-defined labels or by system-defined labels
such as inbox, sent, star, chat and trash. They can also search by sender,
receiver, subject and message content. Even better, users can define filters
that automatically apply labels to incoming mail. This solution, which we
henceforth call the tag structure, is not necessarily limited to mail
management systems; it should apply equally well to file systems and other
information systems, such as knowledge management systems.
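
To make the flat tag structure concrete, here is a minimal Python sketch. It is
not Gmail's implementation; the names Message, FlatTagStore, add_filter,
receive and with_label, as well as the example addresses, are hypothetical and
chosen only for illustration. It shows the two properties described above: an
item may carry any number of labels, and filters may apply labels
automatically on arrival.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    # Hypothetical message record; labels are an unordered set of names.
    sender: str
    subject: str
    labels: set = field(default_factory=set)


class FlatTagStore:
    """Items carry any number of independent labels; there are no folders."""

    def __init__(self):
        self.messages = []   # arrival order; the newest message is last
        self.filters = []    # (predicate, label) pairs applied on arrival

    def add_filter(self, predicate, label):
        self.filters.append((predicate, label))

    def receive(self, message):
        # Every matching filter adds its label; labels never exclude each other.
        for predicate, label in self.filters:
            if predicate(message):
                message.labels.add(label)
        self.messages.append(message)

    def with_label(self, label):
        # Newest first, mirroring the single reverse-chronological list.
        return [m for m in reversed(self.messages) if label in m.labels]


store = FlatTagStore()
store.add_filter(lambda m: m.sender.endswith("@example.com"), "work")
store.add_filter(lambda m: "invoice" in m.subject.lower(), "finance")
store.receive(Message("boss@example.com", "Invoice for May"))
store.receive(Message("friend@example.org", "Weekend plans"))

for message in store.with_label("work"):
    print(message.subject, sorted(message.labels))
```

The first message ends up under both "work" and "finance" without the user
having to choose a single place for it, which is exactly what a folder tree
cannot express without shortcuts or links.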
4. Our solution
However, the tag structure doesn't always suffice for our needs. Even though
tags are far fewer than information items, they can still overflow. In Gmail's
tag structure, all user-defined labels are independent and equal, but in fact
they are very likely to differ in importance, urgency and popularity; some
labels are inherently related; and the labels of a given information item vary
in how closely they correlate with it. For example, labels like "work" or
"family" are more important; labels like "todo" or "exam" are more urgent;
labels like "sports" or "film" are more popular if the user is a sports or film
fan. It's also desirable that after a programmer labels some materials as
"Java" or "C++", those materials are automatically labeled as "programming
language" and "OOP", so that he can later get all programming-language items or
all OOP items in one list. Lastly, among all labels for a given information
item, one may be more closely correlated with it than the others. Taking all of
this into consideration, we propose a feasible solution as follows.
- Introduce hierarchical structure into the tag structure. That is, we treat
tags/labels as metadata of information and organize them in the traditional
tree structure. This way we combine the best parts of both worlds. Actually,
we can go further. A hierarchical tree is a directed tree in graph theory, but
we may generalize the tag structure to a digraph as long as it makes sense.
This allows a tag to have more than one parent, something like multiple
inheritance in some OOP languages. Gmail's tag structure is then a special
case of our structure in which every label is a root with no sublabels.
- Introduce weights of importance, urgency and popularity for each tag so that
tags can be sorted by any of these criteria. Gmail's star label can serve this
purpose, but it is too coarse-grained. The popularity weight of a tag can be
incremented automatically on each visit to the tag, which keeps the most
frequently used tags at the top. In addition, tags can be sorted by most
recent visit time. Consequently, users will have more confidence that the
documents they really care about are within reach, and accessing any
interesting, active or important item in the system becomes a piece of cake.
- Introduce a main label, i.e. one of an item's labels can be specified as the
main label. In this sense, the traditional tree structure can be viewed as a
special case of our structure: every folder name is exactly a label name.
(There is one subtle difference, though: unlike label names, identical folder
names in different paths don't clash.) If the main label's correlativity is 1,
the other labels' correlativity should be between 0 and 1. This provides extra
search and sort criteria.
- Introduce alias tags. A tag is allowed to have more than one name; these
names can be abbreviations, synonyms, or even names in different languages.
Furthermore, aliases can be more powerful: users may define a label as a
logical combination of existing labels. For example, one can define
"myPrograms" as "'my documents' and 'programs'", or define "entertainment" as
"sports or novel or movie".
- Introduce threads. Users can create threads that link related message items.
Gmail has conversations, but it doesn't allow users to group mails together
themselves. A thread is good for follow-ups and for different revisions of a
document. This aggregation makes the information system more compact and
coherent.
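
As a rough illustration of how several of these points could fit together, the
following Python sketch models tags as a digraph with multiple parents, a
popularity counter incremented on each visit, a main label with correlativity
1, and simple name aliases. Everything here (the TagGraph class, its method
names, the sample file names, and the choice to require secondary
correlativity strictly between 0 and 1) is an assumption made for the example,
not a specification. Logical-combination aliases, importance and urgency
weights, and threads are left out to keep it short.

```python
class TagGraph:
    """Hypothetical sketch of a weighted digraph tag structure."""

    def __init__(self):
        self.parents = {}      # tag -> set of direct parent tags
        self.popularity = {}   # tag -> number of visits
        self.aliases = {}      # alias name -> canonical tag name
        self.items = {}        # item name -> {tag: correlativity}

    def resolve(self, name):
        # Map an alias (abbreviation, synonym, translation) to its tag.
        return self.aliases.get(name, name)

    def ancestors(self, tag):
        # All tags reachable through parent edges; multiple parents allowed.
        seen, stack = set(), [tag]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_parent(self, tag, parent):
        # Refuse an edge that would make a tag its own ancestor.
        if tag == parent or tag in self.ancestors(parent):
            raise ValueError(f"{parent!r} above {tag!r} would create a cycle")
        self.parents.setdefault(tag, set()).add(parent)

    def label(self, item, main, others=None):
        # The main label gets correlativity 1; the others lie between 0 and 1
        # (strictly, by assumption in this sketch).
        weights = {self.resolve(main): 1.0}
        for tag, weight in (others or {}).items():
            if not 0 < weight < 1:
                raise ValueError("secondary correlativity must be in (0, 1)")
            weights[self.resolve(tag)] = weight
        self.items[item] = weights

    def find(self, name):
        # Visiting a tag bumps its popularity weight.
        tag = self.resolve(name)
        self.popularity[tag] = self.popularity.get(tag, 0) + 1
        hits = []
        for item, weights in self.items.items():
            # Best correlativity among labels equal to or descended from tag.
            score = max((w for t, w in weights.items()
                         if t == tag or tag in self.ancestors(t)), default=0)
            if score:
                hits.append((score, item))
        # Most closely correlated items first; ties keep insertion order.
        return [item for score, item in sorted(hits, key=lambda h: -h[0])]

    def popular_tags(self):
        # Most frequently visited tags first.
        return sorted(self.popularity, key=self.popularity.get, reverse=True)


graph = TagGraph()
for language in ("Java", "C++"):
    graph.add_parent(language, "programming language")
    graph.add_parent(language, "OOP")
graph.aliases["lang"] = "programming language"

graph.label("jvm-notes.txt", main="Java")
graph.label("templates.pdf", main="C++", others={"exam": 0.4})
graph.label("exam-schedule.txt", main="exam", others={"Java": 0.2})

print(graph.find("OOP"))
print(graph.find("lang"))
print(graph.popular_tags())
```

Items labeled "Java" or "C++" are found under "OOP" and "programming language"
without being labeled a second time, because membership is derived from the
parent edges at query time. A real system would probably cache the ancestor
sets rather than recompute them for every lookup.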
5. Conclusion
To locate an information item, users need to click folders to expand them in a
hierarchical system, while they need to click labels to filter in a tag system.
We don't discuss search here because search is slow and the information content
may be in binary form. The tag structure is conceivably a better solution, but
it still has shortcomings. To further facilitate information retrieval, we've
proposed a weighted digraph tag structure, which is an improved tag structure
that integrates the advantages of the tree structure. An information system
featuring this structure should be more accessible and enjoyable, and its users
could be like pisciculturists: no matter how many fish they have thrown into
the ponds, any fish they desire will swim to them, tail wagging, at a single
whistle.