Putting documents into their work context in document analysis
A. Salminen, V. Lyytikäinen and P. Tiitinen
Department of Computer Science and Information Systems, University of Jyväskylä, PO Box 35 (MaE), FIN-40351 Jyväskylä, Finland
Received 10 September 1999; accepted 9 November 1999. Available online 18 April 2000.
Abstract
The goal of document standardization is to find more effective, consistent, and standardized ways to utilize information technology. The specification and implementation of document standards may take several years and requires a profound analysis and understanding of document management practices. Document standardization does not concern documents only: it concerns workers, their work, business partners, and future systems as well. In this paper we discuss two ways of describing the work context of documents: process modelling and life cycle modelling. In process modelling, documents are regarded as resources produced and used in inter- or intra-organizational business processes. Different types of documents are typically produced and used in a business process. In life cycle modelling, the work related to processing a document of a specific type is described. The modelling methods have been tested in an SGML standardization project called RASKE during the analysis of four case domains: the enquiry process in the Finnish Parliament and Government, national Finnish legislative work, budgetary work, and the Finnish participation in EU legislative work. This paper discusses the modelling requirements in document analysis and describes the techniques used in the RASKE project.
Author Keywords: Document analysis; Document standardization; Process modelling; SGML; XML
1. Introduction
The data volume in the electronic document repositories of organizations is growing fast, but the diversity of document formats and systems, as well as continuing changes in information technology, cause problems in the access and use of the information needed in work tasks. The problems concern both companies and public sector organizations. These problems have prompted organizations to start major document standardization projects where the intention is to agree upon rules which define the way information is represented in documents. The rules are needed in order to achieve more effective, consistent, and stable ways to utilize information technology in business processes. Problems with technological changes and with the maintenance of long-term access to digital documents have motivated the search for application-independent formats for documents. SGML (Standard Generalized Markup Language) is an international standard for defining and representing documents in an application-independent form (Goldfarb, 1990). A subset of SGML called XML (Extensible Markup Language) has been developed especially for specifying document standards to be used in Web information systems (Bray, Paoli & Sperberg-McQueen, 1998).
In SGML/XML standardization projects, a profound document analysis is needed. The analysis is usually seen as an analysis of document structures (Travis; Watson and Maler, Magnusson Sjöberg, 1997, Weitz, 1998). Successful implementation of document standards in enterprises, however, requires an understanding of the role of documents in work processes. Especially in cases where the standardization concerns several document types and the document production is part of inter-organizational business processes, the analysts as well as the actors in the processes should be able to see the process context of documents. In this paper we discuss work process modelling as part of document analysis. We will introduce the modelling techniques used in a major standardization project called RASKE, where the standardization has concerned the documents created in the Finnish Parliament and ministries (Salminen; Salminen and Salminen).
The rest of the paper is organized as follows. Section 2 introduces a model for electronic document management environments and defines the notions related to the model. Document standardization of enterprises is discussed in Section 3. As an example of a standardization project the RASKE project is introduced. Work process modelling approaches in other application areas and needs in the document analysis of a document standardization project are discussed in Section 4. The techniques used in the RASKE project are described in Section 5. Experiences and implications from the RASKE project are discussed in Section 6.
2. Electronic document management environments
Organizations use documents as a means for information management: a means to cluster, organize, store, transfer, and use information to fulfill their organizational purposes. The term electronic document management (EDM) refers to the use of modern information technology for this purpose (Sprague, 1995). In document standardization it is important to identify not only documents and their structures but also the other entities of the EDM environment where the documents are created, manipulated, and used.
Fig. 1 shows a model for an EDM environment using the central notions of information control nets (ICNs): activities and resources (Ellis, 1979). Information is produced and used in activities. The resources are information repositories where the information produced can be stored, or from where information can be taken. The dashed lines in the figure denote the information flow from and to resources. The set of activities is denoted by a circle and the resources by rectangles. The resources are divided into three types: documents, systems, and actors. Documents consist of recorded data intended for human perception. A document can be identified and handled as a unit in the activities, and it is intended to be understood as information pertaining to a topic. Since the documents in an EDM environment are mostly digital, information technology is needed and utilized to operate on them. Hence systems, i.e. hardware, software, and applications, are essential resources in an EDM environment. On the other hand, since the information in documents should remain available after system changes, it is also important to separate the documents from systems as resources. Finally, the actors are people and organizations performing activities and using documents as well as systems in the activities. In some fully automated activities a software system may perform an activity (for example, create an email message and send it to a repository). In this paper, however, we will consider activities where the actors creating and using documents are people and organizations. In relationship to documents and systems, actors are called users. Actors are grouped by roles. A role specifies the tasks, responsibilities, and rights of an actor in an activity, as a user of a system, or as a user of a document repository.
Fig. 1. Components of an electronic document management environment.
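The components of the model can also be written down as a small data structure. The following Python sketch is only an illustration of the notions of activity, resource, and role; the class names, field names, and the example activity are assumptions made here and are not taken from the RASKE project or from the ICN formalism itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceKind(Enum):
    """The three resource types of the EDM model."""
    DOCUMENT = "document"
    SYSTEM = "system"
    ACTOR = "actor"


@dataclass(frozen=True)
class Resource:
    """An information repository: a document, a system, or an actor."""
    name: str
    kind: ResourceKind
    role: Optional[str] = None  # hypothetical field, only meaningful for actors


@dataclass
class Activity:
    """An activity in which information is used and produced."""
    name: str
    uses: List[Resource] = field(default_factory=list)      # information flows from these
    produces: List[Resource] = field(default_factory=list)  # information flows to these

    def resources_of(self, kind: ResourceKind) -> List[Resource]:
        """Return the distinct resources of one kind touched by this activity."""
        seen: List[Resource] = []
        for resource in self.uses + self.produces:
            if resource.kind is kind and resource not in seen:
                seen.append(resource)
        return seen


# Illustrative instances (invented for this sketch).
clerk = Resource("Committee clerk", ResourceKind.ACTOR, role="author")
editor = Resource("Text editor", ResourceKind.SYSTEM)
bill = Resource("Government Bill", ResourceKind.DOCUMENT)
report = Resource("Special Committee Report", ResourceKind.DOCUMENT)

drafting = Activity(
    "Draft committee report",
    uses=[clerk, editor, bill],
    produces=[report],
)

document_names = [r.name for r in drafting.resources_of(ResourceKind.DOCUMENT)]
```

Keeping documents and systems as separate resource kinds mirrors the point made above: the documents should stay identifiable even when the systems operating on them are replaced.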
Information pieces needed and produced during an activity are stored in many different ways: in the heads and experience of people, in the organizational culture, as hardware and software solutions, and as data in documents and applications. If the notion of information is understood according to the sense-making theory of Dervin (1992) as ‘the sense created in a situation, at a specific moment in time and space by a reader’ (where Dervin means a human reader), then information is subjective and the information needed by a person in order to perform an activity may be a complicated combination of pieces coming from different sources.
An EDM environment may be confined to a single organization. In the current networked world, however, business processes often concern several organizations, and resources are shared to varying degrees by those organizations. Thus the EDM environments in which a specific organization or person is involved may be quite complex.
3. Document standardization
One of the approaches to improving business processes is document standardization using application-independent standard formats. The idea in standardization is to plan digital information structures and formats with future system changes in mind, instead of planning them for a specific software system. The rules associated with a document, its authoring, and its storage format are intended to support a consistent understanding of the content by the authors and different readers, also in situations where the software and hardware change. Sprague (1995) suggests the development of an electronic document management strategy in an organization. Standardization can be taken as such a strategy.
3.1. RASKE as a standardization project
One example of a standardization project is RASKE. The term RASKE comes from the Finnish words ‘Rakenteisten AsiakirjaStandardien KEhittäminen’ meaning the development of standards for structured documents. The project was commenced in spring 1994 by the Finnish Parliament and a software company in cooperation with researchers at the University of Jyväskylä. The Ministry of Foreign Affairs, Ministry of Finance, Prime Minister’s Office, and a publishing house also participated in the project.
Starting the RASKE project was motivated by document management problems in the Finnish Parliament and government. Teams studying the legislative work carried out in Parliament identified, for example, the following problems concerning document management (Salminen et al., 1997):
1. Incompatibilities of the systems used caused the need for repeated typing of the same piece of text, which in turn was a potential source of inconsistencies in documents.
2. Inconsistencies in document naming and document identifiers caused problems and extra work.
3. Lack of information management coordination between the ministries, and between the government and Parliament.
4. In spite of the fact that almost all of the documents were digital, documents were mostly distributed on paper.
5. The retrieval techniques of different systems were heterogeneous.
6. The retrieval techniques of the electronic archiving system and the tracking system of Parliament were not satisfactory.
7. Uncertainty concerning the future usability of the information in the archived digital documents.
The document analysis in the RASKE project concerned four domains: the enquiry process, national legislative work, Finnish participation in EU legislative work, and the creation of the state budget. During the case analyses, various methods of analysis were tested and developed. Preliminary DTDs were designed for 21 document types including, for example, Government Bill, Government Decision, Government Communication, Private Bill, Special Committee Report, Budget Proposal, and Communication of Parliament.
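To make the idea of a document type definition more concrete, the following Python sketch checks a small XML document against a simplified content model. It is a minimal illustration only: the element names, the sample document, and the `check_structure` helper are hypothetical and do not reproduce the DTDs designed in RASKE. The standard-library `xml.etree.ElementTree` parser used here does not validate against a DTD, so the content model is expressed as a plain dictionary instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: document type -> required child elements, in order.
CONTENT_MODEL = {
    "bill": ["identifier", "title", "proposal", "rationale"],
}

# Hypothetical sample document; all content is invented for illustration.
SAMPLE = """<bill>
  <identifier>EXAMPLE-1</identifier>
  <title>Example bill</title>
  <proposal>Text of the proposal.</proposal>
  <rationale>Text of the rationale.</rationale>
</bill>"""


def check_structure(xml_text, model=CONTENT_MODEL):
    """Return a list of structural problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    expected = model.get(root.tag)
    if expected is None:
        return [f"unknown document type: {root.tag}"]
    actual = [child.tag for child in root]
    if actual != expected:
        return [f"expected children {expected}, found {actual}"]
    return []


problems = check_structure(SAMPLE)
```

Because the structure is described independently of any authoring or retrieval application, the same check could in principle be run by every system that handles the document, which is the kind of application independence the standardization aims at.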
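Translation:
Putting documents into their work context in document analysis
Keywords: document, government
Abstract:
The aim of document standardization is to find more effective, consistent, and standardized ways of using information technology. Specifying and implementing document standards may take several years and requires a thorough analysis and understanding of document management practices. Document standardization concerns not only the documents themselves but also workers, their work, business partners, and future systems. Here we discuss two ways of describing the work context of documents: process modelling and life cycle modelling. In process modelling, documents are regarded as resources produced and used in inter- or intra-organizational business processes. In life cycle modelling, the work involved in processing a document of a specific type is described. The modelling methods were tested in an SGML standardization project called RASKE during the analysis of four case domains: the enquiry process in the Finnish Parliament and Government, national Finnish legislative work, budgetary work, and Finnish participation in EU legislative work. This paper discusses the modelling requirements in document analysis and describes the techniques used in the RASKE project.
Author keywords: Document analysis; Document standardization; Process modelling; SGML; XML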
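1. Introduction
The volume of data stored in organizations' electronic document repositories is growing rapidly, but the diversity of document formats and systems, together with continuing changes in information technology, causes problems in accessing and using the information needed in work tasks. These problems affect both companies and public sector organizations. They have prompted organizations to launch major document standardization projects whose purpose is to agree on rules for how information is represented in documents. The rules are needed to achieve more effective, consistent, and stable ways of using information technology in business processes. Problems with technological change and with maintaining long-term access to digital documents have motivated the search for application-independent document formats. SGML (Standard Generalized Markup Language) is an international standard for defining and representing documents in an application-independent form (Goldfarb, 1990). A subset of SGML called XML has been developed especially for specifying document standards to be used in Web information systems (Bray, Paoli & Sperberg-McQueen, 1998).
In SGML/XML standardization projects, a thorough document analysis is essential. The analysis is usually seen as an analysis of document structures (Travis; Watson and Maler, Magnusson Sjöberg, 1997, Weitz, 1998). Successful implementation of document standards in enterprises, however, requires an understanding of the role of documents in work processes. Especially when the standardization concerns several document types and document production is part of inter-organizational business processes, the analysts and the actors in the processes should be able to see the process context of the documents. In this paper we discuss work process modelling as part of document analysis. We introduce the modelling techniques used in a major standardization project called RASKE, in which the standardization concerned the documents created in the Finnish Parliament and ministries (Salminen; Salminen and Salminen).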
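2. Electronic document management environments
Organizations use documents as a means of information management: a means to cluster, organize, store, and transfer information and to use it to fulfill their organizational purposes. The term electronic document management (EDM) refers to the use of modern information technology for this purpose. In document standardization it is important to identify not only documents and their structures but also the other entities of the EDM environment in which the documents are created, manipulated, and used. Fig. 1 shows a model of an EDM environment that uses the central notions of information control nets (ICNs): activities and resources (Ellis, 1979). Information is produced and used in activities. The resources are information repositories in which the information produced can be stored or from which information can be taken. The dashed lines in the figure denote the information flow from and to resources. The set of activities is denoted by a circle and the resources by rectangles. The resources are divided into three types: documents, systems, and actors. Documents consist of recorded data intended for human perception. A document can be identified and handled as a unit in the activities, and it is intended to be understood as information pertaining to a topic. Since the documents in an EDM environment are mostly digital, information technology is needed and used to operate on them. Hence systems, that is, hardware, software, and applications, are essential resources in an EDM environment. On the other hand, since the information in documents should remain available after system changes, it is also important to treat documents and systems as separate resources. Finally, the actors are the people and organizations that perform activities and use documents and systems in them. In some fully automated activities a software system may perform the activity (for example, create an email message and send it to a repository). In this paper we consider activities in which the actors creating and using documents are people and organizations. In relation to documents and systems, actors are called users. Actors are grouped by roles. A role specifies the tasks, responsibilities, and rights of an actor in an activity, as a user of a system, or as a user of a document repository.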
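Fig. 1. Components of an electronic document management environment.
The pieces of information needed and produced during an activity are stored in many different ways: in the heads and experience of people, in the organizational culture, as hardware and software solutions, and as data in documents and applications. If the notion of information is understood according to Dervin's (1992) sense-making theory as 'the sense created in a situation, at a specific moment in time and space by a reader' (where Dervin means a human reader), then information is subjective, and the information a person needs in order to perform an activity may be a complicated combination of pieces coming from different sources.
An EDM environment may lie within a single organization. In the current networked world, however, business processes often involve several organizations, and resources are shared to a greater or lesser extent by those organizations. The EDM environments in which a specific organization or person is involved may therefore be quite complex.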
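3. Document standardization
One approach to improving business processes is document standardization using application-independent standard formats. The idea in standardization is to plan digital information structures and formats with future system changes in mind, rather than planning them for a specific software system. The rules associated with a document, its authoring, and its storage format are intended to help authors and different readers understand the content consistently, even when the software and hardware change. Sprague (1995) suggests developing an electronic document management strategy in an organization. Standardization can be taken as such a strategy.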
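3.1. RASKE as a standardization project
RASKE is one example of a standardization project. The term RASKE comes from the Finnish words 'Rakenteisten AsiakirjaStandardien KEhittäminen', meaning the development of standards for structured documents. The project was started in spring 1994 by the Finnish Parliament and a software company in cooperation with researchers at the University of Jyväskylä. The Ministry of Foreign Affairs, the Ministry of Finance, the Prime Minister's Office, and a publishing house also took part in the project.
The RASKE project was motivated by document management problems in the Finnish Parliament and government. Teams studying the legislative work carried out in Parliament identified, for example, the following document management problems (Salminen et al., 1997):
1. Incompatibilities between the systems in use made it necessary to retype the same piece of text repeatedly, which in turn was a potential source of inconsistencies in documents.
2. Inconsistencies in document naming and document identifiers caused problems and extra work.
3. Information management was not coordinated between the ministries, or between the government and Parliament.
4. Although almost all of the documents were digital, they were mostly distributed on paper.
5. The retrieval techniques of different systems were heterogeneous.
6. The retrieval techniques of the electronic archiving system and the tracking system of Parliament were not satisfactory.
7. The future usability of the information in the archived digital documents was uncertain.
The document analysis in the RASKE project covered four domains: the enquiry process, national legislative work, Finnish participation in EU legislative work, and the creation of the state budget. During the case analyses, various methods of analysis were tested and developed. Preliminary DTDs were designed for 21 document types including, for example, Government Bill, Government Decision, Government Communication, Private Bill, Special Committee Report, Budget Proposal, and Communication of Parliament.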
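The first part of my translation barely makes sense; I only started to get the hang of it later on.
I'm new here, so any pointers are very welcome.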