The eTextReader Project

Overview of system architecture

The eTextReader project is made up of several components:

the textbook reading/browsing software
the database storing information about users and annotations
content making up the textbooks, including an application to decrypt content if necessary

Each of these components is described in this document.

Textbook reading/browsing software

Almost all of the eTextReader software is written in Java, with the primary exception being the code that allows users to create/edit ink annotations, which is a C# .NET application. The software is broken down into several packages, which are unfortunately not as well defined as they could be:

eTextReader: This contains many of the primary classes making up the visual interface of the eTextReader application. Probably the two most interesting classes in this package are the Browser and ContentPane classes. The Browser class implements the main window of the eTextReader application, while ContentPane is the main class that displays content and annotations. The main method of the application is located within the Browser class.
eTextReader.actions: Contains the actions that are triggered when menus, buttons, etc. are invoked from within the user interface. These actions are typically modeled as singletons, and the classes BaseAction and BrowserAction encapsulate the functionality needed to enforce the singleton nature of these classes. Many actions depend on the currently selected annotation, and the class CurrentAnnotationAction provides base functionality for such actions.
eTextReader.clientInterface: This package defines an interface (NotationClient) that specifies the methods through which annotations can be stored in and retrieved from any data store. Currently, a single concrete implementation (DBClient) is also provided in this package. Unfortunately, over time the distinction between the interface and the implementing class has been blurred, so that the DBClient class is closely tied into the system architecture.
eTextReader.OEB: This package contains classes that allow the creation and manipulation of Open eBook publication structure definitions, which allow for the description of the files making up a specific textbook. Textbooks are essentially a collection of URLs, which allows for a great deal of flexibility in putting content together.
eTextReader.AnnotationListing: In this package are classes that implement the functionality of listing a set of annotations. The NotationListing class is the primary user interface for this functionality. NotationListing makes heavy use of two abstractions, the NotationProvider and NotationFilter interfaces, which supply the NotationListing window with a set of candidate pages, and rules for restricting which annotations are selected from those candidate pages, respectively.
eTextReader.tabletpc: Contains a definition of the interface used to communicate with the C# .NET ink annotation editor (InkCallback), as well as several helper methods that facilitate working with the tablet PC environment.
eTextReader.search: Contains the classes used to facilitate searching of the content stored in a given textbook. Included is the GUI front end to the searching process (FindPanel), as well as the classes that create and search the book's index.
eTextReader.authentication: Specifies an interface between the eTextReader and existing authentication systems, allowing user authentication to be performed in a variety of ways. Currently, the only authentication mechanism in place is an internal system in which user names and passwords are stored within the annotation data store.
eTextReader.componentFactory: Rather than creating Swing components directly using the new keyword, the eTextReader system makes use of a component factory paradigm to produce components. This allows the possibility of global customization of the components making up the user interface without changes to the code creating these components. Currently, this capability is mostly intended to ease coding of test cases, and to ensure that the invocation of any action via the GUI components of the application is logged.
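
The component factory idea from the last item can be illustrated with a minimal sketch. The names ComponentFactory, PlainComponentFactory, LoggingComponentFactory, and createButton are hypothetical and are not taken from the eTextReader code base. The sketch only shows how routing component creation through a single factory makes it possible to log every action invocation without touching the calling code.

```java
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;
import javax.swing.JButton;

// Hypothetical factory interface; the actual eTextReader API may differ.
interface ComponentFactory {
    JButton createButton(String label, ActionListener action);
}

// Plain behavior: build the component and attach the action unchanged.
class PlainComponentFactory implements ComponentFactory {
    @Override
    public JButton createButton(String label, ActionListener action) {
        JButton button = new JButton(label);
        button.addActionListener(action);
        return button;
    }
}

// Logging variant: wraps the action so each invocation is recorded first.
class LoggingComponentFactory implements ComponentFactory {
    private final List<String> log = new ArrayList<>();

    @Override
    public JButton createButton(String label, ActionListener action) {
        JButton button = new JButton(label);
        button.addActionListener(event -> {
            log.add("invoked: " + label);   // record the invocation
            action.actionPerformed(event);  // then run the real action
        });
        return button;
    }

    // Entries recorded so far, in invocation order.
    List<String> getLog() {
        return log;
    }
}
```

Code that asks a ComponentFactory for its buttons does not need to know which variant it received, so a test case could swap in the logging variant and inspect getLog() afterwards.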

Overviews of the individual packages should be contained in the package overview file in the API documentation.

Database of users and annotations

As previously mentioned, an original design goal was to allow any sort of backing store to be used to hold the annotation database. Over time, this design has not been carefully followed, and much of the interface specification has taken place inside of the DBClient class. In particular, calls to obtain a reference to the annotation database hard code the creation of a DBClient instance, rather than using a factory implementation to obtain access to the data store. All further discussion of the implementation will therefore be focused on this particular implementation of the data store.
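
For comparison, the factory approach that the original design called for might look roughly like the sketch below. NotationClient and DBClient are the names used in this document, but the bodies shown here are stand-ins; the describe method, the InMemoryClient class, and the NotationClientFactory class are assumptions made purely for illustration.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Stand-in for the real interface; describe() is a hypothetical method.
interface NotationClient {
    String describe();
}

// Stand-in for the SQL Server-backed implementation named in the text.
class DBClient implements NotationClient {
    @Override
    public String describe() {
        return "SQL Server data store";
    }
}

// Hypothetical alternative store, for example for use in test cases.
class InMemoryClient implements NotationClient {
    @Override
    public String describe() {
        return "in-memory data store";
    }
}

// Hypothetical factory: callers ask for a NotationClient and never
// name the concrete class themselves.
final class NotationClientFactory {
    private static Supplier<NotationClient> supplier = DBClient::new;

    private NotationClientFactory() {
    }

    // Replaces the implementation handed out by getClient().
    static void setSupplier(Supplier<NotationClient> newSupplier) {
        supplier = Objects.requireNonNull(newSupplier);
    }

    static NotationClient getClient() {
        return supplier.get();
    }
}
```

With such a factory, switching the backing store would be a single call to setSupplier rather than a change at every site that currently constructs a DBClient.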

The data store is implemented using a Microsoft SQL Server database. In order to facilitate use of the application in both on- and off-line environments, a version of the database server is installed locally when the application is installed, and this local version is periodically synchronized with the central database. Previously, the "normal" method of operation was assumed to be on-line, with off-line status considered to be exceptional. For performance reasons, this assumption is likely to be reversed, so that the local data store is always the current source of annotation information, with periodic updates still made to the central data store.

The most important table in the database is the notations table. The current schema of this table is as follows:

Columns in the notations table
Column Name	Data type	Description
id	int	A unique identifier for the annotation. See note on generation of these ids below
url	varchar(256)	The URL of the content this annotation refers to
addressStart	varchar(256)	The starting address of the annotation's text anchor. See note on addresses below
addressEnd	varchar(256)	The ending address of the annotation's text anchor.
type	varchar(32)	The type of the annotation, such as text note, bookmark, etc.
author	varchar(64)	Who created the annotation? This should really contain the user id rather than name
target	varchar(32)	A singleton field describing the target of the annotation, whose purpose has been subsumed by the NotationModes table
viewableBy	varchar(64)	A singleton field describing who can view the annotation, whose purpose has been subsumed by the NotationModes table
subject	varchar(64)	A user-generated subject for the annotation, generally used as a human-readable identifier. Some exceptions to this exist.
discussionID	int	If this annotation is part of a discussion, a reference to the discussion ID which correlates postings to a discussion together. If the annotation is not part of a discussion, this field is null.
regarding	int	Another annotation in this table that this annotation refers to. Used only by the discussion mechanism currently.
created	datetime	The date and time on which this annotation was created. Generally system supplied.
modified	datetime	The date and time on which this annotation was last modified. Generally system supplied.
body	text	The content of the annotation; usually user entered text, although exceptions exist
rowguid	uniqueidentifier	A unique system generated identifier used for synchronization purposes
Diagram	image	If the annotation is a diagram (or an ink annotation), this field contains the data associated with the diagram.
viewmode	varchar(32)	Specifies how the annotation should be displayed.
isReference	varchar(8)	If set to true, then the body of the annotation actually contains a URL; the content stored at this URL is then the content of the annotation.
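
The interaction between the body and isReference columns can be sketched as follows. NotationRow and referencedContent are hypothetical names, and the sketch assumes that the isReference column holds the string "true" (in any letter case) when the body is a URL; the document does not state the exact stored values.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical in-memory view of two columns of a notations row.
final class NotationRow {
    private final String body;
    private final String isReference; // varchar(8) column; assumed "true" or "false"

    NotationRow(String body, String isReference) {
        this.body = body;
        this.isReference = isReference;
    }

    // Returns the URL of the real content when the row is a reference,
    // or an empty Optional when the body itself is the content.
    Optional<URI> referencedContent() {
        boolean reference = isReference != null
                && Boolean.parseBoolean(isReference.trim());
        if (reference && body != null) {
            return Optional.of(URI.create(body.trim()));
        }
        return Optional.empty();
    }
}
```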

More details on the tables in the database are available here.

Generation of ID values

Addressing Scheme

A scheme is needed to identify the portions of text referenced by annotations. The current scheme employed by the eTextReader system is similar to that used by XPath. Document content is modeled as a tree of nodes, and selection of a portion of the content is done by outlining the path through this tree needed to reach the content in question.

Since the content being displayed by the eTextReader is often outside of the user's control, it is possible for an address to be correct at the time of annotation creation, but to have subsequent changes to the content break this address. The eTextReader attempts to minimize this probability by starting all paths at an element containing an XML ID attribute. If this is done, then changes outside the range of content specified by the ID should not break any addresses that start at the given ID value. If an ID attribute cannot be found along the path from the selection to the root node of the tree, a path is generated starting at the top of the tree.
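
The path-generation rule described above can be sketched with the standard org.w3c.dom API. The address syntax used here ("#" followed by the ID, then one "/index" step per level) is an assumption made for illustration, as is treating an attribute literally named id as the XML ID attribute; the format actually stored in addressStart and addressEnd is not specified in this document. AddressSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class AddressSketch {
    private AddressSketch() {
    }

    // Walks from the target toward the root, recording each node's position
    // among its siblings, and stops at the first node carrying an id attribute.
    static String addressOf(Node target) {
        StringBuilder path = new StringBuilder();
        Node node = target;
        while (node != null && node.getNodeType() != Node.DOCUMENT_NODE) {
            if (node instanceof Element && ((Element) node).hasAttribute("id")) {
                return "#" + ((Element) node).getAttribute("id") + path;
            }
            path.insert(0, "/" + indexAmongSiblings(node));
            node = node.getParentNode();
        }
        // No id was found: the path starts at the top of the tree.
        return path.toString();
    }

    // Zero-based position of the node among its parent's children.
    static int indexAmongSiblings(Node node) {
        int index = 0;
        for (Node s = node.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            index++;
        }
        return index;
    }

    // Small helper for trying the sketch on an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

In this sketch, an address that starts at an ID is unaffected when new siblings are inserted before the element carrying that ID, whereas an address that starts at the top of the tree shifts.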

Currently, no effort is made to identify links that may have been broken. One possible way of doing this is to store a checksum of the selected content and to verify that this checksum is still correct when the annotation is loaded. Note that this only allows us to know when a link has been broken; it will likely not be of much use in finding the correct new location of the link.
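
A minimal sketch of the checksum idea follows, using the CRC32 class from the Java standard library. This is not part of the current system; the AnchorChecksum class and its method names are hypothetical, and the choice of CRC32 over UTF-8 bytes is just one simple option.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class AnchorChecksum {
    private AnchorChecksum() {
    }

    // Computed over the selected text when the annotation is created,
    // then stored alongside the annotation.
    static long checksumOf(String selectedText) {
        CRC32 crc = new CRC32();
        crc.update(selectedText.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Called when the annotation is loaded: recompute the checksum over the
    // text the address currently resolves to and compare with the stored value.
    static boolean anchorStillValid(long storedChecksum, String currentText) {
        return storedChecksum == checksumOf(currentText);
    }
}
```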

This work was initially funded by a National Science Division Grant (DUE-0126486-00) and the HP Teaching with Technology Initiative.