eTextReader.OEB
Class HTMLTitleHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by eTextReader.OEB.HTMLTitleHandler
All Implemented Interfaces:
TitleHandler, java.io.Serializable, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class HTMLTitleHandler
extends org.xml.sax.helpers.DefaultHandler
implements TitleHandler, java.io.Serializable

This class determines the title of a document whose media type is text/html. It receives events from an XML parser that implements the SAX interface.

See Also:
Serialized Form
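The sketch below makes the event-driven model concrete by recording the callbacks a SAX parser delivers for a tiny XHTML document. It is not part of eTextReader. The class name `EventTracer` and its `trace` helper are hypothetical. It uses the JDK's `javax.xml.parsers.SAXParserFactory` with its default (non-namespace-aware) settings, under which `qName` carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: logs each SAX callback so the event stream
// that a title handler reacts to can be inspected.
class EventTracer extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        // Without namespace awareness, qName holds the element name (e.g. "title").
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Parses an in-memory XHTML string and returns the recorded events.
    static List<String> trace(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EventTracer tracer = new EventTracer();
            parser.parse(new InputSource(new StringReader(xhtml)), tracer);
            return tracer.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For `<html><head><title>Hi</title></head></html>` the trace should read startDocument, start:html, start:head, start:title, text:Hi, end:title, end:head, end:html. A title handler only has to act inside the start:title ... end:title window.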

Field Summary
private  java.lang.StringBuffer currentHeading
          the content of the currently parsed heading
private  java.lang.String currentHeadingName
          the name of the currently parsed heading element, or null if not in a heading
(package private)  java.util.Map headingCount
          contains the count of each heading element found in the current div element
private  java.util.Stack headingCountStack
          a stack containing heading element counts for the previously encountered div elements
(package private)  java.util.Set headingElements
          contains the names of the elements to look for
private  java.util.List headingList
          a list of Heading objects corresponding to the headings in the document
private  java.util.Stack idStack
          a stack of the id attributes found on div elements, used to keep track of the location of the headings
private  boolean inTitle
          are we currently parsing a title element?
private  boolean parsed
          have we parsed a document yet?
private  java.lang.StringBuffer title
          holds the contents of the title element as it is being built
private static java.lang.Integer zero
          a constant used for initialization of heading element count entries
 
Constructor Summary
HTMLTitleHandler()
          construct a new object that can parse HTML documents for their title elements
 
Method Summary
 void characters(char[] ch, int start, int length)
          if currently in the title element, appends length characters of the array ch, starting at index start, to the title
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
          turns off recording of character content
 java.util.List getHeadings()
          retrieve a list of headings for the most recently parsed document
 java.lang.String getTitle()
          get the title of the most recently parsed document
 void parse(java.lang.String url)
          load and parse the URL url.
protected  java.util.Map resetHeadingCounts()
          creates a map with counts of zero for each of the elements in the headingElements set
 void startDocument()
          called when parsing of a new document begins; this implementation resets the parser state
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes attr)
          checks whether the element is a title element and records it if so
 void tryLoadingTitleFromText(java.lang.String url)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

parsed

private boolean parsed
have we parsed a document yet?


inTitle

private boolean inTitle
are we currently parsing a title element?


currentHeadingName

private java.lang.String currentHeadingName
the name of the currently parsed heading element, or null if not in a heading


currentHeading

private java.lang.StringBuffer currentHeading
the content of the currently parsed heading


headingElements

java.util.Set headingElements
contains the names of the elements to look for


headingCount

java.util.Map headingCount
contains the count of each heading element found in the current div element


idStack

private java.util.Stack idStack
a stack of the id attributes found on div elements, used to keep track of the location of the headings


headingCountStack

private java.util.Stack headingCountStack
a stack containing heading element counts for the previously encountered div elements
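The fields `idStack`, `headingCountStack` and `headingCount` suggest a save-and-restore scheme for nested div elements. The following is a hypothetical sketch of such a scheme, not the class's actual logic. The class and method names are invented, the heading set is assumed to be h1 through h6, and `java.util.ArrayDeque` stands in for the documented `java.util.Stack`. Because `ArrayDeque` rejects nulls, a div without an id is stored as an empty string.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: per-div heading counters saved and restored with stacks.
class DivHeadingTracker {
    // Assumed heading set; the documentation does not list headingElements' members.
    private static final Set<String> HEADINGS =
        new HashSet<>(Arrays.asList("h1", "h2", "h3", "h4", "h5", "h6"));

    private Map<String, Integer> headingCount = freshCounts();
    private final Deque<Map<String, Integer>> headingCountStack = new ArrayDeque<>();
    private final Deque<String> idStack = new ArrayDeque<>();

    private static Map<String, Integer> freshCounts() {
        Map<String, Integer> m = new HashMap<>();
        for (String h : HEADINGS) m.put(h, 0);
        return m;
    }

    // Entering a div: save the enclosing counts, remember the id, count afresh.
    void enterDiv(String id) {
        headingCountStack.push(headingCount);
        idStack.push(id == null ? "" : id);
        headingCount = freshCounts();
    }

    // Leaving a div: discard its counts and restore the enclosing div's counts.
    void exitDiv() {
        headingCount = headingCountStack.pop();
        idStack.pop();
    }

    // Records a heading; returns its 1-based position among same-named
    // headings in the current div.
    int countHeading(String name) {
        if (!HEADINGS.contains(name)) throw new IllegalArgumentException(name);
        int n = headingCount.get(name) + 1;
        headingCount.put(name, n);
        return n;
    }

    // id of the innermost enclosing div, or null when outside every div.
    String currentId() {
        return idStack.peek();
    }
}
```

With this scheme, a heading's location can be described by the innermost div id plus its per-div ordinal, and counts in an outer div resume where they left off once a nested div closes.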


title

private java.lang.StringBuffer title
holds the contents of the title element as it is being built


headingList

private java.util.List headingList
a list of Heading objects corresponding to the headings in the document


zero

private static java.lang.Integer zero
a constant used for initialization of heading element count entries

Constructor Detail

HTMLTitleHandler

public HTMLTitleHandler()
construct a new object that can parse HTML documents for their title elements

Method Detail

startDocument

public void startDocument()
called when parsing of a new document begins; this implementation resets the parser state

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler

resetHeadingCounts

protected java.util.Map resetHeadingCounts()
creates a map with counts of zero for each of the elements in the headingElements set

Returns:
a new map containing an entry for each heading element in the headingElements set; the value of each entry is the static Integer zero.
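A minimal sketch of what `resetHeadingCounts` might look like, assuming `headingElements` holds the names h1 through h6 (the documentation does not list its members). `HeadingCounts` is a hypothetical holder class. Sharing one `Integer` constant across entries is safe because `Integer` is immutable: incrementing a count replaces the map entry rather than modifying the shared object.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class HeadingCounts {
    // Shared constant, mirroring the documented static field "zero".
    private static final Integer ZERO = Integer.valueOf(0);

    // Assumed contents of headingElements.
    static final Set<String> HEADING_ELEMENTS =
        new LinkedHashSet<>(Arrays.asList("h1", "h2", "h3", "h4", "h5", "h6"));

    // Returns a new map with one entry per heading element, each mapped to ZERO.
    static Map<String, Integer> resetHeadingCounts() {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : HEADING_ELEMENTS) {
            counts.put(name, ZERO);
        }
        return counts;
    }
}
```

Returning a fresh map on every call matters here: a map that is pushed onto a stack when a div opens must not be the same object that the nested div goes on to update.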

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes attr)
checks whether the element is a title element and records it if so

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
turns off recording of character content

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler

characters

public void characters(char[] ch,
                       int start,
                       int length)
if currently in the title element, appends length characters of the array ch, starting at index start, to the title

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
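The startElement/characters/endElement trio described above can be sketched as follows. A flag switches recording on at `<title>` and off at `</title>`. The `characters` method appends to a buffer because a SAX parser may deliver one text node in several chunks, and an entity reference such as `&amp;` commonly causes such a split. `TitleCapture` and `titleOf` are hypothetical names. Trimming the result and returning an empty string when there is no title are choices of this sketch, whereas the documented `getTitle` falls back to the document's URL.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of title capture with an inTitle flag.
class TitleCapture extends DefaultHandler {
    private boolean inTitle;
    private final StringBuilder title = new StringBuilder();

    @Override
    public void startDocument() {
        // Reset state so the handler can be reused for another document.
        inTitle = false;
        title.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attr) {
        if ("title".equalsIgnoreCase(qName)) inTitle = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append rather than assign: one text node may arrive in several calls.
        if (inTitle) title.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Stop recording character content at the end of the title.
        if ("title".equalsIgnoreCase(qName)) inTitle = false;
    }

    String getTitle() {
        return title.toString().trim();
    }

    // Parses an in-memory XHTML string and returns its trimmed title text.
    static String titleOf(String xhtml) {
        try {
            TitleCapture handler = new TitleCapture();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getTitle();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```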

parse

public void parse(java.lang.String url)
           throws java.lang.Exception
Description copied from interface: TitleHandler
load and parse the URL url. This method must be called before any of the accessor methods.

Specified by:
parse in interface TitleHandler
Parameters:
url - the URL of the document to be parsed
Throws:
java.lang.Exception - if an error occurs parsing the document
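One plausible shape for `parse(url)` is to pass the URL string directly to the JDK's `SAXParser.parse(String uri, DefaultHandler dh)`, as sketched below. The `UrlParseDemo` and `ElementCounter` classes are hypothetical, and the counter merely stands in for the real handler. An XML parser rejects HTML that is not well-formed XML (for example, unclosed `<p>` or `<br>` tags), so a document like that surfaces here as an exception.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class UrlParseDemo {
    // Counts elements; stands in for any DefaultHandler subclass.
    static class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attr) {
            elements++;
        }
    }

    // Hypothetical parse(url): hand the URL string straight to a SAX parser,
    // which fetches and parses the document, driving the handler's callbacks.
    static int parse(String url) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(url, handler);
        return handler.elements;
    }

    // Usage: write a small XHTML file and parse it through its file: URL.
    static int demo() {
        try {
            File f = File.createTempFile("sample", ".xhtml");
            f.deleteOnExit();
            String xhtml = "<html><head><title>T</title></head><body/></html>";
            Files.write(f.toPath(), xhtml.getBytes(StandardCharsets.UTF_8));
            return parse(f.toURI().toString());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```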

tryLoadingTitleFromText

public void tryLoadingTitleFromText(java.lang.String url)

getTitle

public java.lang.String getTitle()
                          throws java.lang.IllegalStateException
Description copied from interface: TitleHandler
get the title of the most recently parsed document

Specified by:
getTitle in interface TitleHandler
Returns:
a string representing the title; if one cannot be determined the URL of the document is returned
Throws:
java.lang.IllegalStateException - if no document has been parsed by this handler
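The `IllegalStateException` contract and the fallback to the document's URL can be sketched with a `parsed` flag, as below. `TitleAccessors` and `recordParse` are hypothetical. `recordParse` stands in for whatever bookkeeping the real `parse` performs, and treating a whitespace-only title as missing is an assumption of the sketch.

```java
// Hypothetical sketch of the getTitle contract.
class TitleAccessors {
    private boolean parsed;
    private String url;
    private final StringBuilder title = new StringBuilder();

    // Stand-in for the end of a successful parse(url).
    void recordParse(String url, String titleText) {
        this.url = url;
        title.setLength(0);
        title.append(titleText);
        parsed = true;
    }

    String getTitle() {
        // Accessors are only meaningful after a document has been parsed.
        if (!parsed) throw new IllegalStateException("no document has been parsed");
        String t = title.toString().trim();
        // Fall back to the document's URL when no title could be determined.
        return t.isEmpty() ? url : t;
    }
}
```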

getHeadings

public java.util.List getHeadings()
                           throws java.lang.IllegalStateException
Description copied from interface: TitleHandler
retrieve a list of headings for the most recently parsed document

Specified by:
getHeadings in interface TitleHandler
Returns:
a List containing one Heading object for each heading in the document
Throws:
java.lang.IllegalStateException - if no document has been parsed by this handler