Package eu.simuline.util.sgml
Class SGMLParser
- java.lang.Object
-
- eu.simuline.util.sgml.SGMLParser
-
public final class SGMLParser extends Object
A rudimentary SGML parser with something like a SAX API.
- Version:
- 1.0
- Author:
- Ernst Reissner
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
(package private) class SGMLParser.AttributesWrapper: A partial implementation of the SAX interface Attributes which allows name-value pairs to be set by the method SGMLParser.AttributesWrapper.addAttribute(java.lang.String, java.lang.String).
(package private) static class SGMLParser.Buffer: Class which buffers the read stream.
(package private) static interface SGMLParser.CharTester: Provides a single method which decides whether the given character passes a certain test.
(package private) static class SGMLParser.SpecCharTester: A CharTester which allows to specify the character which passes the test.
(package private) static class SGMLParser.TrivialContentHandler: A ContentHandler which simply ignores all events.
(package private) static interface SGMLParser.XMLsGMLspecifica: Provides a bunch of methods for parsing with implementations specific to XML and SGML.
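The two-argument addAttribute described for SGMLParser.AttributesWrapper can be pictured with the following minimal sketch. It is not the library's code: the class name SimpleAttributes is hypothetical, and delegating to the JDK class org.xml.sax.helpers.AttributesImpl with an empty namespace URI and the attribute type "CDATA" is an assumption made purely for illustration.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical stand-in for a name-value based Attributes implementation;
// not the actual SGMLParser.AttributesWrapper.
final class SimpleAttributes extends AttributesImpl {

    // Adds a plain name-value pair. Assumption: no namespaces, so the URI is
    // empty and the same name serves as local name and as qualified name.
    void addAttribute(String name, String value) {
        addAttribute("", name, name, "CDATA", value);
    }

    public static void main(String[] args) {
        SimpleAttributes attrs = new SimpleAttributes();
        attrs.addAttribute("href", "index.html");
        Attributes view = attrs; // handlers only see the SAX interface
        System.out.println(view.getQName(0) + "=" + view.getValue("href"));
    }
}
```

A handler receiving such an object can look values up by name through the ordinary SAX interface, which is all a partial implementation needs to support for simple SGML or HTML attributes.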
-
Field Summary
Fields
Modifier and Type, Field: Description
private static String ATTR_NAME: Short string representation of the object currently parsed.
private static String ATTR_VALUE: Short string representation of the object currently parsed.
private SGMLParser.Buffer buffer: The buffer of the input stream.
private static int BUFFER_SIZE: The size of the buffer used internally.
private ContentHandler contentHandler: The ContentHandler registered.
private int currChar: The current character or -1 to signify the end of the stream.
private static String END_TAG: Short string representation of the object currently parsed.
private SGMLParser.XMLsGMLspecifica htmlAttributeParser: Contains the HTML-specific part of the parser.
private ParseExceptionHandler parseExceptionHandler: The ParseExceptionHandler registered.
private static String PROC_INSTR: Short string representation of the object currently parsed.
private static String QUOTE_DOT
private static String START_TAG: Short string representation of the object currently parsed.
private static char SYMB_COMMENT
private static char SYMB_EQ
private static char SYMB_TAG
private static SGMLParser.CharTester TEST_BLANK_EQUALS_GT: Tests for = and for >.
private static SGMLParser.CharTester TEST_BLANK_GT: Tests for blank or >.
private static SGMLParser.CharTester TEST_BLANK_GT_SLASH: Tests for blank, /, >.
private static SGMLParser.CharTester TEST_END_OF_COMMENT: Tests for quote both for ' and for ".
private static SGMLParser.CharTester TEST_GT: Tests for >.
private static SGMLParser.CharTester TEST_LT: Tests for <.
private static SGMLParser.CharTester TEST_NO_WHITESPACE: Tests for whitespace.
private static SGMLParser.SpecCharTester TEST_SPEC: Tests for a specified character.
private static String WHITESP_IN_ATTR: Short string representation of the object currently parsed.
private SGMLParser.XMLsGMLspecifica xmlAttributeParser: Contains the XML-specific part of the parser.
private SGMLParser.XMLsGMLspecifica xmlSgmlSpecifica: Contains class with methods specific for xml and sgml, respectively.
-
Constructor Summary
Constructors
Constructor: Description
SGMLParser(): Creates a new SGMLParser with the default handlers for content and exceptions.
-
Method Summary
Modifier and Type, Method: Description
ContentHandler getContentHandler(): Returns contentHandler.
ParseExceptionHandler getExceptionHandler(): Returns parseExceptionHandler.
boolean isXMLParser()
void parse(Reader reader): Parses the given Reader.
(package private) void parse(InputSource src): Parses the InputSource given but delegates everything inside a tag or a processing instruction to parseTagOrPI().
(package private) void parseEndTag(): Parses an end-tag notifying the underlying handler.
(package private) void parseStartOrStartEndTag(): Parses a start-tag or, for xml, an empty tag.
private int parseTagOrPI(): Parses everything within a tag, a processing instruction, ...
private int parseText(): Parses everything outside a tag, a processing instruction, ...
boolean parseXML(boolean xml): Sets whether this parser is used as an xml-parser.
void setContentHandler(ContentHandler contentHandler): Sets contentHandler.
void setExceptionHandler(ParseExceptionHandler peHandler): Sets parseExceptionHandler.
-
-
-
Field Detail
-
QUOTE_DOT
private static final String QUOTE_DOT
- See Also:
- Constant Field Values
-
SYMB_EQ
private static final char SYMB_EQ
- See Also:
- Constant Field Values
-
SYMB_COMMENT
private static final char SYMB_COMMENT
- See Also:
- Constant Field Values
-
SYMB_TAG
private static final char SYMB_TAG
- See Also:
- Constant Field Values
-
TEST_BLANK_GT_SLASH
private static final SGMLParser.CharTester TEST_BLANK_GT_SLASH
Tests for blank, /, >.
-
TEST_BLANK_GT
private static final SGMLParser.CharTester TEST_BLANK_GT
Tests for blank or >.
-
TEST_LT
private static final SGMLParser.CharTester TEST_LT
Tests for <.
-
TEST_GT
private static final SGMLParser.CharTester TEST_GT
Tests for >.
-
TEST_BLANK_EQUALS_GT
private static final SGMLParser.CharTester TEST_BLANK_EQUALS_GT
Tests for = and for >.
-
TEST_NO_WHITESPACE
private static final SGMLParser.CharTester TEST_NO_WHITESPACE
Tests for whitespace.
-
TEST_END_OF_COMMENT
private static final SGMLParser.CharTester TEST_END_OF_COMMENT
Tests for quote both for ' and for ".
-
TEST_SPEC
private static final SGMLParser.SpecCharTester TEST_SPEC
Tests for a specified character. This is used for quotes which allow the cases ' and ".
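How a tester with a settable character can serve both quote styles is sketched below. This is a hypothetical re-creation, not the library's code: the method names testChar and setChar, the wrapper class CharTesterSketch and the helper readQuoted are all assumptions, since the actual members of SGMLParser.CharTester and SGMLParser.SpecCharTester are not listed here.

```java
// Hypothetical re-creation of a character-test abstraction; the real
// SGMLParser.CharTester may name its single method differently.
final class CharTesterSketch {

    interface CharTester {
        boolean testChar(char ch);
    }

    // Tester whose accepted character can be set, e.g. to the opening quote.
    static final class SpecCharTester implements CharTester {
        private char accepted;

        void setChar(char accepted) {
            this.accepted = accepted;
        }

        @Override
        public boolean testChar(char ch) {
            return ch == this.accepted;
        }
    }

    // Returns the text between the quote at index start and the next
    // occurrence of the same quote character, or null if it is not closed.
    static String readQuoted(String input, int start, SpecCharTester tester) {
        tester.setChar(input.charAt(start)); // either ' or "
        for (int i = start + 1; i < input.length(); i++) {
            if (tester.testChar(input.charAt(i))) {
                return input.substring(start + 1, i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SpecCharTester tester = new SpecCharTester();
        System.out.println(readQuoted("'it\"s'", 0, tester));
        System.out.println(readQuoted("\"don't\"", 0, tester));
    }
}
```

Because the closing character is taken from the opening one, a double quote inside a single-quoted value (and vice versa) does not end the value.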
-
htmlAttributeParser
private final SGMLParser.XMLsGMLspecifica htmlAttributeParser
Contains the HTML-specific part of the parser.
-
xmlAttributeParser
private final SGMLParser.XMLsGMLspecifica xmlAttributeParser
Contains the XML-specific part of the parser.
-
BUFFER_SIZE
private static final int BUFFER_SIZE
The size of the buffer used internally. This must be at least 1. I found no significant difference in speed when increasing this number. The buffer coming from a stream from a URL seems to have a maximal size of 1448, whereas for file streams there seems to be no bound. In the cases considered, the file is read in as a whole.
- See Also:
- Constant Field Values
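The remark on buffer sizes fits the general contract of java.io.Reader: a bulk read may deliver fewer characters than the array can hold, so the result must not depend on the buffer size. The following minimal sketch illustrates this. It is not the actual SGMLParser.Buffer; the class BufferSketch and its methods next and readAll are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical buffer sketch; not the actual SGMLParser.Buffer.
final class BufferSketch {
    private final Reader reader;
    private final char[] chars;
    private int pos = 0;   // index of the next character to hand out
    private int limit = 0; // number of valid characters in chars

    BufferSketch(Reader reader, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("buffer size must be at least 1");
        }
        this.reader = reader;
        this.chars = new char[bufferSize];
    }

    // Returns the next character, or -1 at the end of the stream.
    int next() throws IOException {
        if (pos == limit) {
            // read may deliver fewer characters than the array can hold
            limit = reader.read(chars, 0, chars.length);
            pos = 0;
            if (limit == -1) {
                limit = 0; // stay at the end of the stream on repeated calls
                return -1;
            }
        }
        return chars[pos++];
    }

    // Reads the whole text through a buffer of the given size.
    static String readAll(String text, int bufferSize) {
        BufferSketch buffer = new BufferSketch(new StringReader(text), bufferSize);
        StringBuilder out = new StringBuilder();
        try {
            for (int c = buffer.next(); c != -1; c = buffer.next()) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "<p>text</p>";
        System.out.println(readAll(doc, 1).equals(readAll(doc, 1024)));
    }
}
```

The sketch also shows why the current character is naturally held in an int: -1 lies outside the range of char and can therefore mark the end of the stream.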
-
START_TAG
private static final String START_TAG
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
END_TAG
private static final String END_TAG
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
PROC_INSTR
private static final String PROC_INSTR
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
ATTR_NAME
private static final String ATTR_NAME
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
WHITESP_IN_ATTR
private static final String WHITESP_IN_ATTR
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
ATTR_VALUE
private static final String ATTR_VALUE
Short string representation of the object currently parsed. Contains the specific part of the message of the exception that may be thrown by SGMLParser.Buffer.readStringBuffer(eu.simuline.util.sgml.SGMLParser.CharTester, java.lang.String).
- See Also:
- Constant Field Values
-
xmlSgmlSpecifica
private SGMLParser.XMLsGMLspecifica xmlSgmlSpecifica
Contains the object whose methods are specific to XML or to SGML, respectively.
-
currChar
private int currChar
The current character or -1 to signify the end of the stream.
-
contentHandler
private ContentHandler contentHandler
The ContentHandler registered.
-
parseExceptionHandler
private ParseExceptionHandler parseExceptionHandler
The ParseExceptionHandler registered.
-
buffer
private SGMLParser.Buffer buffer
The buffer of the input stream.
-
-
Method Detail
-
parse
void parse(InputSource src) throws IOException, SAXException
Parses the InputSource given but delegates everything inside a tag or a processing instruction to parseTagOrPI().
- Parameters:
- src - an InputSource.
- Throws:
- IOException - if an error occurs
- SAXException - if an error occurs
-
parse
public void parse(Reader reader) throws IOException, SAXException
Parses the given Reader.
- Parameters:
- reader - a Reader supplying an SGML document.
- Throws:
- IOException - if an error reading the stream occurs.
- SAXException - if an error with the sgml-syntax occurs.
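Since parse takes a Reader rather than a byte stream, character decoding has to happen before the parser sees the input. The sketch below shows one way to wrap a byte stream in a Reader with an explicit charset. The class ReaderSourceSketch and its helpers toReader and drain are hypothetical names; drain merely stands in for the parser consuming the characters, and the commented call uses only the constructor and method signature documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderSourceSketch {

    // Wraps a byte stream in a Reader, decoding with an explicit charset.
    static Reader toReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Drains a Reader into a String; stands in for the parser consuming it.
    static String drain(Reader reader) {
        try (Reader r = reader) {
            StringBuilder out = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "<p>Gr\u00fc\u00dfe</p>".getBytes(StandardCharsets.ISO_8859_1);
        Reader reader = toReader(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        // With the library on the class path one would instead call:
        //   new SGMLParser().parse(reader);
        System.out.println(drain(reader));
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which matters for HTML documents that are not encoded in UTF-8.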
-
parseText
private int parseText() throws IOException, SAXException
Parses everything outside a tag, a processing instruction, ..., that is, everything not enclosed in the brackets < and >. Missing: distinction between notification of characters and whitespace.
- Throws:
- IOException - if an error reading the stream occurs.
- SAXException - if an error with the sgml-syntax occurs.
- See Also:
- parseTagOrPI()
-
parseEndTag
void parseEndTag() throws IOException, SAXException
Parses an end-tag notifying the underlying handler.
- Throws:
- IOException - if an error reading the stream occurs.
- SAXException - if an error with the sgml-syntax occurs.
-
parseStartOrStartEndTag
void parseStartOrStartEndTag() throws IOException, SAXException
Parses a start-tag or, for xml, an empty tag.
- Throws:
- IOException - if an error reading the stream occurs.
- SAXException - if an error with the sgml-syntax occurs.
-
parseTagOrPI
private int parseTagOrPI() throws IOException, SAXException
Parses everything within a tag, a processing instruction, ..., that is, everything enclosed in the brackets < and >.
- Throws:
- IOException
- SAXException
- See Also:
- parseText()
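The division of labour between parseText() (outside the brackets) and parseTagOrPI() (inside the brackets) can be illustrated with a toy splitter. This is only a sketch of the alternation, not the library's algorithm: the class TextMarkupSplitter and its method split are hypothetical, and the sketch ignores comments and > characters inside quoted attribute values, which a real parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of alternating between text and markup; it ignores
// comments and quoted '>' characters, which a real parser has to handle.
final class TextMarkupSplitter {

    static List<String> split(String input) {
        List<String> events = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == '<') {
                // inside brackets: read up to the closing '>'
                int end = input.indexOf('>', pos);
                if (end == -1) {
                    throw new IllegalArgumentException("unclosed '<' at index " + pos);
                }
                events.add("markup:" + input.substring(pos + 1, end));
                pos = end + 1;
            } else {
                // outside brackets: read up to the next '<' or the end
                int end = input.indexOf('<', pos);
                if (end == -1) {
                    end = input.length();
                }
                events.add("text:" + input.substring(pos, end));
                pos = end;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(split("<p>Hello <b>world</b></p>"));
    }
}
```

Each branch of the loop consumes one segment and leaves the position at the start of the next one, which is the same hand-over the two parse methods perform on the character stream.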
-
setContentHandler
public void setContentHandler(ContentHandler contentHandler)
Sets contentHandler.
- Parameters:
- contentHandler - a ContentHandler.
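A handler suitable for registration might look like the following sketch. It assumes that the ContentHandler expected here is the SAX interface org.xml.sax.ContentHandler, which this page does not state explicitly. The class RecordingHandler is a hypothetical name, and which of the name arguments the parser actually fills in is not documented here; the sketch reads qName. The events are fired by hand so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// A handler that records element names and character data. Assumption: the
// ContentHandler expected by setContentHandler is org.xml.sax.ContentHandler.
final class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        // With the library on the class path:
        //   SGMLParser parser = new SGMLParser();
        //   parser.setContentHandler(handler);
        //   parser.parse(reader);
        // Here the events are fired by hand so that the sketch runs alone.
        char[] text = "Hello".toCharArray();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        System.out.println(handler.events);
    }
}
```

Extending DefaultHandler keeps the sketch short, because all callbacks that are not overridden simply ignore their events.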
-
getContentHandler
public ContentHandler getContentHandler()
Returns contentHandler.
- Returns:
- the ContentHandler contentHandler.
-
setExceptionHandler
public void setExceptionHandler(ParseExceptionHandler peHandler)
Sets parseExceptionHandler.
- Parameters:
- peHandler - a ParseExceptionHandler.
-
getExceptionHandler
public ParseExceptionHandler getExceptionHandler()
Returns parseExceptionHandler.
- Returns:
- the ParseExceptionHandler parseExceptionHandler.
-
parseXML
public boolean parseXML(boolean xml)
Sets whether this parser is used as an XML parser. If this is false, which is the default, it is an HTML parser.
- Parameters:
- xml - a boolean value signifying whether this parser will be used as an XML parser in the sequel.
- Returns:
- a boolean value signifying whether this parser was used as an XML parser before this method was invoked.
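Because parseXML returns the previous setting, a caller can switch the mode temporarily and restore it afterwards. The sketch below mimics only this documented contract with a hypothetical stand-in class, ModeSwitchSketch; it is not the library's implementation, and the behaviour given to isXMLParser() (returning the current mode) is an assumption, since that method carries no description here.

```java
// Minimal stand-in that mimics only the documented contract of parseXML:
// set the mode and return the mode that was active before the call.
final class ModeSwitchSketch {
    private boolean xml = false; // documented default: not an XML parser

    boolean parseXML(boolean xml) {
        boolean previous = this.xml;
        this.xml = xml;
        return previous;
    }

    // Assumption: reports the mode currently in effect.
    boolean isXMLParser() {
        return this.xml;
    }

    public static void main(String[] args) {
        ModeSwitchSketch parser = new ModeSwitchSketch();
        boolean before = parser.parseXML(true); // switch to XML temporarily
        try {
            System.out.println("xml mode: " + parser.isXMLParser());
        } finally {
            parser.parseXML(before); // restore the earlier mode
        }
        System.out.println("xml mode: " + parser.isXMLParser());
    }
}
```

Restoring in a finally block keeps the parser in its original mode even if parsing in between fails with an exception.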
-
isXMLParser
public boolean isXMLParser()
-
-