DocumentParser (SAFS API DOCUMENT)

java.lang.Object
- org.safs.selenium.DocumentParser

```
public class DocumentParser
extends java.lang.Object
```
Purpose: Convert between XPath expressions and SAFS recognition strings.
Note: To achieve this, the document contained in an HTML page is parsed.
The parsing relies on the API provided by DOM4J.
1. HTML parser: parses the HTML page and converts it to W3C XML.
NekoParser (open source) http://sourceforge.net/projects/nekohtml/
version 1.9.14 nekohtml.jar
2. XML DOM parser: parses the W3C XML.
DOM4J (open source) http://www.dom4j.org/dom4j-1.6.1/
dom4j-2.0.0-ALPHA-2.jar
NOTE: So far one more jar has been found necessary when using XPath search:
jaxen-1.1.1.jar (an open source XPath library) at http://jaxen.org/releases.html
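
The second step above (parse XML, then search it with XPath) can be sketched with the JDK alone. This is only an illustration of the idea: it uses the JDK's built-in `javax.xml` DOM and XPath classes instead of NekoHTML, DOM4J and jaxen, it assumes the markup is already well-formed XML, and the class `XPathSearchSketch` and its methods are hypothetical names that do not exist in SAFS.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of SAFS.
class XPathSearchSketch {

    // Parse well-formed markup into a W3C DOM document (JDK parser, not DOM4J).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Count the nodes that an XPath expression selects in the document.
    static int count(Document doc, String xpath) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><input id='a'/><input id='b'/></body></html>");
        System.out.println(count(doc, "//input"));
        System.out.println(count(doc, "//input[@id='b']"));
    }
}
```

Real HTML is rarely well-formed, which is why the class puts an HTML parser in front of the XML step.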

Author:

- Junwu Ma, Feb 18, 2011: Initial creation.
- Lei Wang, Mar 18, 2011: Implement methods to manipulate dom4j's document, node, element, attribute, etc. Implement methods to operate on XPath, which are used by SELENIUM-SPC. These methods were already implemented in user-extensions.js; here they are simply converted.
- Lei Wang, Jun 07, 2011: Modify methods generateAttributRS(), getAttribute() and getIndex().
- Lei Wang, Jun 28, 2011: Add method getAttributes().
- Lei Wang, Aug 10, 2012: Rename getRobotRecognition() to getRobotRecognitionNode(), which returns an SPCTreeNode containing "RS", "id", and "name" for an HTML element.

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`ASSIGN_SEPARATOR`
`static java.lang.String`	`BOUDNS_SEPARATOR`
`java.util.HashMap<java.lang.String,org.dom4j.Document>`	`documents` documents is a map-cache to contain dom4j-document as value and URL as key.
`static java.lang.String`	`RECOGNITION_LEVEL_SEPARATOR`
`(package private) com.thoughtworks.selenium.Selenium`	`selenium`
`int`	`timeconsume1`
`int`	`timeconsume2`
`int`	`timeconsume3`
`java.lang.String`	`url`
`(package private) SeleniumGUIUtilities`	`util`
`static java.lang.String`	`XPATH_ALL_ELEMENTS`
`static java.lang.String`	`XPATH_ALL_LEVEL_PREFIX`
`static java.lang.String`	`XPATH_ATTRIBUTES`

Constructor Summary

Constructors
Constructor and Description

`DocumentParser(com.thoughtworks.selenium.Selenium selenium, SeleniumGUIUtilities util)`
`DocumentParser(java.lang.String url)`

Method Summary

Modifier and Type	Method and Description
`boolean`	`checkAttributes(org.dom4j.Node node, java.lang.String[][] attrcheck, boolean matchPartial)`
`java.lang.String[]`	`getAllElements()`
`void`	`getAllElementsR(org.dom4j.Document doc, java.lang.String prefix, int top, int left, java.lang.String docURL, java.util.List<java.lang.String> elementsXpathList)` Note: This is a recursive method; it is called again for each frame in the document.
`java.util.List<java.lang.String>`	`getAllXPath(org.dom4j.Document doc, java.lang.String[] tags)`
`java.lang.String`	`getAttribute(org.dom4j.Document doc, java.lang.String xpath, java.lang.String attribute)` Purpose: Get the value of an attribute
`java.lang.String`	`getAttribute(org.dom4j.Node node, java.lang.String attribute)` Purpose: Get the value of an attribute
`java.lang.String`	`getAttribute(java.lang.String url, java.lang.String xpath, java.lang.String attribute)` Purpose: Get the value of an attribute
`java.util.HashMap`	`getAttributes(org.dom4j.Document doc, java.lang.String xpath)` Purpose: Get all properties of an element on html page.
`java.util.HashMap`	`getAttributes(org.dom4j.Node node)` Purpose: Get all properties of an element on html page.
`java.util.HashMap`	`getAttributes(java.lang.String url, java.lang.String xpath)` Purpose: Get all properties of an element on html page.
`java.lang.String`	`getBoundsSeparator()`
`java.lang.String`	`getBrowserClientScreenPosition(com.thoughtworks.selenium.Selenium selenium)`
`java.lang.String`	`getClientScrollInfo(com.thoughtworks.selenium.Selenium selenium)`
`org.dom4j.Document`	`getDocument(java.lang.String url, boolean changeMainPageURL)`
`org.dom4j.Element`	`getElementFromXpath(org.dom4j.Document document, java.lang.String xpath)`
`java.util.List`	`getElementsMatchingTags(org.dom4j.Document document, java.lang.String[] tags)`
`java.lang.String`	`getEncodingName(java.io.InputStream ins)` Note: To detect the encoding of the input-stream
`int`	`getFrameIndex(org.dom4j.Document document, java.lang.String xpath)`
`int`	`getFrameLeft(org.dom4j.Node frameNode)`
`java.lang.String`	`getFrameSrcURL(org.dom4j.Document document, java.lang.String xpath, java.lang.String parentURL)`
`java.lang.String`	`getFrameSrcURL(org.dom4j.Node frameNode, java.lang.String parentURL)`
`int`	`getFrameTop(org.dom4j.Node frameNode)`
`int`	`getIndex(org.dom4j.Document document, org.dom4j.Element element)`
`java.lang.String`	`getInnerHtml(java.lang.String content, java.lang.String tag)` Purpose: get the innerHTML from page content Note: The tag must be the beginning and ending tag of the content
`java.lang.String`	`getNodeXPath(org.dom4j.Node node, java.lang.String prefix)`
`java.lang.String`	`getRobotRecog(org.dom4j.Document document, java.lang.String xpath)` Generate the RS in SAFS-Robot's format for element described by xpath.
`java.lang.String`	`getRobotRecognition(java.lang.String xpath, boolean withName)` Returns the "SAFS recognition string" for the given XPath.
`SPCTreeNode`	`getRobotRecognitionNode(java.lang.String xpath, boolean withName)` Returns an SPCTreeNode for the given XPath, containing the "SAFS recognition string", the HTML element's id, and the HTML element's name.
`java.lang.String`	`getSSBounds()`
`java.lang.String`	`getUniqueNameForXpath(java.lang.String url, java.lang.String xpath, java.lang.String prefix)`
`java.lang.String`	`getUrl()`
`java.lang.String`	`getXpath(org.dom4j.Document document, java.lang.String[] tags, java.lang.String[][] attributes, int index, boolean secondaryMatch, boolean matchPartial)` This function takes an array of (HTML) tags, a two-dimensional array of attributes to check, and a flag indicating whether to check the text attributes for partial matches.
`void`	`goBackToMainPage()`
`boolean`	`isMainPage()` When getAllElements() is called and the HTML contains frames, the frame content is fetched with the API calls selenium.open(frameURL); selenium.getHtmlSource().
`static void`	`main(java.lang.String[] args)`
`java.lang.String`	`normalizeURL(java.lang.String url)`
`org.dom4j.Document`	`setDocument(java.lang.String urlString, java.lang.String htmlSource, boolean changeMainPageURL)` Purpose: Get the dom4j DOM for the given HTML content. If the HTML content is null, it is fetched from urlString instead. The method `setHTTPProxy()` may need to be called before calling this method.
`void`	`setHTTPProxy()` This is needed when content is read from a URL.
`void`	`setInterrupt(boolean interrupt)`
`void`	`test()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - documents
```
public java.util.HashMap<java.lang.String,org.dom4j.Document> documents
```
    documents is a map-cache to contain dom4j-document as value and URL as key.
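    
    A URL-keyed cache of this kind can be sketched as below. The source only states that `documents` maps a URL to its dom4j document; how `getDocument` consults the map is not documented, so the lookup-then-load logic, the generic type `D`, and the class `DocumentCacheSketch` are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a URL -> document cache; D stands in for org.dom4j.Document.
class DocumentCacheSketch<D> {
    private final Map<String, D> documents = new HashMap<>();
    private final Function<String, D> loader;   // assumed: fetches and parses a page
    private int loads = 0;

    DocumentCacheSketch(Function<String, D> loader) {
        this.loader = loader;
    }

    // Return the cached document for the URL, loading it only on the first request.
    D get(String url) {
        D doc = documents.get(url);
        if (doc == null) {
            doc = loader.apply(url);
            loads++;
            documents.put(url, doc);
        }
        return doc;
    }

    // Number of times the loader was actually invoked.
    int loadCount() {
        return loads;
    }
}
```

    Keying on the URL means a page that appears in several frames would be parsed once, provided the same URL string is used each time.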
  - url
```
public java.lang.String url
```
  - selenium
```
com.thoughtworks.selenium.Selenium selenium
```
  - util
```
SeleniumGUIUtilities util
```
  - XPATH_ALL_LEVEL_PREFIX
```
public static final java.lang.String XPATH_ALL_LEVEL_PREFIX
```
    See Also:
    
    Constant Field Values
  - XPATH_ALL_ELEMENTS
```
public static final java.lang.String XPATH_ALL_ELEMENTS
```
    See Also:
    
    Constant Field Values
  - XPATH_ATTRIBUTES
```
public static final java.lang.String XPATH_ATTRIBUTES
```
    See Also:
    
    Constant Field Values
  - BOUDNS_SEPARATOR
```
public static final java.lang.String BOUDNS_SEPARATOR
```
    See Also:
    
    Constant Field Values
  - ASSIGN_SEPARATOR
```
public static final java.lang.String ASSIGN_SEPARATOR
```
    See Also:
    
    Constant Field Values
  - RECOGNITION_LEVEL_SEPARATOR
```
public static final java.lang.String RECOGNITION_LEVEL_SEPARATOR
```
    See Also:
    
    Constant Field Values
  - timeconsume1
```
public int timeconsume1
```
  - timeconsume2
```
public int timeconsume2
```
  - timeconsume3
```
public int timeconsume3
```
- Constructor Detail
  - DocumentParser
```
public DocumentParser(java.lang.String url)
```
  - DocumentParser
```
public DocumentParser(com.thoughtworks.selenium.Selenium selenium,
                      SeleniumGUIUtilities util)
```
- Method Detail
  - isMainPage
```
public boolean isMainPage()
```
    When getAllElements() is called and the HTML contains frames, the frame content is fetched with the API calls selenium.open(frameURL); selenium.getHtmlSource(). Because Selenium then shows the frame page in the browser, the browser must be switched back to the main page by calling selenium.open(this.url). isMainPage tests whether the browser currently shows the main page.
    
    Returns:
    
    true if the main page is shown in the browser; false if other frame page.
  - getBoundsSeparator
```
public java.lang.String getBoundsSeparator()
```
  - setInterrupt
```
public void setInterrupt(boolean interrupt)
```
  - setHTTPProxy
```
public void setHTTPProxy()
```
    This is needed when content is read from a URL.
  - getUrl
```
public java.lang.String getUrl()
```
    Returns:
    
    the url string corresponding to the main page
  - getDocument
```
public org.dom4j.Document getDocument(java.lang.String url,
                                      boolean changeMainPageURL)
```
    Parameters:
    
    url - a url string
    
    changeMainPageURL - true if the main page url needs to be changed
    
    Returns:
    
    a dom4j document
  - setDocument
```
public org.dom4j.Document setDocument(java.lang.String urlString,
                                      java.lang.String htmlSource,
                                      boolean changeMainPageURL)
```
    Purpose: Get the dom4j DOM for the given HTML content.
    If the HTML content is null, it is fetched from the urlString instead.
    The method setHTTPProxy() may need to be called before calling this method.
    
    Parameters:
    
    urlString - From where to get the html-content
    
    htmlSource - The html content to be converted to dom4j-DOM
    
    changeMainPageURL - true if the main page url needs to be changed
    
    Returns:
    
    The dom4j-DOM will be returned.
  - getEncodingName
```
public java.lang.String getEncodingName(java.io.InputStream ins)
                                 throws java.io.IOException
```
    Note: To detect the encoding of the input-stream
    
    Parameters:
    
    ins - the input-stream to be parsed
    
    Returns:
    
    the encoding-name of the input-stream
    
    Throws:
    
    java.io.IOException
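    
    The source does not say how the encoding is detected. One common technique, shown here purely as an assumption, is to look for a byte-order mark at the start of the stream and fall back to a default when none is present. The class `EncodingSniffSketch`, both method names and the `fallback` parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: guess an encoding name from a byte-order mark (BOM).
class EncodingSniffSketch {

    // Inspect up to three leading bytes; return the fallback when no BOM is found.
    static String fromBom(byte[] head, String fallback) {
        int b0 = head.length > 0 ? head[0] & 0xFF : -1;
        int b1 = head.length > 1 ? head[1] & 0xFF : -1;
        int b2 = head.length > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        return fallback;
    }

    // Peek at the stream without consuming it; requires mark/reset support.
    static String sniff(InputStream ins, String fallback) throws IOException {
        if (!ins.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        ins.mark(3);
        byte[] head = ins.readNBytes(3);   // Java 11+; may return fewer than 3 bytes
        ins.reset();
        return fromBom(head, fallback);
    }
}
```

    Many HTML pages carry no BOM and declare their charset in a meta tag or an HTTP header instead, so a real detector would need more than this check.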
  - getInnerHtml
```
public java.lang.String getInnerHtml(java.lang.String content,
                                     java.lang.String tag)
```
    Purpose: get the innerHTML from page content
    Note: The tag must be the beginning and ending tag of the content
    
    Parameters:
    
    content - from which to get innerHTML
    
    tag - to which the innerHTML belongs
    
    Returns:
    
    the innerHTML content of tag
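    
    The note above (the content must begin and end with the given tag) suggests a simple string operation. The following is a minimal sketch under that assumption, not the SAFS implementation; `InnerHtmlSketch.innerHtml` is a hypothetical name, and returning null for non-matching content is a choice made here.

```java
// Hypothetical sketch: strip the outer <tag ...> and </tag> from a fragment.
class InnerHtmlSketch {

    // Returns the text between the opening and closing tag, or null when the
    // content does not start with <tag and end with </tag>.
    static String innerHtml(String content, String tag) {
        String trimmed = content.trim();
        String open = "<" + tag;
        String close = "</" + tag + ">";
        boolean startsOk = trimmed.regionMatches(true, 0, open, 0, open.length());
        boolean endsOk = trimmed.regionMatches(
                true, trimmed.length() - close.length(), close, 0, close.length());
        if (!startsOk || !endsOk) {
            return null;
        }
        // The opening tag ends at the first '>' (assumes no '>' inside attribute values).
        int start = trimmed.indexOf('>') + 1;
        int end = trimmed.length() - close.length();
        return start <= end ? trimmed.substring(start, end) : null;
    }
}
```

    For example, `innerHtml("<body class='x'><p>hi</p></body>", "body")` yields the `<p>hi</p>` fragment.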
  - getAllElements
```
public java.lang.String[] getAllElements()
```
    Returns:
    
    An array of XPath expressions for all elements in a page.
    If the page contains frames, the elements in those
    frames are returned as well.
  - getAllElementsR
```
public void getAllElementsR(org.dom4j.Document doc,
                            java.lang.String prefix,
                            int top,
                            int left,
                            java.lang.String docURL,
                            java.util.List<java.lang.String> elementsXpathList)
```
    Note: This is a recursive method; it is called again for each frame in the document.
    
    Parameters:
    
    doc - From where to get nodes
    
    prefix - The prefix to be added before node's xpath
    
    top - The y-coordinate of the top-left point of doc
    
    left - The x-coordinate of the top-left point of doc
    
    docURL - The url representing the doc
    
    elementsXpathList - A list that collects all XPath expressions; the method fills it in place, since the return type is void.
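    
    The recursion over frames can be sketched as follows. This is an illustration only: it uses the JDK DOM instead of dom4j, it looks frame sources up in an in-memory map (`pages`, hypothetical) instead of fetching them through Selenium, it ignores the `top` and `left` offsets, and the positional XPath format it builds is an assumption rather than the format SAFS produces.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of collecting element paths across nested frames.
class FrameWalkSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Append the path of every element in the page at url, prefixed with prefix.
    // No cycle detection: a frame that includes itself would recurse forever.
    static void collect(Map<String, String> pages, String url, String prefix, List<String> out) {
        String markup = pages.get(url);
        if (markup == null) {
            return;   // unknown frame source: nothing to add
        }
        walk(pages, parse(markup).getDocumentElement(), prefix, out);
    }

    private static void walk(Map<String, String> pages, Element el, String parentPath,
                             List<String> out) {
        String path = parentPath + "/" + el.getTagName() + "[" + position(el) + "]";
        out.add(path);
        String tag = el.getTagName();
        if (tag.equalsIgnoreCase("frame") || tag.equalsIgnoreCase("iframe")) {
            // Recurse into the frame's own document, using the frame's path as prefix.
            collect(pages, el.getAttribute("src"), path, out);
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                walk(pages, (Element) c, path, out);
            }
        }
    }

    // 1-based position among siblings that share the element's tag name.
    private static int position(Element el) {
        int pos = 1;
        for (Node s = el.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s instanceof Element && ((Element) s).getTagName().equals(el.getTagName())) {
                pos++;
            }
        }
        return pos;
    }
}
```

    As in the documented signature, the result list is passed in and filled by the recursion rather than returned.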
  - getFrameSrcURL
```
public java.lang.String getFrameSrcURL(org.dom4j.Document document,
                                       java.lang.String xpath,
                                       java.lang.String parentURL)
```
    Parameters:
    
    document - In this document, the xpath will be searched.
    
    xpath - The xpath representing a frame
    
    parentURL - The url where the frame resides; the first parameter, document,
    is the document content of this url.
    This seemingly redundant parameter is needed because, if the frame's src
    is relative, it must be appended to this parentURL to form an absolute url.
    
    Returns:
    
    An absolute url of the frame's src
  - getFrameSrcURL
```
public java.lang.String getFrameSrcURL(org.dom4j.Node frameNode,
                                       java.lang.String parentURL)
```
    Parameters:
    
    frameNode - From where to get the value of attribute 'src'
    
    parentURL - If the attribute 'src' of the frameNode contains a relative
    url, the parentURL will be added in front
    
    Returns:
    
    an absolute url indicated by attribute 'src' of the frameNode
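    
    Turning a relative frame `src` into an absolute url can be sketched with `java.net.URI.resolve`, which applies standard relative-reference resolution. This swaps in a JDK facility for illustration; the documentation above describes joining the `src` onto the parentURL, and the actual SAFS logic may differ. `FrameSrcSketch.absoluteSrc` is a hypothetical name.

```java
import java.net.URI;

// Hypothetical sketch: resolve a frame's src attribute against its parent URL.
class FrameSrcSketch {

    // An absolute src is returned unchanged; a relative one is resolved
    // against parentURL. Throws IllegalArgumentException for malformed input.
    static String absoluteSrc(String src, String parentURL) {
        return URI.create(parentURL).resolve(src).toString();
    }

    public static void main(String[] args) {
        System.out.println(absoluteSrc("menu.html", "http://example.com/app/index.html"));
    }
}
```

    Resolution replaces the last path segment of the parent (here `index.html`) rather than appending blindly, which also handles a `src` that starts with `/`.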
  - normalizeURL
```
public java.lang.String normalizeURL(java.lang.String url)
```
  - getFrameTop
```
public int getFrameTop(org.dom4j.Node frameNode)
```
  - getFrameLeft
```
public int getFrameLeft(org.dom4j.Node frameNode)
```
  - getAttributes
```
public java.util.HashMap getAttributes(java.lang.String url,
                                       java.lang.String xpath)
```
    Purpose: Get all properties of an element on html page.
    
    Parameters:
    
    url - The url representing the html page.
    
    xpath - The xpath representing the element.
    
    Returns:
  - getAttributes
```
public java.util.HashMap getAttributes(org.dom4j.Document doc,
                                       java.lang.String xpath)
```
    Purpose: Get all properties of an element on html page.
    
    Parameters:
    
    doc - A dom4j Document object representing the document of the html page.
    
    xpath - The xpath representing the element.
    
    Returns:
  - getAttributes
```
public java.util.HashMap getAttributes(org.dom4j.Node node)
```
    Purpose: Get all properties of an element on html page.
    
    Parameters:
    
    node - A dom4j Node object representing the element on the html page.
    
    Returns:
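    
    Collecting all attributes of the element selected by an XPath into a map can be sketched as below. The sketch uses the JDK DOM and XPath classes rather than dom4j and takes the page as a well-formed XML string; the class `AttributeMapSketch`, its method, and the choice to return an empty map when nothing matches are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: attribute name -> value for the first node matching an XPath.
class AttributeMapSketch {

    static Map<String, String> attributes(String xml, String xpath) {
        Map<String, String> result = new HashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // NODE yields the first match in document order, or null when none exists.
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            NamedNodeMap attrs = (node == null) ? null : node.getAttributes();
            for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot evaluate " + xpath, e);
        }
        return result;
    }
}
```

    A single attribute, as in the getAttribute overloads, would then be one `get(name)` on the returned map.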
  - getAttribute
```
public java.lang.String getAttribute(java.lang.String url,
                                     java.lang.String xpath,
                                     java.lang.String attribute)
```
    Purpose: Get the value of an attribute
    
    Parameters:
    
    url - The url representing the html page.
    
    xpath - An xpath representing an element on the web page.
    
    attribute - The attribute's name.
    
    Returns:
  - getAttribute
```
public java.lang.String getAttribute(org.dom4j.Document doc,
                                     java.lang.String xpath,
                                     java.lang.String attribute)
```
    Purpose: Get the value of an attribute
    
    Parameters:
    
    doc - A DOM4J Document (org.dom4j.Document)
    
    xpath - An xpath representing an element on the web page.
    
    attribute - The attribute's name.
    
    Returns:
    
    The value of an attribute
  - getAttribute
```
public java.lang.String getAttribute(org.dom4j.Node node,
                                     java.lang.String attribute)
```
    Purpose: Get the value of an attribute
    
    Parameters:
    
    node - A DOM4J Node (org.dom4j.Node)
    
    attribute - The attribute's name.
    
    Returns:
    
    The value of an attribute
  - goBackToMainPage
```
public void goBackToMainPage()
```
  - getNodeXPath
```
public java.lang.String getNodeXPath(org.dom4j.Node node,
                                     java.lang.String prefix)
```
  - getRobotRecognitionNode
```
public SPCTreeNode getRobotRecognitionNode(java.lang.String xpath,
                                           boolean withName)
```
    Returns an SPCTreeNode for the given XPath, containing the "SAFS recognition string", the HTML element's id, and the HTML element's name.
    
    Parameters:
    
    xpath -
    
    withName - if true, a generated component name is prefixed to the recognition string;
    for example, "ButtonInput1=" is put ahead of the recognition string.
    Otherwise, only the recognition string is returned.
    
    Returns:
    
    An SPCTreeNode containing the "recognition string", the element's id and the element's name (Aug 10, 2012, Lei Wang: SPCTreeNode is now used as the result).
  - getRobotRecognition
```
public java.lang.String getRobotRecognition(java.lang.String xpath,
                                            boolean withName)
```
    Returns the "SAFS recognition string" for the given XPath.
    
    Parameters:
    
    xpath -
    
    withName - if true, a generated component name is prefixed to the recognition string;
    for example, "ButtonInput1=" is put ahead of the recognition string.
    Otherwise, only the recognition string is returned.
    
    Returns:
  - getRobotRecog
```
public java.lang.String getRobotRecog(org.dom4j.Document document,
                                      java.lang.String xpath)
```
    Generate the RS in SAFS-Robot's format for element described by xpath.
    
    Parameters:
    
    document - The document in which the element is located.
    
    xpath - Xpath to describe an element. It should not contain any Frame.
    
    Returns:
  - getElementFromXpath
```
public org.dom4j.Element getElementFromXpath(org.dom4j.Document document,
                                             java.lang.String xpath)
```
    Parameters:
    
    document - Where to search element for an xpath
    
    xpath - The xpath to be matched.
    
    Returns:
    
    An element matching the xpath
  - getUniqueNameForXpath
```
public java.lang.String getUniqueNameForXpath(java.lang.String url,
                                              java.lang.String xpath,
                                              java.lang.String prefix)
```
  - getIndex
```
public int getIndex(org.dom4j.Document document,
                    org.dom4j.Element element)
```
  - getElementsMatchingTags
```
public java.util.List getElementsMatchingTags(org.dom4j.Document document,
                                              java.lang.String[] tags)
```
  - getAllXPath
```
public java.util.List<java.lang.String> getAllXPath(org.dom4j.Document doc,
                                                    java.lang.String[] tags)
```
  - getXpath
```
public java.lang.String getXpath(org.dom4j.Document document,
                                 java.lang.String[] tags,
                                 java.lang.String[][] attributes,
                                 int index,
                                 boolean secondaryMatch,
                                 boolean matchPartial)
```
    This function takes an array of (HTML) tags, a two-dimensional array of attributes to check,
    and a flag indicating whether to check the text attributes for partial matches. It tries the
    elements matching the tags and attributes in their order of occurrence on the page, and returns
    the XPath of the one matching the index.
    
    Parameters:
    
    document - The document whose elements are tested for a match
    
    tags - An array of HTML tags; the matched element is selected from elements with these tags.
    
    attributes - The attributes (name, value) that an element needs to match
    
    index - If several elements match the tags and attributes, index
    indicates which one is wanted.
    
    secondaryMatch - boolean; false indicates that this is the first search.
    If the first search finds no element, a secondary search is tried.
    
    matchPartial - boolean; if true, the attribute's value is matched partially.
    
    Returns:
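    
    The tag, attribute and index matching described above can be sketched as follows. Several points are assumptions: the sketch uses the JDK DOM rather than dom4j, treats `index` as 1-based (the source does not say), interprets partial matching as a substring test, returns the matched element instead of an XPath string, and leaves out the `secondaryMatch` retry. `AttributeMatchSketch` and `nthMatch` are hypothetical names.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of selecting the n-th element that matches tags and attributes.
class AttributeMatchSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Every {name, value} pair must be present; values match exactly or as substrings.
    static boolean checkAttributes(Element el, String[][] attrs, boolean matchPartial) {
        for (String[] pair : attrs) {
            if (!el.hasAttribute(pair[0])) {
                return false;
            }
            String actual = el.getAttribute(pair[0]);
            boolean ok = matchPartial ? actual.contains(pair[1]) : actual.equals(pair[1]);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Walk all elements in document order and return the index-th match (1-based), or null.
    static Element nthMatch(Document doc, String[] tags, String[][] attrs,
                            int index, boolean matchPartial) {
        Set<String> wanted = new HashSet<>();
        for (String t : tags) {
            wanted.add(t.toLowerCase(Locale.ROOT));
        }
        NodeList all = doc.getElementsByTagName("*");
        int seen = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element el = (Element) all.item(i);
            if (wanted.contains(el.getTagName().toLowerCase(Locale.ROOT))
                    && checkAttributes(el, attrs, matchPartial)
                    && ++seen == index) {
                return el;
            }
        }
        return null;
    }
}
```

    With partial matching enabled, an attribute value such as `username2` satisfies a requested value of `user`, so the index decides which of several candidates is chosen.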
  - getFrameIndex
```
public int getFrameIndex(org.dom4j.Document document,
                         java.lang.String xpath)
```
  - checkAttributes
```
public boolean checkAttributes(org.dom4j.Node node,
                               java.lang.String[][] attrcheck,
                               boolean matchPartial)
```
  - getClientScrollInfo
```
public java.lang.String getClientScrollInfo(com.thoughtworks.selenium.Selenium selenium)
                                     throws SAFSException
```
    Throws:
    
    SAFSException
  - getBrowserClientScreenPosition
```
public java.lang.String getBrowserClientScreenPosition(com.thoughtworks.selenium.Selenium selenium)
                                                throws SAFSException
```
    Throws:
    
    SAFSException
  - getSSBounds
```
public java.lang.String getSSBounds()
```
  - test
```
public void test()
```
  - main
```
public static void main(java.lang.String[] args)
```
    Parameters:
    
    args -
