Package org.jsoup.nodes
Class Document
- java.lang.Object
  - org.jsoup.nodes.Node
    - org.jsoup.nodes.Element
      - org.jsoup.nodes.Document
- All Implemented Interfaces:
java.lang.Cloneable
public class Document extends Element
An HTML Document.
-
Nested Class Summary
- static class Document.OutputSettings: A Document's output settings control the form of the text() and html() methods.
- static class Document.QuirksMode
-
Field Summary
- private java.lang.String location
- private Document.OutputSettings outputSettings
- private Parser parser
- private Document.QuirksMode quirksMode
- private boolean updateMetaCharset
Fields inherited from class org.jsoup.nodes.Element
childNodes
-
Fields inherited from class org.jsoup.nodes.Node
EmptyString, parentNode, siblingIndex
-
Constructor Summary
- Document(java.lang.String baseUri): Create a new, empty Document.
-
Method Summary
- Element body(): Accessor to the document's body element.
- java.nio.charset.Charset charset(): Returns the charset used in this document.
- void charset(java.nio.charset.Charset charset): Sets the charset used in this document.
- Document clone(): Create a stand-alone, deep copy of this node, and all of its children.
- Element createElement(java.lang.String tagName): Create a new Element, with this document's base uri.
- static Document createShell(java.lang.String baseUri): Create a valid, empty shell of a document, suitable for adding more elements to.
- private void ensureMetaCharsetElement(): Ensures a meta charset (html) or xml declaration (xml) with the current encoding used.
- private Element findFirstElementByTagName(java.lang.String tag, Node node)
- Element head(): Accessor to the document's head element.
- java.lang.String location(): Get the URL this Document was parsed from.
- java.lang.String nodeName(): Get the node name of this node.
- Document normalise(): Normalise the document.
- private void normaliseStructure(java.lang.String tag, Element htmlEl)
- private void normaliseTextNodes(Element element)
- java.lang.String outerHtml(): Get the outer HTML of this node.
- Document.OutputSettings outputSettings(): Get the document's current output settings.
- Document outputSettings(Document.OutputSettings outputSettings): Set the document's output settings.
- Parser parser(): Get the parser that was used to parse this document.
- Document parser(Parser parser): Set the parser used to create this document.
- Document.QuirksMode quirksMode()
- Document quirksMode(Document.QuirksMode quirksMode)
- Element text(java.lang.String text): Set the text of the body of this document.
- java.lang.String title(): Get the string contents of the document's title element.
- void title(java.lang.String title): Set the document's title element.
- boolean updateMetaCharsetElement(): Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
- void updateMetaCharsetElement(boolean update): Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
Methods inherited from class org.jsoup.nodes.Element
addClass, after, after, append, appendChild, appendElement, appendText, appendTo, attr, attr, attributes, baseUri, before, before, child, childNodeSize, children, className, classNames, classNames, cssSelector, data, dataNodes, dataset, doClone, doSetBaseUri, elementSiblingIndex, empty, ensureChildNodes, firstElementSibling, getAllElements, getElementById, getElementsByAttribute, getElementsByAttributeStarting, getElementsByAttributeValue, getElementsByAttributeValueContaining, getElementsByAttributeValueEnding, getElementsByAttributeValueMatching, getElementsByAttributeValueMatching, getElementsByAttributeValueNot, getElementsByAttributeValueStarting, getElementsByClass, getElementsByIndexEquals, getElementsByIndexGreaterThan, getElementsByIndexLessThan, getElementsByTag, getElementsContainingOwnText, getElementsContainingText, getElementsMatchingOwnText, getElementsMatchingOwnText, getElementsMatchingText, getElementsMatchingText, hasAttributes, hasClass, hasText, html, html, html, id, insertChildren, insertChildren, is, is, isBlock, lastElementSibling, nextElementSibling, nextElementSiblings, nodelistChanged, normalName, outerHtmlHead, outerHtmlTail, ownText, parent, parents, prepend, prependChild, prependElement, prependText, preserveWhitespace, previousElementSibling, previousElementSiblings, removeClass, select, selectFirst, shallowClone, siblingElements, tag, tagName, tagName, text, textNodes, toggleClass, val, val, wholeText, wrap
-
Methods inherited from class org.jsoup.nodes.Node
absUrl, addChildren, addChildren, attr, childNode, childNodes, childNodesAsArray, childNodesCopy, clearAttributes, equals, filter, hasAttr, hasParent, hasSameValue, indent, nextSibling, outerHtml, ownerDocument, parentNode, previousSibling, remove, removeAttr, removeChild, reparentChild, replaceChild, replaceWith, root, setBaseUri, setParentNode, setSiblingIndex, siblingIndex, siblingNodes, toString, traverse, unwrap
-
Field Detail
-
outputSettings
private Document.OutputSettings outputSettings
-
parser
private Parser parser
-
quirksMode
private Document.QuirksMode quirksMode
-
location
private java.lang.String location
-
updateMetaCharset
private boolean updateMetaCharset
-
Constructor Detail
-
Document
public Document(java.lang.String baseUri)
Create a new, empty Document.
Parameters:
  baseUri - base URI of document
See Also:
  Jsoup.parse(java.lang.String, java.lang.String), createShell(java.lang.String)
-
Method Detail
-
createShell
public static Document createShell(java.lang.String baseUri)
Create a valid, empty shell of a document, suitable for adding more elements to.
Parameters:
  baseUri - base URI of document
Returns:
  document with html, head, and body elements.
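To contrast the constructor with createShell, here is a minimal sketch assuming jsoup is on the classpath; the class ShellExample and its method names are hypothetical. A document built with new Document(baseUri) has no children, while createShell(baseUri) already has html, head and body, so body() is usable straight away.

```java
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class ShellExample {

    // new Document(baseUri) starts with no child nodes at all.
    public static int emptyDocumentChildCount() {
        Document empty = new Document("https://example.com/");
        return empty.childNodeSize();
    }

    // createShell(baseUri) already contains html > head + body,
    // so body() can be used immediately.
    public static Document shellWithParagraph() {
        Document shell = Document.createShell("https://example.com/");
        shell.body().appendElement("p").text("Hello");
        return shell;
    }

    public static void main(String[] args) {
        System.out.println(emptyDocumentChildCount());          // 0
        System.out.println(shellWithParagraph().body().html()); // <p>Hello</p>
    }
}
```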
-
location
public java.lang.String location()
Get the URL this Document was parsed from. If the starting URL is a redirect, this returns the final URL from which the document was served.
Returns:
  location
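A minimal sketch of location() for a document parsed from a String, assuming jsoup is on the classpath and assuming (as in the jsoup releases we are aware of) that the base URI passed to Jsoup.parse is recorded as the location; LocationExample is a hypothetical name. The redirect case needs a network fetch, so it is only described in a comment.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class LocationExample {

    // Parsing a String: the base URI supplied here is what location() reports
    // (an assumption about the constructor, see the lead-in).
    public static Document parsedFromString() {
        return Jsoup.parse("<p>Hi</p>", "https://example.com/start");
    }

    public static void main(String[] args) {
        System.out.println(parsedFromString().location());
        // For a fetched page, e.g. Jsoup.connect(url).get(), location() reports
        // the final URL after any redirects (not executed here).
    }
}
```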
-
head
public Element head()
Accessor to the document's head element.
Returns:
  head
-
body
public Element body()
Accessor to the document's body element.
Returns:
  body
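The head() and body() accessors work even when the input never mentions those tags, because the HTML parser creates them. This is a minimal sketch assuming jsoup is on the classpath; HeadBodyExample is a hypothetical name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class HeadBodyExample {

    // The input has no <html>, <head> or <body> tags; the HTML parser adds them,
    // placing <title> in head and <p> in body.
    public static Document parse() {
        return Jsoup.parse("<title>Notes</title><p>First</p>");
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(doc.head().select("title").size()); // 1
        System.out.println(doc.body().child(0).tagName());     // p
    }
}
```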
-
title
public java.lang.String title()
Get the string contents of the document's title element.
Returns:
  Trimmed title, or empty string if none set.
-
title
public void title(java.lang.String title)
Set the document's title element. Updates the existing title element, or adds a title element to head if none is present.
Parameters:
  title - string to set as title
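A minimal sketch of the title getter and setter, assuming jsoup is on the classpath; TitleExample and its methods are hypothetical. It shows the empty string for an untitled document, the title element being added to head, and the trimming described for title().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class TitleExample {

    // No <title> element: title() returns the empty string.
    public static String untitled() {
        return Jsoup.parse("<p>Body only</p>").title();
    }

    // title(String) adds a <title> to <head> when none exists yet.
    public static Document withTitle(String title) {
        Document doc = Jsoup.parse("<p>Body only</p>");
        doc.title(title);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println("[" + untitled() + "]");                     // []
        System.out.println(withTitle("  Quarterly Report  ").title()); // Quarterly Report
    }
}
```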
-
createElement
public Element createElement(java.lang.String tagName)
Create a new Element, with this document's base uri. Does not make the new element a child of this document.
Parameters:
  tagName - element tag name (e.g. a)
Returns:
  new element
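Because createElement does not attach the element, the caller has to insert it somewhere. This minimal sketch assumes jsoup is on the classpath; CreateElementExample is a hypothetical name. It also shows a relative href resolving against the document's base URI once the link is in the tree.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class CreateElementExample {

    // A freshly created element has no parent.
    public static Element detached() {
        Document doc = Jsoup.parse("<p>Intro</p>", "https://example.com/");
        return doc.createElement("a");
    }

    public static String attachedLinkAbsUrl() {
        Document doc = Jsoup.parse("<p>Intro</p>", "https://example.com/");
        Element link = doc.createElement("a");   // not yet in the tree
        link.attr("href", "/docs").text("Docs");
        doc.body().appendChild(link);            // now a child of <body>
        return link.absUrl("href");              // resolved against the base URI
    }

    public static void main(String[] args) {
        System.out.println(detached().parent() == null); // true
        System.out.println(attachedLinkAbsUrl());        // https://example.com/docs
    }
}
```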
-
normalise
public Document normalise()
Normalise the document. This happens after the parse phase, so it generally does not need to be called. Moves any text content that is not in the body element into the body.
Returns:
  this document after normalisation
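To see normalise() do visible work, the document has to be built by hand rather than parsed. This minimal sketch assumes jsoup is on the classpath; NormaliseExample is a hypothetical name. It starts from an empty Document, adds a text node directly under the document node, and lets normalise() create html/head/body and move that text into the body.

```java
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class NormaliseExample {

    public static Document build() {
        Document doc = new Document("");   // empty: no html, head or body
        doc.appendText("stray text");      // text directly under the document node
        return doc.normalise();            // adds html/head/body, moves text into body
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.head() != null); // true
        System.out.println(doc.body().text());  // stray text
    }
}
```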
-
normaliseTextNodes
private void normaliseTextNodes(Element element)
-
normaliseStructure
private void normaliseStructure(java.lang.String tag, Element htmlEl)
-
findFirstElementByTagName
private Element findFirstElementByTagName(java.lang.String tag, Node node)
-
outerHtml
public java.lang.String outerHtml()
Description copied from class: Node
Get the outer HTML of this node. For example, on a p element, may return <p>Para</p>.
Overrides:
  outerHtml in class Node
Returns:
  outer HTML
See Also:
  Element.html(), Element.text()
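A minimal sketch comparing outerHtml() on an element and on the Document, assuming jsoup is on the classpath; OuterHtmlExample is a hypothetical name. The document node itself has no tag, so in the versions we know its output begins at the html element rather than with a wrapper.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class OuterHtmlExample {

    public static Document parse() {
        return Jsoup.parse("<p>Para</p>");
    }

    public static void main(String[] args) {
        Document doc = parse();
        // On an element, outerHtml() includes the element's own tags.
        System.out.println(doc.selectFirst("p").outerHtml()); // <p>Para</p>
        // On the Document, output starts at <html> (pretty-printed by default).
        System.out.println(doc.outerHtml());
    }
}
```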
-
text
public Element text(java.lang.String text)
Set the text of the body of this document. Any existing nodes within the body will be cleared.
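Document.text(String) only touches the body, so head content such as the title survives. A minimal sketch assuming jsoup is on the classpath; SetTextExample is a hypothetical name.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class SetTextExample {

    public static Document replaceBody() {
        Document doc = Jsoup.parse("<title>Kept</title><p>One</p><p>Two</p>");
        doc.text("Replaced");   // clears the body's nodes, then sets its text
        return doc;
    }

    public static void main(String[] args) {
        Document doc = replaceBody();
        System.out.println(doc.select("p").size()); // 0
        System.out.println(doc.body().text());      // Replaced
        System.out.println(doc.title());            // Kept
    }
}
```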
-
nodeName
public java.lang.String nodeName()
Description copied from class: Node
Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof).
-
charset
public void charset(java.nio.charset.Charset charset)
Sets the charset used in this document. This method is equivalent to OutputSettings.charset(Charset), but in addition it updates the charset/encoding element within the document. It also turns on meta charset updating, as with updateMetaCharsetElement(true).
If there is no element with charset/encoding information yet, it will be created. Obsolete charset/encoding definitions are removed.
Elements used:
- Html: <meta charset="CHARSET">
- Xml: <?xml version="1.0" encoding="CHARSET"?>
Parameters:
  charset - the charset to use
See Also:
  updateMetaCharsetElement(boolean), Document.OutputSettings.charset(java.nio.charset.Charset)
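A minimal sketch of charset(Charset) on an HTML document, assuming jsoup is on the classpath; CharsetExample is a hypothetical name. After the call, head holds a meta charset element naming the new charset, and the meta-update flag reads true.

```java
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class CharsetExample {

    public static Document withLatin1() {
        Document doc = Jsoup.parse("<p>Hello</p>");
        // Sets the output charset and inserts <meta charset="..."> into <head>.
        doc.charset(StandardCharsets.ISO_8859_1);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = withLatin1();
        System.out.println(doc.head().select("meta[charset]").attr("charset")); // ISO-8859-1
        System.out.println(doc.updateMetaCharsetElement());                    // true
    }
}
```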
-
charset
public java.nio.charset.Charset charset()
Returns the charset used in this document. This method is equivalent to Document.OutputSettings.charset().
Returns:
  Current Charset
See Also:
  Document.OutputSettings.charset()
-
updateMetaCharsetElement
public void updateMetaCharsetElement(boolean update)
Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not. If set to false (the default), no elements are modified.
Parameters:
  update - true to update the element on charset changes, false to leave it unchanged
See Also:
  charset(java.nio.charset.Charset)
-
updateMetaCharsetElement
public boolean updateMetaCharsetElement()
Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
Returns:
  true if the element is updated on charset changes, false if not
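A minimal sketch of the flag's default, assuming jsoup is on the classpath; MetaCharsetFlagExample is a hypothetical name. Changing the charset only through outputSettings() leaves the flag false and writes no meta element, in contrast with Document.charset(Charset).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class MetaCharsetFlagExample {

    public static Document viaOutputSettingsOnly() {
        Document doc = Jsoup.parse("<p>Hello</p>");
        // Changes only the output encoding; the document tree is not modified.
        doc.outputSettings().charset("ISO-8859-1");
        return doc;
    }

    public static void main(String[] args) {
        Document doc = viaOutputSettingsOnly();
        System.out.println(doc.updateMetaCharsetElement());               // false
        System.out.println(doc.head().select("meta[charset]").isEmpty()); // true
    }
}
```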
-
clone
public Document clone()
Description copied from class: Node
Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings or parent node. As a stand-alone object, any changes made to the clone or any of its children will not impact the original node.
The cloned node may be adopted into another Document or node structure using Element.appendChild(Node).
Overrides:
  clone in class Element
Returns:
  a stand-alone cloned node, including clones of any children
See Also:
  Node.shallowClone()
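A minimal sketch of the deep-copy guarantee, assuming jsoup is on the classpath; CloneExample is a hypothetical name. Editing an element in the clone leaves the original document unchanged.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class CloneExample {

    public static Document originalAfterEditingClone() {
        Document original = Jsoup.parse("<p>Original</p>");
        Document copy = original.clone();
        copy.selectFirst("p").text("Changed");   // edits only the copy
        return original;
    }

    public static void main(String[] args) {
        System.out.println(originalAfterEditingClone().selectFirst("p").text()); // Original
    }
}
```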
-
ensureMetaCharsetElement
private void ensureMetaCharsetElement()
Ensures a meta charset (html) or xml declaration (xml) with the current encoding used. This only applies with updateMetaCharset set to true; otherwise this method does nothing.
- An existing element gets updated with the current charset
- If there's no element yet, it will be inserted
- Obsolete elements are removed
Elements used:
- Html: <meta charset="CHARSET">
- Xml: <?xml version="1.0" encoding="CHARSET"?>
-
outputSettings
public Document.OutputSettings outputSettings()
Get the document's current output settings.
Returns:
  the document's current output settings.
-
outputSettings
public Document outputSettings(Document.OutputSettings outputSettings)
Set the document's output settings.
Parameters:
  outputSettings - new output settings.
Returns:
  this document, for chaining.
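A minimal sketch of replacing the output settings and using the returned document for chaining, assuming jsoup is on the classpath; OutputSettingsExample is a hypothetical name. Turning pretty printing off makes html() emit the markup without added newlines or indentation.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class OutputSettingsExample {

    public static String compactBody() {
        Document doc = Jsoup.parse("<p>a</p><p>b</p>");
        // outputSettings(...) returns the document, so the call can be chained.
        return doc.outputSettings(new Document.OutputSettings().prettyPrint(false))
                  .body()
                  .html();
    }

    public static void main(String[] args) {
        System.out.println(compactBody()); // <p>a</p><p>b</p>
    }
}
```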
-
quirksMode
public Document.QuirksMode quirksMode()
-
quirksMode
public Document quirksMode(Document.QuirksMode quirksMode)
-
parser
public Parser parser()
Get the parser that was used to parse this document.
Returns:
  the parser
-
parser
public Document parser(Parser parser)
Set the parser used to create this document. This parser is then used when further parsing within this document is required.
Parameters:
  parser - the configured parser to use when further parsing is required for this document.
Returns:
  this document, for chaining.
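A minimal sketch of "further parsing" reusing the document's parser, assuming jsoup is on the classpath and that Element.append(String) parses its fragment with the owning document's parser, as the description above implies; ParserExample is a hypothetical name. With the XML parser the div stays nested inside the p, whereas HTML parsing rules would close the p before the div.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical demo class; assumes jsoup is on the classpath.
public class ParserExample {

    public static Document xmlDocWithAppendedFragment() {
        Document doc = Jsoup.parse("<root></root>", "", Parser.xmlParser());
        // The fragment is parsed with doc.parser() (the XML parser),
        // so <div> is kept as a child of <p>.
        doc.selectFirst("root").append("<p><div>x</div></p>");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(xmlDocWithAppendedFragment().select("p > div").size()); // 1
    }
}
```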