
Linuxeden Open Source Community --
jsoup 1.10.3 has been released. This version brings better CSS selector performance, Jsoup.Connection improvements, and other bug fixes.
Details include:
Improvements
- Added Elements.eachText() and Elements.eachAttr(), which return a list of each Element's text or attribute values, respectively. This makes it simpler to, for example, get a list of each URL on a page: List<String> urls = doc.select("a").eachAttr("abs:href");
- Improved selector validation for :contains(...) with unbalanced quotes.
- Improved the speed of index-based CSS selectors and other methods that use elementSiblingIndex by a factor of 34.
- Added Node.clearAttributes(), to simplify removing all attributes of a Node / Element.
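The new methods listed above can be tried with a short program. This is a minimal sketch, assuming jsoup 1.10.3 or later is on the classpath; the sample markup, the base URL http://example.com/ and the class name JsoupNewMethodsSketch are hypothetical and chosen only for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JsoupNewMethodsSketch {
    // Hypothetical sample markup and base URL, used only for illustration.
    static final String HTML =
        "<p><a href='/one' class='nav' id='first'>One</a> <a href='/two'>Two</a></p>";
    static final String BASE_URI = "http://example.com/";

    // Parse the sample markup; the base URI lets abs:href resolve relative links.
    static Document sample() {
        return Jsoup.parse(HTML, BASE_URI);
    }

    // Elements.eachAttr(): one attribute value per matched element that has it.
    static List<String> linkUrls(Document doc) {
        return doc.select("a").eachAttr("abs:href");
    }

    // Elements.eachText(): the text of each matched element that has text.
    static List<String> linkTexts(Document doc) {
        return doc.select("a").eachText();
    }

    // Node.clearAttributes(): drop every attribute of the first link,
    // then report how many attributes remain on it.
    static int attributeCountAfterClear(Document doc) {
        Element first = doc.select("a").first();
        first.clearAttributes();
        return first.attributes().size();
    }

    public static void main(String[] args) {
        Document doc = sample();
        System.out.println(linkUrls(doc));
        System.out.println(linkTexts(doc));
        System.out.println(attributeCountAfterClear(doc));
    }
}
```

Before these methods were added, collecting the same values meant looping over the Elements and calling attr() or text() on each one by hand.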
Fixes
- Bugfix: if an attribute name started or ended with a control character, the parse would fail with a validation exception.
- Bugfix: Element.hasClass() and the .classname selector would not find the class attribute case-insensitively.
- Bugfix: in Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double-escaped before the redirect was followed, leading to fetching an incorrect location.
- Bugfix: in Jsoup.Connection, if a request body was set and the connection was redirected, the body would incorrectly still be sent.
- Bugfix: in DataUtil, when detecting the character set from meta data and two Content-Types are defined, use the one that defines a character set.
- Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope correctly.
- In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET.
- Bugfix: in certain locales (Turkish specifically), lowercasing and case insensitivity could fail for specific items.
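The Turkish locale issue in the last fix comes from a general Java pitfall that can be reproduced with the standard library alone. The sketch below illustrates that pitfall and is not necessarily how jsoup fixed it; the class name TurkishLocaleSketch and the sample string are hypothetical.

```java
import java.util.Locale;

class TurkishLocaleSketch {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Locale-sensitive lowercasing: under Turkish rules, 'I' becomes the
    // dotless 'ı' (U+0131), so "TITLE" no longer matches "title".
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    // Locale-independent lowercasing: Locale.ROOT gives the same result
    // regardless of the JVM's default locale.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowerTurkish("TITLE")); // tıtle
        System.out.println(lowerRoot("TITLE"));    // title
    }
}
```

Calling toLowerCase() with no argument uses the JVM's default locale, so code that compares tag or attribute names this way can behave differently on a machine configured for Turkish.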
Reposted from http://ift.tt/2sdnEyF