Releases: rgrove/sanitize
Version 3.1.1 (2015-02-04)
- Fixed: #document and #fragment failed on frozen strings, and could unintentionally modify unfrozen strings if they used an encoding other than UTF-8 or if they contained characters not allowed in HTML. @AnchorCat - #128
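The failure mode behind this fix can be illustrated in plain Ruby. The sketch below is not Sanitize's actual code: `to_utf8_copy` is a hypothetical helper that shows one way to avoid both problems, by transcoding a copy instead of the caller's string.

```ruby
# Hypothetical helper (not part of Sanitize): return a UTF-8 copy of the
# input without touching the caller's string.
def to_utf8_copy(str)
  # String#dup returns an unfrozen copy, so this also works for frozen input.
  copy = str.dup
  # Transcode the copy in place; the original keeps its bytes and encoding.
  copy.encode!(Encoding::UTF_8)
  copy
end

# A frozen ISO-8859-1 string: "café" with a single-byte é.
original = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1).freeze

result = to_utf8_copy(original)
```

Calling `encode!` directly on `original` would raise because the string is frozen, and calling it on an unfrozen input would silently change the caller's data. Working on a copy sidesteps both.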
Version 3.1.0 (2014-12-22)
- Added the following CSS properties to the relaxed config. @ehudc - #120
  - -moz-text-size-adjust
  - -ms-text-size-adjust
  - -webkit-text-size-adjust
  - text-size-adjust
- Updated Nokogumbo to 1.2.0 to pick up a fix for a Gumbo bug where the entity &AElig; left its semicolon behind when it was converted to a character during parsing. #119
Version 3.0.4 (2014-12-12)
- Fixed: Harmless whitespace preceding a URL protocol (such as " http://") caused the URL to be removed even when the protocol was whitelisted. @benubois - #126
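As a rough illustration of the behavior this fix describes, the following sketch strips leading whitespace before looking for a protocol. It is an assumption-laden stand-in, not Sanitize's implementation: `allowed_protocol?` and its default whitelist are hypothetical, and URLs without a protocol are simply rejected to keep the example short.

```ruby
# Hypothetical check (not Sanitize's code): true when the URL starts with a
# whitelisted protocol, ignoring harmless leading whitespace.
def allowed_protocol?(url, allowed = %w[http https mailto])
  # Strip leading whitespace first so " http://" is treated like "http://".
  match = url.lstrip.match(/\A([a-z][a-z0-9+.\-]*):/i)
  # No protocol at all: rejected here for brevity.
  return false if match.nil?

  allowed.include?(match[1].downcase)
end

with_space    = allowed_protocol?(' http://example.com/')
without_space = allowed_protocol?('http://example.com/')
scripted      = allowed_protocol?(' javascript:alert(1)')
```

The whitespace is ignored only for the purpose of finding the protocol; a non-whitelisted protocol is still rejected whether or not whitespace precedes it.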
Version 3.0.3 (2014-10-29)
- Fixed: Some CSS selectors weren't parsed correctly inside the body of a @media block, causing them to be removed even when whitelist rules should have allowed them to remain. #121
Version 3.0.2 (2014-09-02)
- Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we were trying to pick up in the last release. Now issue #114 is actually fixed.
Version 3.0.1 (2014-09-02)
- Updated Nokogumbo to 1.1.11 to pick up a fix for a Gumbo bug in which certain HTML character entities, such as &Ouml;, were parsed incorrectly, leaving the semicolon behind in the output. #114
Version 3.0.0 (2014-06-21)
As of this version, Sanitize adheres strictly to the SemVer 2.0.0 versioning standard. This release contains API and output changes that are incompatible with previous releases, as indicated by the major version increment.
Backwards-incompatible changes
- HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the HTML5 parsing spec and behaves much more like modern browser parsers than the previous libxml2-based parser. As a result, HTML output may differ from that of previous versions of Sanitize.
- All transformers now traverse the document from the top down, starting with the first node, then its first child, and so on. The :transformers_breadth config has been removed, and old bottom-up transformers (the previous default) may need to be rewritten.
- Sanitize's built-in configs are now deeply frozen to prevent people from modifying them (either accidentally or maliciously). To customize a built-in config, create a new copy using Sanitize::Config.merge(), like so:
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
:remove_contents => true
))
- The clean! and clean_document! methods were removed, since they weren't useful and tended to confuse people.
- The clean method was renamed to fragment to more clearly indicate that its intended use is to sanitize an HTML fragment.
- The clean_document method was renamed to document.
- The clean_node! method was renamed to node!.
- The document method now raises a Sanitize::Error if the <html> element isn't whitelisted, rather than a RuntimeError. This error is also now raised regardless of the :remove_contents config setting.
- The :output config has been removed. Output is now always HTML, not XHTML.
- The :output_encoding config has been removed. Output is now always UTF-8.
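To make the "deeply frozen" change above more concrete, here is a minimal plain-Ruby sketch of the idea. It does not use Sanitize itself: `deep_freeze`, `merge_config`, and the `basic` hash are hypothetical stand-ins for the built-in configs and for Sanitize::Config.merge(), and the real implementation may differ.

```ruby
# Recursively freeze a nested structure of Hashes and Arrays.
def deep_freeze(obj)
  case obj
  when Hash  then obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array then obj.each { |value| deep_freeze(value) }
  end
  obj.freeze
end

# Build a new frozen config from a base config plus overrides.
# Hash#merge returns a new Hash, so the base is never modified.
def merge_config(base, overrides)
  deep_freeze(base.merge(overrides))
end

# Hypothetical stand-in for a built-in config.
basic = deep_freeze({ elements: ['a', 'b'], remove_contents: false })

custom = merge_config(basic,
  elements: basic[:elements] + ['div', 'table'],
  remove_contents: true
)
```

Because the nested Array is frozen too, an in-place change such as `basic[:elements] << 'div'` raises an error, while the merged copy carries the extra elements.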
Other changes
- Added advanced CSS sanitization support using Crass, which is fully compliant with the CSS Syntax Module Level 3 parsing spec. The contents of whitelisted <style> elements and style attributes in HTML will be sanitized as CSS, or you can use the Sanitize::CSS class to manually sanitize CSS stylesheets or properties.
- Added an :allow_doctype setting. When true, well-formed doctype definitions will be allowed in documents. When false (the default), doctype definitions will be removed from documents. Doctype definitions are never allowed in fragments, regardless of this setting.
- Added the following elements to the relaxed config, in addition to various attributes: article, aside, body, data, div, footer, head, header, html, main, nav, section, span, style, title.
- The :whitespace_elements config is now a Hash, and allows you to specify the text that should be inserted before and after these elements when they're removed. The old-style Array-based config value is still supported for backwards compatibility. @alperkokmen - #94
- Unsuitable Unicode characters are now removed from HTML before it's parsed. #106
- Fixed: Non-tag brackets in input like "1 > 2 and 2 < 1" are now parsed and escaped correctly in accordance with the HTML5 spec, becoming "1 &gt; 2 and 2 &lt; 1". #83
- Fixed: Siblings added after the current node during traversal are now also traversed. In previous versions they were simply skipped. #91
- Fixed: Nokogiri has been smacked and instructed to stop adding newlines after certain elements, because if people wanted newlines there they'd have put them there, dammit. #103
- Fixed: Added a workaround for a libxml2 bug that caused an undesired content-type meta tag to be added to all documents with <head> elements. Nokogiri #1008
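The traversal behavior described in these notes (top-down order, with siblings added during traversal also being visited) can be sketched without Nokogiri. The example below is an illustration under stated assumptions, not Sanitize's traversal code: nodes are plain Hashes and `traverse` is a hypothetical helper.

```ruby
# Visit `node`, then each of its children, top down (pre-order).
# Nodes are plain Hashes of the form { name: String, children: Array }.
def traverse(node, &block)
  block.call(node)
  index = 0
  # Re-check the length on every pass so that children appended by the
  # callback during iteration are visited as well.
  while index < node[:children].length
    traverse(node[:children][index], &block)
    index += 1
  end
end

root = { name: 'root', children: [
  { name: 'a', children: [] },
  { name: 'b', children: [] }
] }

visited = []
traverse(root) do |node|
  visited << node[:name]
  # While visiting "b", add a new sibling after it.
  root[:children] << { name: 'c', children: [] } if node[:name] == 'b'
end
# visited is now ["root", "a", "b", "c"]
```

The parent is visited before its children, and the node "c" that was added while "b" was being processed is still reached.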