Configuration naming

HTML Purifier 4.0.0 features a new configuration naming system that
allows arbitrary nesting of namespaces. While there are certain cases
in which using two namespaces is obviously better (the canonical example
is where we were using AutoFormatParam to contain directives for AutoFormat
parameters), it is unclear whether a general migration to highly
namespaced directives is a good idea.
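
To make the difference between the flat and the nested scheme concrete, here
is a minimal Python sketch (not HTML Purifier code) that splits a dotted
directive name into its namespace path and its final directive name. The
helper name split_directive and the two example names are used purely for
illustration.

```python
def split_directive(name):
    """Split a dotted directive name into (namespace path, directive)."""
    *namespaces, directive = name.split(".")
    return tuple(namespaces), directive


# Old style: one namespace, with the "Param" context baked into its name.
old_style = split_directive("AutoFormatParam.PurifierLinkifyDocURL")

# Nested style: the same information expressed as two namespaces.
new_style = split_directive("AutoFormat.PurifierLinkify.DocURL")

print(old_style)
print(new_style)
```

In the nested form the namespace path carries the context that the old name
had to spell out inside a single identifier.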

== Case studies ==

=== Attr.* ===

We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided
to think this out thoroughly.

We currently have a large number of directives in the Attr.* namespace.
These directives tweak the behavior of some HTML attributes. They have
the properties:

* While they apply to only one attribute at a time, the attribute can
  span multiple elements (though not necessarily all of them).
  The information about which elements it impacts is either omitted or
  informally stated (EnableID applies to all elements, DefaultImageAlt
  applies to <img> tags, AllowedRev doesn't say but only applies to <a> tags).

* There is a certain degree of clustering that could be applied, especially
  to the ID directives. The clustering could be done with respect to
  what element/attribute was used, i.e.

    *.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix
    img.src -> DefaultInvalidImage
    img.alt -> DefaultImageAlt, DefaultInvalidImageAlt
    bdo.dir -> DefaultTextDir
    a.rel -> AllowedRel
    a.rev -> AllowedRev
    a.target -> AllowedFrameTargets
    a.name -> Name.UseCDATA

* The directives often reference generic attribute types that were specified
  in the DTD/specification. However, some of the behavior specifically relies
  on the fact that other use cases of the attribute are not, at present,
  supported by HTML Purifier.

    AllowedRel, AllowedRev -> heavily <a> specific; if <link> ends up being
        allowed, we will also have to give users specificity there (we also
        want to preserve generality). DTD %LinkTypes; HTML5 distinguishes
        between <link> and <a>/<area>
    AllowedFrameTargets -> heavily <a> specific, but also used by <area>
        and <form>. Transitional DTD %FrameTarget, not present in strict;
        HTML5 calls them "browsing contexts"
    Default*Image* -> as a default parameter, is almost entirely exclusive
        to <img>
    EnableID -> global attribute
    Name.UseCDATA -> heavily <a> specific, but the name attribute is also
        heavily used by many other elements
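
The element/attribute clustering listed above can be written down as data.
The following Python sketch is only an illustration of the idea, not a
proposed implementation: the CLUSTERS table copies the mapping from this
section, and the helper directives_for_element is hypothetical.

```python
# element.attribute -> Attr.* directives, copied from the clustering above.
CLUSTERS = {
    "*.id": ["EnableID", "IDBlacklistRegexp", "IDBlacklist",
             "IDPrefixLocal", "IDPrefix"],
    "img.src": ["DefaultInvalidImage"],
    "img.alt": ["DefaultImageAlt", "DefaultInvalidImageAlt"],
    "bdo.dir": ["DefaultTextDir"],
    "a.rel": ["AllowedRel"],
    "a.rev": ["AllowedRev"],
    "a.target": ["AllowedFrameTargets"],
    "a.name": ["Name.UseCDATA"],
}


def directives_for_element(element):
    """Return the directives clustered under an element or the '*' wildcard."""
    found = []
    for key, names in CLUSTERS.items():
        owner, _attribute = key.split(".")
        if owner in ("*", element):
            found.extend(names)
    return found


print(directives_for_element("img"))
```

Note that the table records only the element each directive is clustered
under today. It says nothing about the other elements (<area>, <form>,
<link>) that the third bullet point worries about.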

== AutoFormat.* ==

These have the fairly normal pluggable architecture that lends itself to
large numbers of namespaces (pluggability may be the key to figuring
out when gratuitous namespacing is good). Properties:

* Boolean directives are fair game for being namespaced: for example,
  RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions,
  the latter of which only makes sense when RemoveEmpty.RemoveNbsp
  is set to true. (The same applies to RemoveNbsp itself, which only
  makes sense when RemoveEmpty is set to true.)

The AutoFormat string is a bit long, but is the only bit of repeated
context.
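
The "boolean directive as namespace" pattern can be checked mechanically.
The Python sketch below assumes settings are held in a flat dictionary keyed
by full directive name; the function inactive_directives and the example
values are illustrative only. It reports directives that are set but whose
enclosing boolean directive is not true.

```python
def inactive_directives(settings):
    """Return directives nested under a boolean directive that is not true."""
    inactive = []
    for name in settings:
        parts = name.split(".")
        # Any proper prefix that is itself a directive acts as a gate.
        for depth in range(1, len(parts)):
            parent = ".".join(parts[:depth])
            if parent in settings and settings[parent] is not True:
                inactive.append(name)
                break
    return inactive


settings = {
    "AutoFormat.RemoveEmpty": True,
    "AutoFormat.RemoveEmpty.RemoveNbsp": False,
    # Illustrative value; only meaningful when RemoveNbsp is true.
    "AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions": {"td": True, "th": True},
}

print(inactive_directives(settings))
```

With RemoveNbsp switched off, only the Exceptions directive is reported,
which matches the dependency described above.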

== Core.* ==

Core is the potpourri of directives, mostly regarding some minor behavioral
tweaks for HTML handling abilities.

    AggressivelyFixLt
    AllowParseManyTags
    ConvertDocumentToFragment
    DirectLexLineNumberSyncInterval
    LexerImpl
    MaintainLineNumbers
        Lexer

    CollectErrors
    Language
        Error handling (Language is ostensibly a little more general, but
        it's only used for error handling right now)

    ColorKeywords
        CSS and HTML

    Encoding
    EscapeNonASCIICharacters
        Character encoding

    EscapeInvalidChildren
    EscapeInvalidTags
    HiddenElements
    RemoveInvalidImg
        Lexing/Output

    RemoveScriptContents
        Deprecated

== HTML.* ==

    AllowedAttributes
    AllowedElements
    AllowedModules
    Allowed
    ForbiddenAttributes
    ForbiddenElements
        Element set tuning

    BlockWrapper
        Child def advanced twiddle

    CoreModules
    CustomDoctype
        Advanced HTMLModuleManager twiddles

    DefinitionID
    DefinitionRev
        Caching

    Doctype
    Parent
    Strict
    XHTML
        Global environment

    MaxImgLength
        Attribute twiddle? (applies to two attributes)

    Proprietary
    SafeEmbed
    SafeObject
    Trusted
        Extra functionality/tagsets

    TidyAdd
    TidyLevel
    TidyRemove
        Tidy

== Output.* ==

These directly affect the output of Generator. These are all advanced
twiddles.

== URI.* ==

    AllowedSchemes
    OverrideAllowedSchemes
        Scheme tuning

    Base
    DefaultScheme
    Host
        Global environment

    DefinitionID
    DefinitionRev
        Caching

    DisableExternalResources
    DisableExternal
    DisableResources
    Disable
        Contextual/authority tuning

    HostBlacklist
        Authority tuning

    MakeAbsolute
    MungeResources
    MungeSecretKey
    Munge
        Transformation behavior (munge can be grouped)
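
The remark that munge can be grouped suggests one mechanical way of finding
namespace candidates: look for directives that share a leading word. The
Python sketch below is a rough illustration under that assumption. The
helper group_by_leading_word is hypothetical, and URI_DIRECTIVES simply
repeats the names listed in this section.

```python
import re
from collections import defaultdict


def group_by_leading_word(directives):
    """Group CamelCase names by their first word; keep groups of two or more."""
    groups = defaultdict(list)
    for name in directives:
        match = re.match(r"[A-Z][a-z0-9]*", name)
        key = match.group(0) if match else name
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


URI_DIRECTIVES = [
    "AllowedSchemes", "OverrideAllowedSchemes", "Base", "DefaultScheme",
    "Host", "DefinitionID", "DefinitionRev", "DisableExternalResources",
    "DisableExternal", "DisableResources", "Disable", "HostBlacklist",
    "MakeAbsolute", "MungeResources", "MungeSecretKey", "Munge",
]

for prefix, names in group_by_leading_word(URI_DIRECTIVES).items():
    print(prefix, "->", ", ".join(names))
```

The prefix heuristic finds the Disable, Munge and Definition clusters, but
it also pairs Host with HostBlacklist, which the hand grouping above places
in different categories. Shared prefixes are therefore a hint, not a rule.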