Class IllegalSymbolCheck

All Implemented Interfaces:
Configurable, Contextualizable

public class IllegalSymbolCheck extends AbstractCheck
Checks that specified symbols (by Unicode code points or ranges) are not used in code. By default, blocks common symbol ranges (U+2190–U+27BF and U+1F700–U+10FFFF).

Rationale: This check helps keep emoji and other special characters (commonly added by AI tools) out of code, enforce coding standards, and forbid specific Unicode characters.

Default ranges cover:

  • U+2190–U+27BF: Arrows, Mathematical Operators, Box Drawing, Geometric Shapes, Miscellaneous Symbols, and Dingbats
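  • U+1F700–U+10FFFF: Alchemical Symbols, Geometric Shapes Extended, Supplemental Symbols and Pictographs, and every other code point above U+1F700. Emoticons (U+1F600–U+1F64F) and Transport and Map Symbols (U+1F680–U+1F6FF) lie below this range, so add them to symbolCodes explicitly if they should be flagged.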

For a complete list of Unicode characters and ranges, see: List of Unicode characters

  • Property symbolCodes - Specify the symbols to check for, as Unicode code points or ranges. Format: comma-separated list of hex codes or ranges (e.g., "0x2705, 0x1F600-0x1F64F"). To allow only ASCII characters, use "0x0080-0x10FFFF". Type is java.lang.String. Default value is "0x2190-0x27BF, 0x1F700-0x10FFFF".
Since:
13.1.0
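The symbolCodes format described above can be illustrated with a minimal sketch. It is not the Checkstyle implementation: SymbolCodesSketch, parse, and matches are hypothetical names, and the sketch assumes every bound is written as 0x-prefixed hex and that ranges are inclusive at both ends.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Checkstyle) that interprets a symbolCodes-style value. */
final class SymbolCodesSketch {

    private SymbolCodesSketch() {
    }

    /** Parses a value such as "0x2705, 0x1F600-0x1F64F" into inclusive {low, high} pairs. */
    static List<int[]> parse(String spec) {
        final List<int[]> ranges = new ArrayList<>();
        for (String item : spec.split(",")) {
            final String trimmed = item.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // A single code has no '-', so it becomes a range of length one.
            final String[] bounds = trimmed.split("-", 2);
            final int low = hex(bounds[0]);
            final int high = bounds.length == 2 ? hex(bounds[1]) : low;
            ranges.add(new int[] {low, high});
        }
        return ranges;
    }

    /** Returns true if the code point falls inside any parsed range. */
    static boolean matches(List<int[]> ranges, int codePoint) {
        for (int[] range : ranges) {
            if (codePoint >= range[0] && codePoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Assumption: bounds use the 0x-prefixed hex notation shown in the examples. */
    private static int hex(String token) {
        final String text = token.trim();
        if (!text.regionMatches(true, 0, "0x", 0, 2)) {
            throw new IllegalArgumentException("Expected 0x-prefixed hex: " + token);
        }
        return Integer.parseInt(text.substring(2), 16);
    }

    public static void main(String[] args) {
        final List<int[]> defaults = parse("0x2190-0x27BF, 0x1F700-0x10FFFF");
        System.out.println(matches(defaults, 0x2705));
        System.out.println(matches(defaults, 'A'));
    }
}
```

With the ASCII-only value "0x0080-0x10FFFF", the same sketch would accept U+007F and reject any code point from U+0080 upward.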
  • Field Details

  • Constructor Details

  • Method Details

    • setSymbolCodes

      public void setSymbolCodes(String symbols)
      Setter to specify the symbols to check for, as Unicode code points or ranges. Format: comma-separated list of hex codes or ranges (e.g., "0x2705, 0x1F600-0x1F64F").
      Parameters:
      symbols - the symbols specification
      Since:
      13.1.0
    • getDefaultTokens

      public int[] getDefaultTokens()
      Description copied from class: AbstractCheck
      Returns the default tokens a check is interested in. Used only if the configuration for a check does not define the tokens.
      Specified by:
      getDefaultTokens in class AbstractCheck
      Returns:
      the default tokens
    • getAcceptableTokens

      public int[] getAcceptableTokens()
      Description copied from class: AbstractCheck
      The configurable token set. Used to protect Checks against malicious users who specify an unacceptable token set in the configuration file. The default implementation returns the check's default tokens.
      Specified by:
      getAcceptableTokens in class AbstractCheck
      Returns:
      the token set this check is designed for.
    • getRequiredTokens

      public int[] getRequiredTokens()
      Description copied from class: AbstractCheck
      The tokens that this check must be registered for.
      Specified by:
      getRequiredTokens in class AbstractCheck
      Returns:
      the token set this check must be registered for.
    • isCommentNodesRequired

      public boolean isCommentNodesRequired()
      Description copied from class: AbstractCheck
      Whether comment nodes are required or not.
      Overrides:
      isCommentNodesRequired in class AbstractCheck
      Returns:
      false as a default value.
    • visitToken

      public void visitToken(DetailAST ast)
      Description copied from class: AbstractCheck
      Called to process a token.
      Overrides:
      visitToken in class AbstractCheck
      Parameters:
      ast - the token to process
    • checkText

      private void checkText(String text, DetailAST ast)
      Check the text for illegal symbols.
      Parameters:
      text - the text to check
      ast - the AST node
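Because isIllegalSymbol takes an int code point, the text is presumably scanned by code point rather than by char, so that characters outside the Basic Multilingual Plane (stored as surrogate pairs) are tested as one value. A minimal sketch of such a scan follows. CodePointScanSketch and findIllegalOffsets are hypothetical names, and how the real checkText reports violations is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical illustration (not Checkstyle code) of scanning text by code point. */
final class CodePointScanSketch {

    private CodePointScanSketch() {
    }

    /** Returns the char offsets at which a code point accepted by the predicate starts. */
    static List<Integer> findIllegalOffsets(String text, IntPredicate illegal) {
        final List<Integer> offsets = new ArrayList<>();
        int index = 0;
        while (index < text.length()) {
            final int codePoint = text.codePointAt(index);
            if (illegal.test(codePoint)) {
                offsets.add(index);
            }
            // Supplementary code points occupy two chars, so advance by 1 or 2.
            index += Character.charCount(codePoint);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Predicate mirroring the documented default ranges.
        final IntPredicate defaults = cp ->
            cp >= 0x2190 && cp <= 0x27BF || cp >= 0x1F700 && cp <= 0x10FFFF;
        final String text = "done " + new String(Character.toChars(0x2705));
        System.out.println(findIllegalOffsets(text, defaults));
    }
}
```

Advancing by Character.charCount keeps the two halves of a surrogate pair from being tested separately.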
    • isIllegalSymbol

      private boolean isIllegalSymbol(int codePoint)
      Check if a code point is illegal based on configured ranges.
      Parameters:
      codePoint - the code point to check
      Returns:
      true if the code point is illegal
    • isInSymbolCodes

      private boolean isInSymbolCodes(int codePoint)
      Checks whether the code point is in the configured symbol codes.
      Parameters:
      codePoint - the code point to check
      Returns:
      true if in symbol codes
    • isInRange

      private static boolean isInRange(int codePoint, String rangeStr)
      Checks whether the code point is in the specified range.
      Parameters:
      codePoint - the code point to check
      rangeStr - the range string (e.g., "0x1F600-0x1F64F")
      Returns:
      true if in range
    • parseCodePoint

      private static int parseCodePoint(String str)
      Parses a code point from its string representation. Supported formats: 0x1234, \u1234, U+1234, or decimal.
      Parameters:
      str - the string to parse
      Returns:
      the code point value
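The notations accepted by parseCodePoint can be handled along the following lines. This is a sketch under stated assumptions, not the private method itself: ParseCodePointSketch is a hypothetical class, prefixes are matched case-insensitively, and values outside the Unicode range are rejected. Neither behaviour is confirmed by the documentation above.

```java
/** Hypothetical illustration (not Checkstyle code) of parsing one code point. */
final class ParseCodePointSketch {

    private ParseCodePointSketch() {
    }

    /** Parses a code point written as 0x-hex, backslash-u hex, U+ hex, or plain decimal. */
    static int parseCodePoint(String str) {
        final String text = str.trim();
        final int value;
        if (hasPrefix(text, "0x") || hasPrefix(text, "U+") || hasPrefix(text, "\\u")) {
            // All three prefixes are two characters long and introduce hex digits.
            value = Integer.parseInt(text.substring(2), 16);
        } else {
            // No recognised prefix: treat the text as a decimal number.
            value = Integer.parseInt(text);
        }
        if (!Character.isValidCodePoint(value)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + str);
        }
        return value;
    }

    /** Case-insensitive prefix test; returns false when the text is shorter than the prefix. */
    private static boolean hasPrefix(String text, String prefix) {
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(parseCodePoint("U+1F600")));
    }
}
```

Malformed digits surface as NumberFormatException, which is a subclass of IllegalArgumentException, so callers of this sketch can catch a single exception type.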