Semgrep is a fast, open-source static analysis tool for finding bugs and enforcing code standards. This guide covers comprehensive Java rule development and integration.
Core Concepts
What is Semgrep?
- Lightweight static analysis tool
- Pattern-based code searching
- Supports multiple languages including Java
- Easy-to-write custom rules
- CI/CD integration friendly
Key Features for Java:
- AST-based pattern matching
- Custom rule creation
- Security vulnerability detection
- Code quality enforcement
- Best practices validation
Semgrep Rule Structure
1. Basic Rule Anatomy
rules:
  - id: rule-unique-identifier
    patterns:
      - pattern: "pattern-to-match"
    message: "Description of the issue"
    languages: [java]
    severity: ERROR  # one of ERROR, WARNING, INFO
    metadata:
      category: security  # e.g. security, correctness, performance
      technology:
        - java
        - spring
2. Rule Metadata
rules:
  - id: java-best-practices
    message: "Best practice violation"
    languages: [java]
    severity: WARNING
    metadata:
      category: best-practices
      confidence: HIGH
      likelihood: MEDIUM
      impact: LOW
      cwe: ["CWE-117"]  # Common Weakness Enumeration
      owasp: ["A1:2017"]  # OWASP Top 10
      references:
        - "https://example.com/best-practices"
Security Rules
1. SQL Injection Detection
rules:
  - id: java-sql-injection
    patterns:
      # Entries under "patterns" are ANDed, so the alternatives go in pattern-either
      - pattern-either:
          - pattern: $QUERY.execute($INPUT)
          - pattern: $QUERY.executeQuery($INPUT)
          - pattern: $QUERY.executeUpdate($INPUT)
      # Matched against the receiver's source text (its name), not its type
      - metavariable-regex:
          metavariable: $QUERY
          regex: (.*Statement|.*Query)
      # The argument's source text must contain a concatenation
      - metavariable-regex:
          metavariable: $INPUT
          regex: .*\+.*
    message: "Potential SQL injection vulnerability. Use prepared statements instead of string concatenation."
    languages: [java]
    severity: ERROR
    metadata:
      category: security
      cwe: ["CWE-89"]
      owasp: ["A1:2017", "A3:2017"]
      technology: ["java", "jdbc"]
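To make the remediation in the rule message concrete, here is a minimal Java sketch of the code shape the rule is aimed at and a parameterized alternative. The class name UserQueries, the users table, and both method names are illustrative assumptions, not part of any real codebase.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class used only for illustration.
class UserQueries {

    // Risky: user input becomes part of the SQL text itself.
    // Passing this string to Statement.executeQuery(...) is the kind of call the rule targets.
    static String buildUnsafeQuery(String userId) {
        return "SELECT * FROM users WHERE id = " + userId;
    }

    // Safer: the SQL text is fixed and the value is bound as a parameter.
    static boolean userExists(Connection conn, String userId) throws SQLException {
        String sql = "SELECT 1 FROM users WHERE id = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, userId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

With the concatenated version, an input such as 1 OR 1=1 changes the meaning of the query; with the bound parameter it is treated as a plain value.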
2. Hardcoded Credentials
rules:
  - id: java-hardcoded-credentials
    patterns:
      - pattern: String $VAR = "...";
      # Ignore empty string initializers
      - pattern-not: String $VAR = "";
      # The variable name must suggest a credential (case-insensitive, anywhere in the name)
      - metavariable-regex:
          metavariable: $VAR
          regex: (?i).*(password|pwd|secret|key|token).*
    message: "Hardcoded credentials detected. Use secure configuration management."
    languages: [java]
    severity: ERROR
    metadata:
      category: security
      cwe: ["CWE-798"]
      owasp: ["A2:2017"]
3. Insecure Random Number Generation
rules:
  - id: java-insecure-random
    pattern-either:
      - pattern: new Random()
      - pattern: new java.util.Random()
    message: "Insecure random number generator detected. Use SecureRandom for cryptographic operations."
    languages: [java]
    severity: WARNING
    metadata:
      category: security
      cwe: ["CWE-338"]
      technology: ["java"]
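As a sketch of the fix the message recommends, the following Java snippet generates a random token with java.security.SecureRandom. The class name Tokens and the hex encoding are illustrative choices.

```java
import java.security.SecureRandom;

// Hypothetical helper class used only for illustration.
class Tokens {
    // SecureRandom is thread-safe, so one shared instance is enough.
    private static final SecureRandom RNG = new SecureRandom();

    // Returns numBytes random bytes encoded as lowercase hex (two characters per byte).
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RNG.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(numBytes * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

A call such as Tokens.newToken(16) yields a 32-character hex string suitable for session identifiers or reset tokens.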
4. XXE Injection
rules:
  - id: java-xxe-injection
    patterns:
      - pattern: DocumentBuilderFactory $FACTORY = DocumentBuilderFactory.newInstance();
      # Suppress the finding when the same factory is hardened later in the block
      - pattern-not-inside: |
          DocumentBuilderFactory $FACTORY = DocumentBuilderFactory.newInstance();
          ...
          $FACTORY.setFeature("http://xml.org/sax/features/external-general-entities", false);
      - pattern-not-inside: |
          DocumentBuilderFactory $FACTORY = DocumentBuilderFactory.newInstance();
          ...
          $FACTORY.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
    message: "XML External Entity (XXE) vulnerability. Disable external entities in DocumentBuilderFactory."
    languages: [java]
    severity: ERROR
    metadata:
      category: security
      cwe: ["CWE-611"]
      owasp: ["A4:2017"]
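For reference, a hardened parser setup of the kind this rule accepts might look like the following Java sketch. The class name SafeXml is an assumption, and wrapping checked exceptions in IllegalStateException is just one possible design.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class used only for illustration.
class SafeXml {

    // Parses XML with external general and parameter entities disabled.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            factory.setExpandEntityReferences(false);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Configuration, I/O and parse errors are all reported the same way in this sketch.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Centralizing parser creation in one helper also gives the rule a single place to check.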
5. Path Traversal
rules:
  - id: java-path-traversal
    patterns:
      - pattern-either:
          - pattern: new File($BASE + $USERINPUT)
          - pattern: Files.readAllBytes(Paths.get($BASE, $USERINPUT))
          - pattern: new FileInputStream($BASE + $USERINPUT)
      # Matches arguments whose source text contains ../ or ..\
      - metavariable-regex:
          metavariable: $USERINPUT
          regex: .*(\.\./|\.\.\\).*
    message: "Potential path traversal vulnerability. Validate and sanitize user input for file operations."
    languages: [java]
    severity: ERROR
    metadata:
      category: security
      cwe: ["CWE-22"]
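One common way to act on this finding is to normalize the resolved path and verify it still lies under the base directory. The Java sketch below assumes a hypothetical helper named SafePaths.resolveInside; it is one possible validation strategy, not the only one.

```java
import java.nio.file.Path;

// Hypothetical helper class used only for illustration.
class SafePaths {

    // Resolves userInput against baseDir and rejects results that escape baseDir.
    static Path resolveInside(Path baseDir, String userInput) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(userInput).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userInput);
        }
        return candidate;
    }
}
```

Because normalize() collapses ".." segments before the startsWith check, an input like ../secret.txt is rejected, while a/b.txt is accepted.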
Code Quality Rules
1. Null Pointer Exception Prevention
rules:
  - id: java-potential-npe
    patterns:
      - pattern: $OBJ.$METHOD(...)
      # Exclusions describe the surrounding code, so they use pattern-not-inside
      - pattern-not-inside: |
          if ($OBJ != null) { ... }
      - pattern-not-inside: |
          $OBJ != null && $REST
      - pattern-not-inside: |
          Objects.requireNonNull($OBJ);
          ...
      - metavariable-regex:
          metavariable: $OBJ
          regex: ^(?!this).*$
    message: "Potential NullPointerException. Add null check before method call."
    languages: [java]
    severity: WARNING
    metadata:
      category: correctness
      technology: ["java"]
2. Resource Leak Detection
rules:
  - id: java-resource-leak
    patterns:
      - pattern-either:
          - pattern: $RESOURCE = new $TYPE(...);
          - pattern: $DECL $RESOURCE = new $TYPE(...);
      - pattern-not-inside: |
          try ($TYPE $RESOURCE = ...) { ... }
      # Suppress the finding when close() is called later in the same block
      - pattern-not-inside: |
          $RESOURCE = new $TYPE(...);
          ...
          $RESOURCE.close();
      - pattern-not-inside: |
          $DECL $RESOURCE = new $TYPE(...);
          ...
          $RESOURCE.close();
      - metavariable-regex:
          metavariable: $TYPE
          regex: (.*InputStream|.*OutputStream|.*Reader|.*Writer|.*Connection|.*Statement|.*ResultSet)
    message: "Resource may not be properly closed. Use try-with-resources or ensure proper cleanup."
    languages: [java]
    severity: WARNING
    metadata:
      category: reliability
      technology: ["java"]
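The try-with-resources form that the message recommends can be sketched in Java as follows. The class name FirstLine is hypothetical, and a StringReader stands in for a real file or network stream so the example stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class used only for illustration.
class FirstLine {

    // Returns the first line of the text; the reader is closed automatically on every exit path.
    static String read(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The reader is closed even if readLine() throws, which is the guarantee a manual close() call often misses.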
3. String Concatenation in Loops
rules:
  - id: java-string-concat-in-loop
    patterns:
      # The concatenation itself; this also matches numeric accumulation
      - pattern-either:
          - pattern: $RESULT = $RESULT + $X;
          - pattern: $RESULT += $X;
      # It must sit inside some loop
      - pattern-either:
          - pattern-inside: |
              for (...) { ... }
          - pattern-inside: |
              for ($T $ITEM : $ITEMS) { ... }
          - pattern-inside: |
              while (...) { ... }
    message: "String concatenation in loop detected. Use StringBuilder for better performance."
    languages: [java]
    severity: WARNING
    metadata:
      category: performance
      technology: ["java"]
4. Empty Catch Block
rules:
  - id: java-empty-catch-block
    patterns:
      # A catch clause is matched as part of its try statement.
      # Semgrep ignores comments, so a block holding only a comment also matches.
      - pattern: |
          try { ... } catch ($EXC $E) { }
    message: "Empty catch block detected. At least log the exception."
    languages: [java]
    severity: WARNING
    metadata:
      category: correctness
      technology: ["java"]
5. System.exit() Usage
rules:
  - id: java-system-exit
    pattern-either:
      - pattern: System.exit(...)
      - pattern: Runtime.getRuntime().exit(...)
    message: "System.exit() call detected. Avoid terminating the JVM in application code."
    languages: [java]
    severity: ERROR
    metadata:
      category: reliability
      technology: ["java"]
Spring Framework Rules
1. Spring Transaction Management
rules:
  - id: spring-transactional-misuse
    patterns:
      - pattern: |
          @Transactional
          public void $METHOD(...) {
            ...
            $REPO.save(...);
            ...
          }
      # The exclusion must have the same shape as the main pattern
      - pattern-not: |
          @Transactional(readOnly = true)
          public void $METHOD(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $METHOD
          regex: (.*delete.*|.*remove.*|.*update.*)
    message: "Transactional method performing write operations should have proper transaction configuration."
    languages: [java]
    severity: WARNING
    metadata:
      category: correctness
      technology: ["java", "spring"]
2. Spring Security Configuration
rules:
  - id: spring-security-misconfiguration
    patterns:
      # $HTTP binds to whatever precedes the call, including a longer builder chain
      - pattern: $HTTP.csrf().disable()
    message: "CSRF protection is disabled. Ensure CSRF is properly configured for state-changing operations."
    languages: [java]
    severity: WARNING
    metadata:
      category: security
      cwe: ["CWE-352"]
      owasp: ["A8:2013"]
      technology: ["java", "spring-security"]
3. Spring Bean Injection
rules:
  - id: spring-field-injection
    patterns:
      - pattern: |
          @Autowired
          private $TYPE $FIELD;
    message: "Field injection detected. Prefer constructor injection for better testability and immutability."
    languages: [java]
    severity: WARNING
    metadata:
      category: best-practices
      technology: ["java", "spring"]
4. Spring Cache Configuration
rules:
  - id: spring-cache-misuse
    patterns:
      - pattern: |
          @Cacheable
          public $RET $METHOD(...) {
            ...
          }
      # Flag only methods whose name suggests a write operation
      - metavariable-regex:
          metavariable: $METHOD
          regex: (.*update.*|.*delete.*|.*save.*|.*create.*)
    message: "Cacheable method performing write operations. Consider using @CacheEvict for write methods."
    languages: [java]
    severity: WARNING
    metadata:
      category: performance
      technology: ["java", "spring"]
Performance Rules
1. Inefficient Collection Usage
rules:
  - id: java-inefficient-collection-init
    patterns:
      - pattern: new ArrayList<>()
      - pattern-not: new ArrayList<>($SIZE)
      - pattern-not: Arrays.asList(...)
    message: "ArrayList created without initial capacity. Specify initial capacity for better performance."
    languages: [java]
    severity: INFO
    metadata:
      category: performance
      technology: ["java"]
2. Object Allocation in Loops
rules:
  - id: java-object-allocation-in-loop
    patterns:
      - pattern: new $TYPE(...)
      - pattern-either:
          - pattern-inside: |
              while (...) { ... }
          - pattern-inside: |
              for (...) { ... }
      - metavariable-regex:
          metavariable: $TYPE
          regex: (.*DateFormat|.*SimpleDateFormat|.*Random)
    message: "Object allocation inside loop detected. Move object creation outside the loop."
    languages: [java]
    severity: WARNING
    metadata:
      category: performance
      technology: ["java"]
3. Redundant String Operations
rules:
  - id: java-redundant-string-operation
    patterns:
      # Typed metavariable: matches only when the receiver is known to be a String
      - pattern: (String $STR).toString()
    message: "Redundant toString() call on String object."
    languages: [java]
    severity: INFO
    metadata:
      category: performance
      technology: ["java"]
Testing Rules
1. Test Quality Rules
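rules:
  - id: java-test-assertion-missing
    patterns:
      - pattern: |
          @Test
          public void $TESTNAME(...) {
            ...
          }
      # Each exclusion repeats the method shape and adds the call that counts as a check.
      # Statically imported assertions (assertEquals, assertTrue, ...) need their own entries.
      - pattern-not: |
          @Test
          public void $TESTNAME(...) {
            ...
            Assertions.$ASSERT(...);
            ...
          }
      - pattern-not: |
          @Test
          public void $TESTNAME(...) {
            ...
            verify(...);
            ...
          }
      - pattern-not: |
          @Test
          public void $TESTNAME(...) {
            ...
            Mockito.verify(...);
            ...
          }
    message: "Test method missing assertions. Tests should verify expected behavior."
    languages: [java]
    severity: WARNING
    metadata:
      category: testing
      technology: ["java", "junit"]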
2. Flaky Test Detection
rules:
  - id: java-flaky-test
    patterns:
      - pattern-either:
          - pattern: Thread.sleep(...)
          - pattern: System.currentTimeMillis()
      - pattern-inside: |
          @Test
          public void $TESTNAME(...) {
            ...
          }
    message: "Potential flaky test detected. Avoid sleep and time-based logic in tests."
    languages: [java]
    severity: WARNING
    metadata:
      category: testing
      technology: ["java", "junit"]
3. Test Setup Issues
rules:
  - id: java-test-setup-issues
    patterns:
      - pattern: new $SERVICE(...)
      - pattern-inside: |
          @Test
          public void $TESTNAME(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $SERVICE
          regex: (.*Service|.*Repository|.*Controller)
    message: "Direct instantiation of Spring components in tests. Use dependency injection or mocking."
    languages: [java]
    severity: WARNING
    metadata:
      category: testing
      technology: ["java", "spring", "junit"]
Best Practices Rules
1. Logging Best Practices
rules:
  - id: java-logging-best-practices
    patterns:
      # Any log call whose argument is a concatenation,
      # e.g. logger.debug("User: " + user + " action: " + action)
      - pattern: logger.$LEVEL($LEFT + $RIGHT)
      - metavariable-regex:
          metavariable: $LEVEL
          regex: (trace|debug|info|warn|error)
    message: "String concatenation in log statements. Use parameterized logging for better performance."
    languages: [java]
    severity: INFO
    metadata:
      category: performance
      technology: ["java", "logging"]
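To illustrate parameterized logging without an external dependency, this Java sketch uses java.util.logging, whose {0}-style placeholders are only formatted when the record is actually published. That logging framework, the class name AuditLog, and the logger name "audit" are all assumptions made for this example; with SLF4J the equivalent call would use {} placeholders.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class used only for illustration.
class AuditLog {
    static final Logger LOGGER = Logger.getLogger("audit");

    // The message template and its arguments are passed separately,
    // so no string is built when the level is disabled.
    static void record(String user, String action) {
        LOGGER.log(Level.INFO, "User: {0} action: {1}", new Object[] {user, action});
    }
}
```

Keeping the template constant also makes log lines easier to group and search.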
2. Exception Handling
rules:
  - id: java-exception-handling
    patterns:
      # A catch clause is matched as part of its try statement
      - pattern: |
          try {
            ...
          } catch (Exception $E) {
            throw new RuntimeException($E);
          }
    message: "Generic exception caught and rethrown as RuntimeException. Use specific exception types."
    languages: [java]
    severity: WARNING
    metadata:
      category: correctness
      technology: ["java"]
3. Optional Misuse
rules:
  - id: java-optional-misuse
    patterns:
      - pattern: Optional.of($VALUE)
      - metavariable-regex:
          metavariable: $VALUE
          regex: .*null.*
    message: "Optional.of() called with potentially null value. Use Optional.ofNullable() instead."
    languages: [java]
    severity: ERROR
    metadata:
      category: correctness
      technology: ["java"]
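The difference matters because Optional.of(null) throws a NullPointerException, while Optional.ofNullable(null) yields an empty Optional. A minimal Java sketch, with the class name Names and the "anonymous" fallback chosen purely for illustration:

```java
import java.util.Optional;

// Hypothetical helper class used only for illustration.
class Names {

    // Accepts a possibly-null name and falls back to a default when it is null or blank.
    static String displayName(String maybeNull) {
        return Optional.ofNullable(maybeNull)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .orElse("anonymous");
    }
}
```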
4. Date Time API
rules:
  - id: java-legacy-date-api
    pattern-either:
      - pattern: new Date(...)
      - pattern: new SimpleDateFormat(...)
      - pattern: Calendar.getInstance(...)
    message: "Legacy date-time API detected. Use java.time package for new code."
    languages: [java]
    severity: WARNING
    metadata:
      category: best-practices
      technology: ["java"]
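As a sketch of the java.time replacement the message points to, the following Java snippet parses, adjusts, and formats a date without Date, Calendar, or SimpleDateFormat. The class name Dates and the method plusDays are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class used only for illustration.
class Dates {
    // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat.
    private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_LOCAL_DATE;

    // Parses an ISO date (yyyy-MM-dd), adds the given number of days, and formats the result.
    static String plusDays(String isoDate, long days) {
        return LocalDate.parse(isoDate, ISO).plusDays(days).format(ISO);
    }
}
```

Because the formatter is safe to share, it can live in a static field, which also avoids the allocation-in-loop issue flagged elsewhere in this guide.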
Advanced Pattern Matching
1. Metavariable Patterns
rules:
  - id: java-method-call-pattern
    patterns:
      - pattern: $OBJ.$METHOD($ARGS)
      - metavariable-regex:
          metavariable: $METHOD
          regex: (save|update|delete|create)
      - metavariable-regex:
          metavariable: $OBJ
          regex: (.*Repository|.*DAO|.*Service)
    message: "Data modification method call detected. Ensure proper transaction boundaries."
    languages: [java]
    severity: INFO
2. Pattern-Either for Multiple Cases
rules:
  - id: java-multiple-exception-types
    pattern-either:
      - pattern: throw new RuntimeException(...)
      - pattern: throw new Exception(...)
      - pattern: throw new Throwable(...)
    message: "Generic exception thrown. Use specific exception types for better error handling."
    languages: [java]
    severity: WARNING
3. Pattern Insides
rules:
  - id: java-resource-try-with-resources
    patterns:
      - pattern: |
          $TYPE $VAR = ...;
          try {
            ...
          } finally {
            $VAR.close();
          }
      - metavariable-regex:
          metavariable: $TYPE
          regex: (.*Closeable|.*AutoCloseable)
    message: "Manual resource cleanup detected. Use try-with-resources for automatic cleanup."
    languages: [java]
    severity: INFO
4. Focused Metavariables
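rules:
  - id: java-focus-metavariable
    patterns:
      # Both calls must appear in one pattern; two separate "pattern" entries would never overlap
      - pattern: |
          $FOCUS.method1();
          ...
          $FOCUS.method2();
      # Narrow the reported location to the object itself
      - focus-metavariable: $FOCUS
    message: "Multiple method calls on the same object detected."
    languages: [java]
    severity: INFO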
Rule Configuration Files
1. Semgrep Configuration File
# Rule directories are passed on the command line, one --config flag per directory
semgrep scan \
  --config rules/security/ \
  --config rules/quality/ \
  --config rules/performance/ \
  --config rules/custom/ \
  .

# .semgrepignore (repository root): paths excluded from scanning
**/test/**
**/generated/**
**/build/**
**/target/**
2. Rule Categories Organization
rules/
├── security/
│   ├── sql-injection.yaml
│   ├── xxe.yaml
│   └── path-traversal.yaml
├── quality/
│   ├── null-safety.yaml
│   ├── resource-management.yaml
│   └── exception-handling.yaml
├── performance/
│   ├── collections.yaml
│   ├── strings.yaml
│   └── objects.yaml
├── spring/
│   ├── security.yaml
│   ├── transactions.yaml
│   └── dependency-injection.yaml
└── custom/
    ├── project-specific.yaml
    └── team-standards.yaml
CI/CD Integration
1. GitHub Actions
# .github/workflows/semgrep.yml
name: Semgrep Security Scan
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed to upload SARIF results
    container:
      image: returntocorp/semgrep
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      # Run the CLI directly so the SARIF output path is explicit
      - name: Semgrep Scan
        run: semgrep --config=p/java-security --sarif --output=semgrep-results.sarif .
      - name: Upload SARIF results
        uses: github/codeql-action/upload-sarif@v2
        if: always()
        with:
          sarif_file: semgrep-results.sarif
2. GitLab CI
# .gitlab-ci.yml
semgrep:
  image: returntocorp/semgrep
  script:
    - semgrep --config=p/java-security --sarif --output=semgrep-results.sarif .
  artifacts:
    # Stored as a plain job artifact; GitLab has no dedicated SARIF report type
    paths:
      - semgrep-results.sarif
  only:
    - merge_requests
    - main
    - develop
3. Jenkins Pipeline
// Jenkinsfile
pipeline {
    agent any
    stages {
        stage('Semgrep Scan') {
            steps {
                sh '''
                    docker run --rm -v "${WORKSPACE}:/src" \\
                        returntocorp/semgrep:latest \\
                        semgrep --config=p/java-security --json > semgrep-report.json
                '''
            }
            post {
                always {
                    archiveArtifacts artifacts: 'semgrep-report.json', fingerprint: true
                }
            }
        }
    }
}
Custom Rule Development
1. Rule Development Workflow
# 1. Create test cases
mkdir -p tests/java rules/security
cat > tests/java/test-case.java << 'EOF'
public class TestCase {
    public void vulnerableMethod(String input) {
        String query = "SELECT * FROM users WHERE id = " + input;
        // This should trigger the rule
    }
    public void safeMethod(String input) {
        String query = "SELECT * FROM users WHERE id = ?";
        // This should not trigger the rule
    }
}
EOF
# 2. Develop rule (the quoted 'EOF' stops the shell from expanding $QUERY, $SQL and $INPUT)
cat > rules/security/sql-injection-custom.yaml << 'EOF'
rules:
  - id: custom-sql-injection
    patterns:
      - pattern: String $QUERY = $SQL + $INPUT;
      # $SQL is the left-hand operand; its source text must look like a SELECT statement
      - metavariable-regex:
          metavariable: $SQL
          regex: (?i).*select.*from.*
    message: "Custom SQL injection rule triggered"
    languages: [java]
    severity: ERROR
EOF
# 3. Test rule
semgrep --config rules/security/sql-injection-custom.yaml tests/java/
2. Rule Testing Framework
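# rules/security/sql-injection-test.yaml
rules:
  - id: sql-injection-test
    patterns:
      - pattern: String $QUERY = $SQL + $INPUT;
      - metavariable-regex:
          metavariable: $SQL
          regex: (?i).*select.*from.*
    message: "SQL injection detected"
    languages: [java]
    severity: ERROR

# rules/security/sql-injection-test.java
# Expectations live in a target file with the same base name as the rule file.
# A "ruleid:" comment marks a line that must match; an "ok:" comment marks a line that must not.
class SqlInjectionTest {
    void examples(String userInput) {
        // ruleid: sql-injection-test
        String unsafe = "SELECT * FROM users WHERE id = " + userInput;
        // ok: sql-injection-test
        String safe = "SELECT * FROM users WHERE id = ?";
    }
}

# Run the annotated tests
semgrep --test rules/security/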
Performance Optimization
1. Optimized Rule Patterns
rules:
  - id: optimized-rule
    pattern-either:
      # More specific pattern first
      - pattern: $CONN.prepareStatement("..." + $INPUT)
      # Broader pattern with constraints
      - patterns:
          - pattern: $CONN.$METHOD($QUERY)
          - metavariable-regex:
              metavariable: $METHOD
              regex: (execute|executeQuery|executeUpdate)
          - metavariable-regex:
              metavariable: $QUERY
              regex: .*\+.*
    message: "Optimized SQL injection detection"
    languages: [java]
    severity: ERROR
2. Excluding False Positives
rules:
  - id: java-sql-injection-refined
    patterns:
      - pattern: $QUERY.execute($INPUT)
      # String literals are safe
      - pattern-not: $QUERY.execute("...")
      # Skip arguments that look like constants (UPPER_SNAKE_CASE names).
      # The regex is applied to $INPUT because a metavariable must be bound by a positive pattern.
      - metavariable-regex:
          metavariable: $INPUT
          regex: ^(?![A-Z_]+$).*
    message: "SQL injection with reduced false positives"
    languages: [java]
    severity: ERROR
Best Practices for Rule Writing
- Start Specific: Begin with narrow patterns and broaden gradually
- Test Thoroughly: Create comprehensive test cases
- Minimize False Positives: Use pattern-not to exclude known safe patterns
- Provide Clear Messages: Include remediation guidance
- Categorize Properly: Use appropriate severity and metadata
- Performance Aware: Optimize patterns for faster scanning
# Example of well-structured rule
rules:
  - id: java-best-practice-example
    patterns:
      - pattern: $RESULT += $ITEM;
      - pattern-inside: |
          for ($TYPE $VAR : $ITEMS) {
            ...
          }
    message: |
      Inefficient string concatenation in loop detected.
      Problem: Using string concatenation in loops creates many intermediate string objects.
      Solution: Use StringBuilder for better performance.
      Example:
        // Bad
        String result = "";
        for (String item : items) {
          result += item;
        }
        // Good
        StringBuilder result = new StringBuilder();
        for (String item : items) {
          result.append(item);
        }
    languages: [java]
    severity: WARNING
    metadata:
      category: performance
      technology: ["java"]
      references:
        - "https://stackoverflow.com/questions/1532461/stringbuilder-vs-string-concatenation-in-tostring-in-java"
Conclusion
Semgrep Java rules provide:
- Comprehensive security scanning for common vulnerabilities
- Code quality enforcement across the codebase
- Performance optimization detection
- Framework-specific patterns for Spring, testing, etc.
- Custom rule development for project-specific standards
By implementing the patterns and rules shown above, you can create a robust static analysis pipeline that catches issues early, enforces coding standards, and improves overall code quality and security. The combination of security rules, code quality checks, and performance optimizations creates a comprehensive safety net for Java development.