In the modern Java development lifecycle, catching bugs, security vulnerabilities, and code quality issues early is crucial for maintaining robust applications. Semgrep is a fast, open-source static analysis tool that uses pattern-based rules to identify specific code patterns across your codebase. For Java teams, Semgrep offers a flexible, lightweight alternative to traditional SAST tools, with custom rules that target your own code conventions and security concerns.
What is Semgrep?
Semgrep is a static analysis tool that combines the simplicity of grep with the semantic awareness of abstract syntax trees (ASTs). It lets developers write rules that look like the code they want to find while still respecting Java's syntax and structure. Unlike plain text search, Semgrep matches against the parsed code, so a rule ignores whitespace and formatting differences and can be constrained by surrounding context such as the enclosing method or class. This makes it well suited to identifying code smells and security vulnerabilities that a regular expression would miss or over-match.
Why Semgrep is Valuable for Java Development
- Fast Feedback Loop: Runs in seconds on large codebases, enabling integration into pre-commit hooks and CI pipelines without slowing down development.
- Custom Rule Creation: Teams can write project-specific rules to enforce architectural patterns, coding standards, and security requirements.
- No Build Required: Analyzes source code directly without needing to compile the project, making it ideal for code review automation.
- Gradual Adoption: Can be introduced incrementally alongside existing tools like SpotBugs or SonarQube.
- Security as Code: Rules are written in YAML, making them versionable, testable, and easy to share across teams.
Semgrep Rule Structure for Java
A Semgrep rule consists of a pattern to match and metadata describing the issue:
rules:
  - id: string-concatenation-in-loop
    message: String concatenation in loops may cause performance issues
    languages: [java]
    severity: WARNING
    pattern: |
      for (...) {
        ...
        $VAR = $VAR + ...;
        ...
      }
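To make the target of this rule concrete, here is a minimal Java sketch of the code shape it is meant to flag, next to a StringBuilder alternative. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.List;

class ConcatInLoopDemo {
    // Shape the rule is aimed at: `$VAR = $VAR + ...;` inside a for loop.
    // Each iteration copies the whole string built so far.
    static String joinWithConcat(List<String> parts) {
        String out = "";
        for (int i = 0; i < parts.size(); i++) {
            out = out + parts.get(i);
        }
        return out;
    }

    // Alternative without reassignment: one growing buffer, one final copy.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            out.append(parts.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinWithConcat(parts));  // prints "abc"
        System.out.println(joinWithBuilder(parts)); // prints "abc"
    }
}
```

Both methods return the same result; only the first contains the reassignment that the pattern describes.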
Comprehensive Java Rule Examples
1. Security Vulnerability Detection
rules:
  - id: log4shell-vulnerability
    message: Potential Log4Shell vulnerability - user input in logger call
    languages: [java]
    severity: ERROR
    # A rule takes either `pattern` or `patterns`, not both.
    patterns:
      - pattern: $LOGGER.info(...);
      - pattern-not: $LOGGER.info("...");
      - metavariable-regex:
          metavariable: $LOGGER
          regex: ^(log|logger|LOG)$
2. Spring Security Configuration Checks
rules:
  - id: missing-csrf-protection
    message: CSRF protection might be disabled in Spring Security configuration
    languages: [java]
    severity: ERROR
    # The ellipses let the class and method contain other members and statements.
    pattern: |
      @Configuration
      public class $CLASS {
        ...
        @Bean
        public SecurityFilterChain $FILTER_CHAIN(HttpSecurity $HTTP) throws Exception {
          ...
          $HTTP.csrf().disable();
          ...
        }
        ...
      }
3. Resource Management
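rules:
  - id: unclosed-database-connection
    message: Database connection might not be closed properly
    languages: [java]
    severity: WARNING
    patterns:
      - pattern: |
          Connection $CONN = $DRIVER.getConnection(...);
      # Exempt connections managed by try-with-resources
      - pattern-not-inside: |
          try (Connection $CONN = $DRIVER.getConnection(...)) {
            ...
          }
      # Exempt connections that are explicitly closed later in the same block
      - pattern-not-inside: |
          Connection $CONN = $DRIVER.getConnection(...);
          ...
          $CONN.close();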
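The reason the rule exempts try-with-resources can be shown with a small self-contained sketch. It assumes a hypothetical TrackedResource class standing in for java.sql.Connection, so that it runs without a database: when the work inside the block throws, a manual close() call is skipped, while try-with-resources still closes the resource.

```java
class ResourceDemo {
    /** Hypothetical stand-in for java.sql.Connection that records whether close() ran. */
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        void query() {
            throw new IllegalStateException("query failed");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Manual style: the exception jumps past close(), so the resource leaks.
    static boolean closedAfterManualUse() {
        TrackedResource resource = new TrackedResource();
        try {
            resource.query();
            resource.close();
        } catch (IllegalStateException e) {
            // close() was never reached
        }
        return resource.closed;
    }

    // try-with-resources: close() runs before the catch block executes.
    static boolean closedAfterTryWithResources() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource managed = resource) {
            managed.query();
        } catch (IllegalStateException e) {
            // the resource is already closed here
        }
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterManualUse());        // prints "false"
        System.out.println(closedAfterTryWithResources()); // prints "true"
    }
}
```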
4. Null Pointer Prevention
rules:
  - id: potential-null-dereference
    message: Potential NullPointerException when calling method on nullable object
    languages: [java]
    severity: ERROR
    patterns:
      - pattern: $OBJ.$METHOD(...);
      # Skip calls guarded by an enclosing non-null check
      - pattern-not-inside: |
          if ($OBJ != null) { ... }
      # Skip calls that follow an earlier null check (early-exit guard)
      - pattern-not-inside: |
          if ($OBJ == null) { ... }
          ...
      # Matches variables (not literals or static calls)
      - metavariable-regex:
          metavariable: $OBJ
          regex: ^[a-z].*
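As an illustration of what this rule is aimed at, the following sketch contrasts an unguarded dereference with two guarded alternatives. The USERS map and the method names are hypothetical; the point is only that Map.get returns null for a missing key.

```java
import java.util.Map;
import java.util.Optional;

class NullGuardDemo {
    // Hypothetical lookup table; get() returns null for unknown keys.
    static final Map<String, String> USERS = Map.of("alice", "Alice Smith");

    // Unguarded: `name.length()` throws NullPointerException for unknown keys.
    static int unguardedLength(String key) {
        String name = USERS.get(key);
        return name.length();
    }

    // Guarded by an explicit null check, the shape the rule excludes.
    static int guardedLength(String key) {
        String name = USERS.get(key);
        if (name != null) {
            return name.length();
        }
        return 0;
    }

    // Alternative that avoids a nullable local variable altogether.
    static int optionalLength(String key) {
        return Optional.ofNullable(USERS.get(key)).map(String::length).orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(guardedLength("alice")); // prints "11"
        System.out.println(guardedLength("bob"));   // prints "0"
        System.out.println(optionalLength("bob"));  // prints "0"
    }
}
```

A purely syntactic rule like this cannot know which variables may actually be null, so expect to tune it against false positives on a real codebase.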
Advanced Rule Patterns for Java
1. Data Validation Rules
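rules:
  - id: missing-input-validation
    message: User input used without validation
    languages: [java]
    severity: WARNING
    patterns:
      # Only look inside handler methods that take an annotated parameter
      - pattern-inside: |
          @$MAPPING(...)
          public $RETURN $HANDLER(..., @$PARAM_ANN $TYPE $INPUT, ...) {
            ...
          }
      # Skip handlers where the same parameter is marked @Valid
      - pattern-not-inside: |
          @$MAPPING(...)
          public $RETURN $HANDLER(..., @Valid $TYPE $INPUT, ...) {
            ...
          }
      - pattern: $INPUT.$CALL(...)
      - metavariable-regex:
          metavariable: $PARAM_ANN
          regex: (RequestParam|PathVariable|RequestBody)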
2. Exception Handling Rules
rules:
  - id: swallowed-exception
    message: Exception is caught but not handled properly
    languages: [java]
    severity: WARNING
    # Metavariable names must be uppercase, so $E rather than $e.
    pattern: |
      try {
        ...
      } catch ($EXCEPTION $E) {
        $E.printStackTrace();
      }
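The sketch below shows the catch block shape this rule describes, together with one possible alternative that adds context and rethrows. The method names are hypothetical, and wrapping in IllegalArgumentException is just one reasonable choice, not the only way to handle the error.

```java
class ExceptionHandlingDemo {
    // Shape the rule is aimed at: the catch block only prints the stack trace,
    // and the caller cannot tell a parse failure from a legitimate zero.
    static int parseSwallowing(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            e.printStackTrace();
        }
        return 0;
    }

    // Alternative: keep the original exception as the cause and add context.
    static int parseStrict(String raw) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a valid integer: '" + raw + "'", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseStrict(" 42 ")); // prints "42"
        try {
            parseStrict("abc");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Not a valid integer: 'abc'"
        }
    }
}
```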
3. Spring Transaction Management
rules:
  - id: repository-call-outside-transaction
    message: Database operation outside of transactional context
    languages: [java]
    severity: INFO
    patterns:
      - pattern: $ENTITY_REPO.$METHOD(...);
      # Skip calls made from a method annotated with @Transactional
      - pattern-not-inside: |
          @Transactional
          $RETURN $CALLER(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $ENTITY_REPO
          regex: .*Repository
Real-World Java Application Scenarios
Scenario 1: Preventing SQL Injection
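rules:
  - id: sql-injection-jdbc
    message: Potential SQL injection in JDBC query
    languages: [java]
    severity: ERROR
    patterns:
      # "..." matches any string literal; a literal such as "SELECT ... " would only match itself
      - pattern-either:
          - pattern: $STMT.executeQuery("..." + $USER_INPUT)
          - pattern: $STMT.executeQuery("..." + $USER_INPUT + "...")
      - pattern-inside: |
          Statement $STMT = ...;
          ...
      # Parameterized statements are the intended fix, so do not flag them
      - pattern-not-inside: |
          PreparedStatement $STMT = ...;
          ...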
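To show why string concatenation is the pattern worth flagging, here is a minimal sketch that builds the query text both ways. The users table, its columns, and the method names are hypothetical; the userExists method needs a real JDBC Connection and is included only to show the parameterized form.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class SqlInjectionDemo {
    // Vulnerable shape: user input becomes part of the SQL text itself.
    static String concatenatedQuery(String userInput) {
        return "SELECT id FROM users WHERE name = '" + userInput + "'";
    }

    // Safe shape: the SQL text is a constant and the input is bound separately.
    static final String PARAMETERIZED_QUERY = "SELECT id FROM users WHERE name = ?";

    static boolean userExists(Connection conn, String userInput) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(PARAMETERIZED_QUERY)) {
            stmt.setString(1, userInput);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // The attacker-controlled quote closes the literal and appends a condition.
        System.out.println(concatenatedQuery("x' OR '1'='1"));
        // prints "SELECT id FROM users WHERE name = 'x' OR '1'='1'"
    }
}
```

With the concatenated form the input changes the structure of the query; with the parameterized form the query text never varies, whatever the input contains.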
Scenario 2: Ensuring Proper Authentication
rules:
  - id: unauthenticated-endpoint
    message: REST endpoint missing authentication requirement
    languages: [java]
    severity: ERROR
    # A rule takes either `pattern` or `patterns`, not both.
    patterns:
      - pattern: |
          @$MAPPING(...)
          public $RETURN $METHOD(...) { ... }
      - pattern-not: |
          @PreAuthorize("isAuthenticated()")
          @$MAPPING(...)
          public $RETURN $METHOD(...) { ... }
      - pattern-not: |
          @PermitAll
          @$MAPPING(...)
          public $RETURN $METHOD(...) { ... }
Integrating Semgrep into Java Development Workflow
1. Pre-commit Hook Configuration
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/returntocorp/semgrep
    rev: 'v1.42.0'
    hooks:
      - id: semgrep
        args: ['--config', 'p/r2c-ci', '--config', 'p/security-audit', '--config', 'rules/']
2. Maven Integration
<!-- pom.xml -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>3.1.0</version>
  <executions>
    <execution>
      <id>semgrep-scan</id>
      <phase>verify</phase>
      <goals>
        <goal>exec</goal>
      </goals>
      <configuration>
        <executable>semgrep</executable>
        <arguments>
          <argument>--config</argument>
          <argument>p/ci</argument>
          <argument>--config</argument>
          <argument>./semgrep-rules/</argument>
          <argument>--error</argument>
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
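For teams that drive builds from Java code rather than Maven, the same invocation can be sketched with ProcessBuilder. This is a minimal illustration, not an official integration: the SemgrepCommand class and its methods are hypothetical, it assumes the semgrep executable is on the PATH, and it only uses the --config and --error flags that appear in the Maven configuration plus a positional target path.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class SemgrepCommand {
    // Builds the argument list: one --config pair per rule source,
    // --error to make findings produce a non-zero exit code, then the target path.
    static List<String> build(List<String> configs, boolean failOnFindings, String target) {
        List<String> command = new ArrayList<>();
        command.add("semgrep");
        for (String config : configs) {
            command.add("--config");
            command.add(config);
        }
        if (failOnFindings) {
            command.add("--error");
        }
        command.add(target);
        return command;
    }

    // Runs the command, forwarding its output to this process, and returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = build(List.of("p/ci", "./semgrep-rules/"), true, ".");
        System.out.println(String.join(" ", command));
        // prints "semgrep --config p/ci --config ./semgrep-rules/ --error ."
        // Uncomment on a machine where semgrep is installed:
        // System.exit(run(command));
    }
}
```

Keeping command construction separate from execution makes the argument list easy to unit test without having Semgrep installed.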
3. GitHub Actions Integration
# .github/workflows/semgrep.yml
name: Semgrep Scan
on:
  pull_request: {}
  push:
    branches: [main, master]
jobs:
  semgrep:
    name: Semgrep Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: returntocorp/semgrep-action@v1
        with:
          config: p/ci
          publishToken: ${{ secrets.SEMGREP_APP_TOKEN }}
Custom Rule Development Workflow
1. Rule Testing
# rules/test-rule.yaml
rules:
  - id: test-collection-stream
    message: Use streams for collection processing
    languages: [java]
    severity: INFO
    pattern: |
      for ($TYPE $ITEM : $COLLECTION) {
        if ($CONDITION) {
          $RESULT.add($ITEM);
        }
      }
Test Cases:
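// test-files/TestCollection.java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class TestCollection {
    public void processList(List<String> items) {
        List<String> result = new ArrayList<>();
        // Should match. The `ruleid:` annotation on the line before the
        // expected match is what `semgrep --test` checks.
        // ruleid: test-collection-stream
        for (String item : items) {
            if (item.startsWith("A")) {
                result.add(item);
            }
        }

        // Should not match - already using streams
        // ok: test-collection-stream
        List<String> filtered = items.stream()
                .filter(item -> item.startsWith("A"))
                .collect(Collectors.toList());
    }
}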
2. Rule Validation
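# Test rule against test cases
# (semgrep --test expects each rule file to be paired with a test file of the
# same base name, for example rules/test-rule.java next to rules/test-rule.yaml)
semgrep --test rules/test-rule.yaml

# Scan specific directory
semgrep --config rules/ --json > results.json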
Best Practices for Java Semgrep Rules
- Start with Existing Rules: Begin with Semgrep's registry rules and customize as needed.
- Use Metavariables Wisely: Leverage metavariables for flexible pattern matching.
- Combine Patterns: Use multiple patterns with patterns: for complex conditions.
- Test Thoroughly: Create comprehensive test cases for each custom rule.
- Focus on High-Impact Issues: Prioritize security, performance, and maintainability concerns.
- Iterate and Refine: Update rules based on false positives and new patterns discovered.
Integration with Java Ecosystem
- SpotBugs/FindSecBugs: Complementary coverage - Semgrep for custom patterns, SpotBugs for built-in bug patterns
- SonarQube: Use Semgrep for rapid, custom checks alongside SonarQube's comprehensive analysis
- Checkstyle/PMD: Semgrep can enforce similar code style rules with more flexibility
- CI/CD Pipelines: Fast feedback in pull requests and build processes
Conclusion
Semgrep provides Java development teams with a powerful, flexible tool for enforcing code quality, security standards, and architectural patterns. Its simple YAML-based rule syntax makes it accessible to developers without deep static analysis expertise, while its performance enables integration into fast-paced development workflows.
By creating custom Semgrep rules tailored to your Java application's specific needs, teams can catch issues early, maintain code consistency, and prevent security vulnerabilities from reaching production. Whether used as a primary static analysis tool or as a complement to existing solutions, Semgrep empowers Java teams to write better, more secure code through automated, pattern-based code review.