CodeQL is GitHub's semantic code analysis engine that allows you to query code as if it were data. It enables security researchers and developers to find vulnerabilities, bugs, and other code issues through custom queries written in the QL language.
What is CodeQL?
CodeQL treats code as a queryable database, allowing you to:
- Find patterns across large codebases
- Identify security vulnerabilities
- Enforce coding standards
- Discover bugs and anti-patterns
CodeQL Architecture
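Source Code → Extractor → Database → QL Queries → Results
- Source Code: Java, C++, JavaScript, and other supported languages
- Extractor: a language-specific tool that records the code's syntax tree, types, and other facts
- Database: a relational representation of the code, stored in a binary on-disk format
- QL Queries: security rules and other checks written in QL
- Results: vulnerabilities and other issues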
Setting Up CodeQL
Installation and Setup
# Option 1: install the CodeQL CLI as a GitHub CLI extension (then run it as "gh codeql")
gh extension install github/gh-codeql

# Option 2: download the CLI binaries directly and add them to PATH
wget https://github.com/github/codeql-cli-binaries/releases/latest/download/codeql-linux64.zip
unzip codeql-linux64.zip
export PATH="$PATH:$(pwd)/codeql"

# Download the Java standard library and query packs
codeql pack download codeql/java-queries
codeql pack download codeql/java-all
Creating a CodeQL Database
# For Java projects built with Maven
codeql database create java-database \
  --language=java \
  --command="mvn clean compile -q" \
  --source-root=.

# For Maven projects that use the Maven Wrapper
codeql database create java-database \
  --language=java \
  --command="./mvnw clean compile -q" \
  --source-root=.
Basic CodeQL Concepts
Example 1: Simple CodeQL Query Structure
/**
 * @name Find empty catch blocks
 * @description Detects catch blocks that don't handle exceptions properly
 * @kind problem
 * @problem.severity warning
 * @id java/empty-catch-block
 */
import java

from CatchClause catchClause, BlockStmt block
where
  block = catchClause.getBlock() and
  block.getNumStmt() = 0
select catchClause, "Empty catch block may silently ignore exceptions"
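To make the query's target concrete, here is a small Java sketch of code it would report (parsePortSilently, whose catch block contains no statements) next to a version that handles the exception. EmptyCatchExample and both method names are hypothetical and only for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class EmptyCatchExample {
    private static final Logger LOG = Logger.getLogger(EmptyCatchExample.class.getName());

    // Reported by the query: the exception is swallowed and the caller gets no signal.
    static int parsePortSilently(String value) {
        int port = 0;
        try {
            port = Integer.parseInt(value);
        } catch (NumberFormatException e) {
        }
        return port;
    }

    // Not reported: the catch block records the problem and falls back explicitly.
    static int parsePortOrDefault(String value, int defaultPort) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            LOG.log(Level.WARNING, "Invalid port '" + value + "', using " + defaultPort, e);
            return defaultPort;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePortSilently("http"));        // 0, with no hint why
        System.out.println(parsePortOrDefault("http", 8080)); // 8080, plus a logged warning
        System.out.println(parsePortOrDefault("9090", 8080)); // 9090
    }
}
```

Note that a catch block containing only a comment is still reported, because comments are not statements.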
Example 2: Finding SQL Injection Vulnerabilities
/**
 * @name SQL injection vulnerability
 * @description User-controlled data flows into the text of a SQL statement
 * @kind path-problem
 * @problem.severity error
 * @id java/custom-sql-injection
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module SqlInjectionConfig implements DataFlow::ConfigSig {
  // Sources: remote user input (request parameters, headers, bodies, ...)
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: the SQL string passed to Statement.execute* or Connection.prepareStatement/prepareCall
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call, Method m |
      m = call.getMethod() and
      sink.asExpr() = call.getArgument(0) and
      (
        m.getName().matches("execute%") and
        m.getDeclaringType().getASupertype*().hasQualifiedName("java.sql", "Statement")
        or
        m.hasName(["prepareStatement", "prepareCall"]) and
        m.getDeclaringType().getASupertype*().hasQualifiedName("java.sql", "Connection")
      )
    )
  }
}

module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;
import SqlInjectionFlow::PathGraph

from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Potential SQL injection from $@.", source.getNode(), "user input"
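The Java sketch below shows the pattern this query is meant to catch and the usual fix. UserLookup, buildUnsafeQuery, findUnsafe, and findSafe are hypothetical names. When name comes from request input, findUnsafe is the kind of code the query targets, while findSafe passes only a constant SQL string to prepareStatement and binds the input with setString.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class UserLookup {
    // Vulnerable: the input becomes part of the SQL text itself.
    static String buildUnsafeQuery(String name) {
        return "SELECT id FROM users WHERE name = '" + name + "'";
    }

    // Tainted SQL text reaches Statement.executeQuery (resource cleanup omitted for brevity).
    static ResultSet findUnsafe(Connection conn, String name) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery(buildUnsafeQuery(name));
    }

    // Safer: constant SQL text, with the input bound as a parameter value.
    static final String SAFE_QUERY = "SELECT id FROM users WHERE name = ?";

    static ResultSet findSafe(Connection conn, String name) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(SAFE_QUERY);
        ps.setString(1, name);
        return ps.executeQuery();
    }

    public static void main(String[] args) {
        // No database needed: this only shows how a quote changes the query's structure.
        System.out.println(buildUnsafeQuery("alice"));
        System.out.println(buildUnsafeQuery("x' OR '1'='1"));
    }
}
```

Only buildUnsafeQuery runs in main, so the example executes without a database; the JDBC methods compile against java.sql alone.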
Common Security Patterns in Java
Example 3: Command Injection Detection
/**
 * @name Command injection
 * @description User-controlled data flows into Runtime.exec
 * @kind path-problem
 * @problem.severity error
 * @id java/custom-command-injection
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module CommandInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sink: the command argument of any Runtime.exec overload
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall exec |
      exec.getMethod().hasName("exec") and
      exec.getMethod().getDeclaringType().hasQualifiedName("java.lang", "Runtime") and
      sink.asExpr() = exec.getArgument(0)
    )
  }
}

module CommandInjectionFlow = TaintTracking::Global<CommandInjectionConfig>;
import CommandInjectionFlow::PathGraph

from CommandInjectionFlow::PathNode source, CommandInjectionFlow::PathNode sink
where CommandInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Potential command injection from $@.", source.getNode(), "user input"
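As a sketch of the safer alternative, the hypothetical PingCommand class below validates a hostname against an allowlist pattern (an assumption chosen for illustration) and passes it to ProcessBuilder as a separate argument, so it cannot be split into extra arguments. Note that the query only treats Runtime.exec as a sink, so it would not report ProcessBuilder either way; the validation is what makes this version safer.

```java
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

final class PingCommand {
    // Hypothetical allowlist: starts with a letter or digit (so it cannot look like
    // an option such as "-f"), followed by letters, digits, dots, and hyphens.
    private static final Pattern HOST = Pattern.compile("[A-Za-z0-9][A-Za-z0-9.-]{0,252}");

    // Vulnerable pattern (not executed here): Runtime.getRuntime().exec("ping -c 1 " + host)

    // Safer pattern: validate first, then keep the host as its own argv entry.
    static List<String> buildCommand(String host) {
        if (!HOST.matcher(host).matches()) {
            throw new IllegalArgumentException("Rejected host: " + host);
        }
        return List.of("ping", "-c", "1", host);
    }

    static Process run(String host) throws IOException {
        return new ProcessBuilder(buildCommand(host)).start();
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("example.com")); // [ping, -c, 1, example.com]
        try {
            buildCommand("example.com; rm -rf /");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```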
Example 4: XSS Vulnerability Detection
/**
 * @name Cross-site scripting (XSS) vulnerability
 * @description User-controlled data is written to the response without HTML escaping
 * @kind path-problem
 * @problem.severity error
 * @id java/custom-xss
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module XssConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sink: any argument of print/println/printf on PrintWriter or JspWriter
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall write |
      write.getMethod().getName().matches("print%") and
      write.getMethod().getDeclaringType().hasName(["PrintWriter", "JspWriter"]) and
      sink.asExpr() = write.getAnArgument()
    )
  }

  // Sanitizer: the result of an HTML-escaping call from Apache Commons StringEscapeUtils
  predicate isBarrier(DataFlow::Node node) {
    exists(MethodCall escape |
      escape.getMethod().hasName(["escapeHtml", "escapeHtml4"]) and
      escape.getMethod().getDeclaringType().hasName("StringEscapeUtils") and
      node.asExpr() = escape
    )
  }
}

module XssFlow = TaintTracking::Global<XssConfig>;
import XssFlow::PathGraph

from XssFlow::PathNode source, XssFlow::PathNode sink
where XssFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Potential XSS from $@.", source.getNode(), "user input"
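The sanitizer in this query only recognizes StringEscapeUtils from Apache Commons. As a dependency-free sketch of what such escaping does, the hypothetical HtmlEscaper below replaces the five characters that matter in HTML element content and quoted attributes; it does not cover JavaScript or URL contexts. A project-specific helper like this would not be recognized by the query until it is added to the query's sanitizer.

```java
final class HtmlEscaper {
    // Escapes &, <, >, " and ' for use in HTML text or quoted attribute values.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;
        System.out.println(escapeHtml("<script>alert('x')</script>"));
    }
}
```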
Advanced CodeQL Queries
Example 5: Hard-coded Credentials Detection
/**
 * @name Hard-coded credentials
 * @description A string literal is assigned to a variable whose name suggests a credential
 * @kind problem
 * @problem.severity warning
 * @id java/custom-hardcoded-credentials
 */
import java

from Variable v, StringLiteral literal
where
  // The literal is assigned to the variable (initializer or later assignment)
  literal = v.getAnAssignedValue() and
  // Variable name suggests it might be a credential (case-insensitive)
  v.getName().toLowerCase().matches(["%password%", "%secret%", "%key%", "%token%"]) and
  // Exclude empty strings and common placeholder values
  not literal.getValue().regexpMatch("^(|\\s*|null|default|test|example)$")
select literal, "Potential hard-coded credential: " + v.getName()
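One common way to avoid the pattern this query reports is to load secrets at runtime. The sketch below, using hypothetical names (DbCredentials, requireSecret, and the DB_PASSWORD variable), reads a secret from a map such as System.getenv() and fails fast when it is missing or blank. Taking the map as a parameter keeps the helper easy to test.

```java
import java.util.Map;
import java.util.Optional;

final class DbCredentials {
    // Reported pattern (commented out): private static final String DB_PASSWORD = "s3cr3t";

    // Returns the named secret, or throws if it is absent or blank.
    static String requireSecret(Map<String, String> env, String name) {
        return Optional.ofNullable(env.get(name))
                .filter(v -> !v.isBlank())
                .orElseThrow(() -> new IllegalStateException("Missing secret: " + name));
    }

    public static void main(String[] args) {
        try {
            String password = requireSecret(System.getenv(), "DB_PASSWORD");
            System.out.println("Loaded DB_PASSWORD (" + password.length() + " chars)");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```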
Example 6: Insecure Random Number Generation
/**
 * @name Insecure random number generator
 * @description java.util.Random is predictable and unsuitable for security-sensitive values
 * @kind problem
 * @problem.severity warning
 * @id java/custom-insecure-random
 */
import java

from ClassInstanceExpr randomCreation
where
  // Matches `new java.util.Random(...)` only; SecureRandom is a subclass with a different constructed type
  randomCreation.getConstructedType().hasQualifiedName("java.util", "Random")
select randomCreation, "Insecure Random object creation. Use SecureRandom for security-sensitive values."
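A sketch of the replacement the message suggests: a single SecureRandom instance used to generate URL-safe tokens. SessionTokens and newToken are hypothetical names. Because the query matches only the constructed type java.util.Random, new SecureRandom() is not reported.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class SessionTokens {
    // One shared SecureRandom; SecureRandom is safe for concurrent use.
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a URL-safe, unpadded Base64 token built from numBytes random bytes.
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // 32 random bytes encode to 43 Base64 characters without padding
        System.out.println(newToken(32));
    }
}
```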
Example 7: Path Traversal Vulnerability
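/**
 * @name Path traversal vulnerability
 * @description User-controlled data builds a file path without a containment check
 * @kind path-problem
 * @problem.severity error
 * @id java/custom-path-traversal
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import semmle.code.java.controlflow.Guards

// Guard `p.startsWith(base)`: in its true branch, uses of `p` are treated as contained
predicate startsWithCheck(Guard g, Expr e, boolean branch) {
  exists(MethodCall mc |
    g = mc and
    mc.getMethod().hasName("startsWith") and
    mc.getMethod().getDeclaringType().hasQualifiedName("java.nio.file", "Path") and
    e = mc.getQualifier() and
    branch = true
  )
}

module PathTraversalConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: path arguments of java.io file constructors and of Paths.get
  predicate isSink(DataFlow::Node sink) {
    exists(ClassInstanceExpr ctor |
      ctor.getConstructedType().hasQualifiedName("java.io", ["File", "FileInputStream", "FileOutputStream"]) and
      sink.asExpr() = ctor.getAnArgument()
    )
    or
    exists(MethodCall get |
      get.getMethod().hasName("get") and
      get.getMethod().getDeclaringType().hasQualifiedName("java.nio.file", "Paths") and
      sink.asExpr() = get.getAnArgument()
    )
  }

  // Normalizing alone does not sanitize a path; a successful prefix check does
  predicate isBarrier(DataFlow::Node node) {
    node = DataFlow::BarrierGuard<startsWithCheck/3>::getABarrierNode()
  }
}

module PathTraversalFlow = TaintTracking::Global<PathTraversalConfig>;
import PathTraversalFlow::PathGraph

from PathTraversalFlow::PathNode source, PathTraversalFlow::PathNode sink
where PathTraversalFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Potential path traversal from $@.", source.getNode(), "user input"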
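Normalizing a path does not, on its own, stop traversal: an absolute path or a leading ../ survives normalization. The sketch below (SafePaths, resolveInside, and the /srv/app/reports directory are hypothetical) resolves the input against a base directory, normalizes the result, and rejects anything that does not start with the base. It does not resolve symbolic links; Path.toRealPath would be needed for that.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SafePaths {
    // Hypothetical base directory for user-downloadable files.
    static final Path BASE = Paths.get("/srv/app/reports").toAbsolutePath().normalize();

    // Resolves userInput under base and rejects results that escape it.
    static Path resolveInside(Path base, String userInput) {
        Path candidate = base.resolve(userInput).normalize();
        if (!candidate.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userInput);
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(resolveInside(BASE, "2024/q1.pdf"));
        try {
            resolveInside(BASE, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```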
Custom Security Rules
Example 8: Custom Security Rule - Unsafe Deserialization
/**
 * @name Unsafe deserialization
 * @description User-controlled data is deserialized with ObjectInputStream or XMLDecoder
 * @kind path-problem
 * @problem.severity error
 * @id java/custom-unsafe-deserialization
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module UnsafeDeserializationConfig implements DataFlow::ConfigSig {
  // Source: remote user input
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sink: the stream on which ObjectInputStream.readObject/readUnshared
  // or java.beans.XMLDecoder.readObject is called
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall deserialize, Method m |
      m = deserialize.getMethod() and
      sink.asExpr() = deserialize.getQualifier() and
      (
        m.hasName(["readObject", "readUnshared"]) and
        m.getDeclaringType().getASupertype*().hasQualifiedName("java.io", "ObjectInputStream")
        or
        m.hasName("readObject") and
        m.getDeclaringType().hasQualifiedName("java.beans", "XMLDecoder")
      )
    )
  }
}

module UnsafeDeserializationFlow = TaintTracking::Global<UnsafeDeserializationConfig>;
import UnsafeDeserializationFlow::PathGraph

from UnsafeDeserializationFlow::PathNode source, UnsafeDeserializationFlow::PathNode sink
where UnsafeDeserializationFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe deserialization of $@.", source.getNode(), "user input"
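Besides finding readObject calls on untrusted streams, it helps to know what a mitigation looks like. The sketch below uses the JDK's deserialization filter (ObjectInputFilter, Java 9+) to allow one expected class and reject everything else. FilteredDeserialization and Greeting are hypothetical names. The query above does not model filters, so it would still report such a readObject call if the stream came from user input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

final class FilteredDeserialization {
    // Hypothetical payload type the application expects to receive.
    static final class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        Greeting(String text) { this.text = text; }
    }

    // Allow only Greeting; "!*" rejects every other class.
    static final ObjectInputFilter FILTER =
            ObjectInputFilter.Config.createFilter(Greeting.class.getName() + ";!*");

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // The filter must be set before the first read.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FILTER);
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Greeting g = (Greeting) deserialize(serialize(new Greeting("hello")));
        System.out.println(g.text);
        try {
            deserialize(serialize(new HashMap<String, String>()));
        } catch (InvalidClassException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```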
Example 9: Log Injection Detection
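/**
 * @name Log injection vulnerability
 * @description User input reaches an SLF4J log call without sanitization
 * @kind problem
 * @problem.severity warning
 * @id java/custom-log-injection
 */
import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module LogInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall logCall |
      // SLF4J logging methods
      logCall.getMethod().hasName(["info", "warn", "error", "debug", "trace"]) and
      logCall.getMethod().getDeclaringType().hasQualifiedName("org.slf4j", "Logger") and
      // Any argument: the message or a format parameter
      sink.asExpr() = logCall.getAnArgument()
    )
  }
}

module LogInjectionFlow = TaintTracking::Global<LogInjectionConfig>;

from DataFlow::Node source, DataFlow::Node sink
where LogInjectionFlow::flow(source, sink)
select sink, "Potential log injection: $@ is logged without sanitization.", source, "user input"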
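A typical mitigation is to neutralize line breaks before logging, so user input cannot start a forged log entry. The hypothetical LogSanitizer.forLog helper below replaces CR and LF with underscores. The query would still report calls that use it until the helper is modeled as a sanitizer in the query.

```java
final class LogSanitizer {
    // Replaces carriage returns and line feeds so the value stays on one log line.
    static String forLog(String input) {
        if (input == null) {
            return "null";
        }
        return input.replace('\r', '_').replace('\n', '_');
    }

    public static void main(String[] args) {
        String username = "alice\nINFO Login succeeded for admin";
        // Prints: Login failed for alice_INFO Login succeeded for admin
        System.out.println("Login failed for " + forLog(username));
    }
}
```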
Integrating CodeQL into CI/CD
GitHub Actions Workflow
name: "CodeQL Security Scan"
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * 0' # Weekly scan (Sundays, 00:00 UTC)
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write # required to upload results to code scanning
    strategy:
      fail-fast: false
      matrix:
        language: [ 'java' ]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-and-quality
      - name: Build project
        run: |
          mvn clean compile -q
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{matrix.language}}"
Custom CodeQL Workflow with Multiple Query Suites
name: "Advanced CodeQL Scan"
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  codeql-analysis:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: java
          # The query suites and custom queries are listed in the config file
          config-file: ./.github/codeql/codeql-config.yml
      - name: Build project
        run: mvn clean compile -q -DskipTests
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
CodeQL Configuration File
.github/codeql/codeql-config.yml
name: "Custom Security Configuration"
queries:
  - uses: security-and-quality
  - uses: security-extended
  - uses: ./custom-queries/
paths:
  - src
  - lib
paths-ignore:
  - generated
  - test
  - target
query-filters:
  - exclude:
      id: java/inefficient-empty-string-test
  - exclude:
      problem.severity: warning
Custom Query Packs
Example 10: Creating Custom Query Packs
// custom-queries/MySecurityQueries.ql
/**
 * @name Custom security rules for our application
 * @description Detects Cipher.getInstance calls that request a deprecated algorithm
 * @kind problem
 * @problem.severity warning
 * @id java/custom-deprecated-crypto
 */
import java

// Custom rule: Detect usage of deprecated crypto algorithms
from MethodCall cryptoUsage, StringLiteral algo
where
  cryptoUsage.getMethod().hasName("getInstance") and
  cryptoUsage.getMethod().getDeclaringType().hasQualifiedName("javax.crypto", "Cipher") and
  algo = cryptoUsage.getArgument(0) and
  // Match any of the listed algorithm names, case-insensitively
  algo.getValue().toUpperCase().matches(["%DES%", "%RC4%", "%MD5%"])
select cryptoUsage, "Use of deprecated cryptographic algorithm: " + algo.getValue()
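For contrast with the algorithms this rule flags, the sketch below uses AES in GCM mode, an authenticated cipher available in the standard JDK. AesGcmExample and its methods are hypothetical; prepending a random 12-byte IV to the ciphertext is a common convention, not something the query requires. The transformation string "AES/GCM/NoPadding" contains none of the flagged names, so the rule does not report it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AesGcmExample {
    private static final int IV_BYTES = 12;  // 96-bit nonce, the usual choice for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // Returns IV || ciphertext || tag. Never reuse an IV with the same key.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Reads the IV from the first IV_BYTES bytes; throws if the data was tampered with.
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] sealed = encrypt(key, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, sealed), StandardCharsets.UTF_8)); // hello
    }
}
```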
Running CodeQL Analysis
Command Line Usage
# Create the database
codeql database create java-app-db --language=java --command="mvn clean compile"

# Run the security-extended suite and write SARIF
codeql database analyze java-app-db \
  codeql/java-queries:codeql-suites/java-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif

# Run custom queries and write CSV
codeql database analyze java-app-db \
  custom-queries/ \
  --format=csv \
  --output=results.csv

# Bundle the query pack in the my-security-queries/ directory (which must contain a qlpack.yml)
codeql pack create my-security-queries
Interpreting Results
SARIF Output Example
{
  "runs": [
    {
      "results": [
        {
          "ruleId": "java/sql-injection",
          "message": {
            "text": "Potential SQL injection vulnerability"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "src/main/java/com/example/UserController.java"
                },
                "region": {
                  "startLine": 42,
                  "startColumn": 15,
                  "endLine": 42,
                  "endColumn": 45
                }
              }
            }
          ]
        }
      ]
    }
  ]
}
Best Practices for CodeQL
- Start with Standard Suites: Begin with the security-and-quality and security-extended suites
- Customize for Your Codebase: Create custom queries for domain-specific patterns
- Integrate Early: Run CodeQL in CI/CD pipelines
- Regular Updates: Keep CodeQL and query packs updated
- False Positive Management: Use query filters to exclude known false positives
- Team Education: Train developers on interpreting and fixing CodeQL findings
Conclusion
CodeQL provides powerful capabilities for Java security analysis:
Key Strengths:
- Comprehensive Analysis: Deep semantic understanding of code
- Customizable: Write domain-specific security rules
- Scalable: Handles large codebases efficiently
- Integrable: Fits into existing CI/CD pipelines
- Proactive: Finds vulnerabilities before they reach production
Common Java Security Issues Detected:
- SQL Injection
- Cross-Site Scripting (XSS)
- Command Injection
- Path Traversal
- Insecure Deserialization
- Hard-coded Credentials
- Weak Cryptography
Getting Started:
- Set up CodeQL CLI or GitHub Actions
- Run standard security queries
- Analyze results and fix critical issues
- Develop custom queries for your specific needs
- Integrate into development workflow
By leveraging CodeQL, development teams can systematically identify and remediate security vulnerabilities throughout the software development lifecycle, significantly improving application security posture.
Next Steps: Explore the CodeQL learning path, practice writing custom queries, and integrate CodeQL scanning into your development workflow for continuous security improvement.