CodeQL: Advanced Security Analysis for Java Applications

CodeQL is GitHub's semantic code analysis engine that allows you to query code as if it were data. It enables security researchers and developers to find vulnerabilities, bugs, and other code issues through custom queries written in the QL language.

What is CodeQL?

CodeQL treats code as a queryable database, allowing you to:

Find patterns across large codebases
Identify security vulnerabilities
Enforce coding standards
Discover bugs and anti-patterns

CodeQL Architecture

Source Code → Extractor → Database → QL Queries → Results

Source Code: Java, C++, JavaScript
Extractor: language-specific AST generator
Database: binary format
QL Queries: security rules
Results: vulnerabilities and issues

Setting Up CodeQL

Installation and Setup

# Install CodeQL CLI
gh extension install github/gh-codeql
# Or download directly
wget https://github.com/github/codeql-cli-binaries/releases/download/v2.14.6/codeql-linux64.zip
unzip codeql-linux64.zip
export PATH="$PATH:$(pwd)/codeql"
# Download the standard Java query and library packs
codeql pack download codeql/java-queries
codeql pack download codeql/java-all

Creating a CodeQL Database

# For Java projects built with Maven
codeql database create java-database \
  --language=java \
  --command="mvn clean compile -q" \
  --source-root=.
# For Maven projects that ship the Maven wrapper
codeql database create java-database \
  --language=java \
  --command="./mvnw clean compile -q" \
  --source-root=.

Basic CodeQL Concepts

Example 1: Simple CodeQL Query Structure

/**
 * @name Find empty catch blocks
 * @description Detects catch blocks that silently swallow exceptions
 * @kind problem
 * @id java/empty-catch-block
 */

import java

// CatchClause models a `catch (...) { ... }` clause in the Java library
from CatchClause catchClause, BlockStmt block
where
  block = catchClause.getBlock() and
  block.getNumStmt() = 0
select catchClause, "Empty catch block may silently ignore exceptions"
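
To make the target of this query concrete, here is a small Java sketch. The class and method names are hypothetical. `parseQuietly` contains the kind of empty catch block the query is meant to report, while the catch block in `parseOrDefault` contains statements and should not be reported.

```java
// Hypothetical example class; names are illustrative only.
class EmptyCatchExample {

    // The catch block below has no statements, which is what the query looks for.
    static int parseQuietly(String text) {
        int value = -1;
        try {
            value = Integer.parseInt(text);
        } catch (NumberFormatException e) {
        }
        return value;
    }

    // The catch block reports the failure and returns a fallback value.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            System.err.println("Invalid number '" + text + "': " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuietly("abc"));
        System.out.println(parseOrDefault("abc", 0));
    }
}
```

In `parseQuietly` the caller cannot tell a parse failure from a legitimate `-1`, which is the practical risk behind this rule.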

Example 2: Finding SQL Injection Vulnerabilities

/**
 * @name SQL injection vulnerability
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph

class SqlInjectionConfig extends TaintTracking::Configuration {
  SqlInjectionConfig() { this = "SqlInjectionConfig" }

  // Sources: parameters of servlet-style request handlers
  override predicate isSource(DataFlow::Node source) {
    exists(Method method |
      method.getAParameter() = source.asParameter() and
      method.hasName(["doGet", "doPost", "service"])
    )
  }

  // Sinks: the SQL string passed to an execute* call (the argument, not the call)
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess methodCall |
      sink.asExpr() = methodCall.getArgument(0) and
      methodCall.getMethod().getName().matches("execute%") and
      methodCall.getMethod().getDeclaringType().hasName(["Statement", "PreparedStatement"])
    )
  }
}

from SqlInjectionConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Potential SQL injection vulnerability"
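
As an illustration of the data flow this query describes, the following Java sketch contrasts a concatenated query with a parameterized one. The class, table, and column names are hypothetical. `existsUnsafe` passes a string built from input to `Statement.executeQuery`, which is the shape of sink described above, while `existsSafe` keeps the SQL text constant.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical data-access class; the schema is made up for illustration.
class UserLookupExample {

    // Builds SQL by concatenation, so the input becomes part of the query text.
    static String buildUnsafeQuery(String userName) {
        return "SELECT id FROM users WHERE name = '" + userName + "'";
    }

    // Input-derived text reaches Statement.executeQuery(...).
    static boolean existsUnsafe(Connection connection, String userName) throws SQLException {
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(buildUnsafeQuery(userName))) {
            return rows.next();
        }
    }

    // The SQL text is a constant and the input is bound as a parameter.
    static boolean existsSafe(Connection connection, String userName) throws SQLException {
        try (PreparedStatement statement =
                 connection.prepareStatement("SELECT id FROM users WHERE name = ?")) {
            statement.setString(1, userName);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }

    public static void main(String[] args) {
        // Crafted input changes the structure of the concatenated query.
        System.out.println(buildUnsafeQuery("x' OR '1'='1"));
    }
}
```

Running `main` prints a query whose `WHERE` clause now contains an extra `OR` condition, which is why the concatenated form is worth tracking with taint analysis.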

Common Security Patterns in Java

Example 3: Command Injection Detection

/**
 * @name Command injection
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import DataFlow::PathGraph

class CommandInjectionConfig extends TaintTracking::Configuration {
  CommandInjectionConfig() { this = "CommandInjectionConfig" }

  // Sources: user-controlled input such as servlet request parameters
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: the command passed to Runtime.exec(...)
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess exec |
      sink.asExpr() = exec.getArgument(0) and
      exec.getMethod().hasName("exec") and
      exec.getMethod().getDeclaringType().hasQualifiedName("java.lang", "Runtime")
    )
  }
}

from CommandInjectionConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Potential command injection vulnerability"
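
The Java sketch below shows the kind of code this query is aimed at, next to one possible alternative. All names are hypothetical, and the host-name allow-list is an assumed policy rather than a general recommendation. `pingUnsafe` hands a concatenated command string to `Runtime.exec`; `pingSafe` validates the input and passes arguments as a list to `ProcessBuilder`.

```java
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical example class; the validation policy is an assumption.
class PingCommandExample {

    // Allow-list: starts with a letter or digit, then letters, digits, dots, hyphens.
    private static final Pattern HOST = Pattern.compile("[A-Za-z0-9][A-Za-z0-9.-]{0,252}");

    // Input is concatenated into the command string given to Runtime.exec.
    static Process pingUnsafe(String host) throws IOException {
        return Runtime.getRuntime().exec("ping -c 1 " + host);
    }

    // Validates the host and returns the command as separate arguments.
    static List<String> buildSafeCommand(String host) {
        if (!HOST.matcher(host).matches()) {
            throw new IllegalArgumentException("Rejected host: " + host);
        }
        return List.of("ping", "-c", "1", host);
    }

    // Starts the process from the validated argument list.
    static Process pingSafe(String host) throws IOException {
        return new ProcessBuilder(buildSafeCommand(host)).start();
    }
}
```

For example, `buildSafeCommand("example.com")` yields four separate arguments, while `buildSafeCommand("example.com; rm -rf /")` throws `IllegalArgumentException`.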

Example 4: XSS Vulnerability Detection

/**
 * @name Cross-site scripting (XSS) vulnerability
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import DataFlow::PathGraph

class XssConfig extends TaintTracking::Configuration {
  XssConfig() { this = "XssConfig" }

  // Sources: user-controlled input such as servlet request parameters
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: the value written by a print* method of PrintWriter or JspWriter
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess write |
      sink.asExpr() = write.getArgument(0) and
      write.getMethod().getName().matches("print%") and
      write.getMethod().getDeclaringType().hasName(["PrintWriter", "JspWriter"])
    )
  }

  // Sanitizers: the result of StringEscapeUtils.escapeHtml(...)
  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(MethodAccess escape |
      escape = sanitizer.asExpr() and
      escape.getMethod().hasName("escapeHtml") and
      escape.getMethod().getDeclaringType().hasName("StringEscapeUtils")
    )
  }
}

from XssConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Potential XSS vulnerability"
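
To show what the sanitizer step stands for, here is a minimal Java sketch of HTML escaping. The class and its `escapeHtml` helper are hypothetical stand-ins written with the standard library only. The query above recognizes `StringEscapeUtils.escapeHtml` by name, so a hand-written helper like this one would need its own `isSanitizer` clause to be treated as a sanitizer.

```java
// Hypothetical example class; escapeHtml here is a stand-in, not a library API.
class HtmlEscapeExample {

    // Replaces the five characters that are significant in HTML text and attributes.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Writes input into markup unchanged: the shape the query is meant to flag.
    static String greetingUnsafe(String name) {
        return "<p>Hello, " + name + "</p>";
    }

    // Escapes input before placing it into markup.
    static String greetingSafe(String name) {
        return "<p>Hello, " + escapeHtml(name) + "</p>";
    }
}
```

With the input `<script>alert('x')</script>`, `greetingUnsafe` returns markup containing a live script element, while `greetingSafe` returns the same text as inert character references.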

Advanced CodeQL Queries

Example 5: Hard-coded Credentials Detection

/**
 * @name Hard-coded credentials
 * @kind problem
 */

import java

from Variable v, StringLiteral literal
where
  // A string literal assigned to the variable (including its initializer)
  literal = v.getAnAssignedValue() and
  // Variable name suggests it might be a credential (case-insensitive)
  (
    v.getName().toLowerCase().matches("%password%") or
    v.getName().toLowerCase().matches("%secret%") or
    v.getName().toLowerCase().matches("%key%") or
    v.getName().toLowerCase().matches("%token%")
  ) and
  // Exclude empty strings and common placeholder values
  not literal.getValue().regexpMatch("^(|\\s*|null|default|test|example)$")
select literal, "Potential hard-coded credential: " + v.getName()

Example 6: Insecure Random Number Generation

/**
 * @name Insecure random number generator
 * @kind problem
 */

import java

from ClassInstanceExpr randomCreation
where
  // Matches `new Random(...)`; `new SecureRandom()` constructs a different type
  randomCreation.getConstructedType().hasQualifiedName("java.util", "Random")
select randomCreation, "Insecure Random object creation. Use SecureRandom instead."
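
The following Java sketch illustrates the difference this query cares about. The class and method names are hypothetical, and the 16-byte token size is an arbitrary choice. `weakToken` constructs a `java.util.Random`, which is the expression the query selects; `strongToken` uses `java.security.SecureRandom`.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Random;

// Hypothetical example class; the token size is an assumption.
class TokenExample {

    // `new Random(seed)` is deterministic: the same seed gives the same token.
    static String weakToken(long seed) {
        byte[] bytes = new byte[16];
        new Random(seed).nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // SecureRandom draws from a cryptographically strong source.
    static String strongToken() {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(weakToken(42L).equals(weakToken(42L)));
        System.out.println(strongToken());
    }
}
```

Anyone who can guess or recover the seed can reproduce every value from `weakToken`, which is why the rule points at the constructor call itself.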

Example 7: Path Traversal Vulnerability

/**
 * @name Path traversal vulnerability
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import DataFlow::PathGraph

class PathTraversalConfig extends TaintTracking::Configuration {
  PathTraversalConfig() { this = "PathTraversalConfig" }

  // Sources: user-controlled input such as servlet request parameters
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: any argument of `new File(...)`
  override predicate isSink(DataFlow::Node sink) {
    exists(ClassInstanceExpr fileCreation |
      sink.asExpr() = fileCreation.getAnArgument() and
      fileCreation.getConstructedType().hasQualifiedName("java.io", "File")
    )
  }

  // Sanitizers: the result of Path.normalize() or File.getCanonicalPath()
  override predicate isSanitizer(DataFlow::Node sanitizer) {
    exists(MethodAccess normalize |
      normalize = sanitizer.asExpr() and
      normalize.getMethod().hasName(["normalize", "getCanonicalPath"]) and
      (
        normalize.getMethod().getDeclaringType().hasQualifiedName("java.nio.file", "Path") or
        normalize.getMethod().getDeclaringType().hasQualifiedName("java.io", "File")
      )
    )
  }
}

from PathTraversalConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Potential path traversal vulnerability"
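
A short Java sketch of the normalization idea behind the sanitizer may help here. The class name and the base-directory policy are assumptions for illustration. Note that normalizing a path does not by itself stop traversal; the sketch pairs it with a check that the result stays under the base directory, which a stricter query would also want to model.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class; the containment policy is an assumption.
class SafePathExample {

    // Resolves userPath under baseDir and rejects results that leave baseDir.
    static Path resolveInside(Path baseDir, String userPath) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path resolved = base.resolve(userPath).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userPath);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = Paths.get("uploads");
        System.out.println(resolveInside(base, "reports/../summary.txt"));
        try {
            resolveInside(base, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An absolute `userPath` is rejected as well, because `Path.resolve` returns the absolute argument unchanged and it then fails the `startsWith` check.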

Custom Security Rules

Example 8: Custom Security Rule - Unsafe Deserialization

/**
 * @name Unsafe deserialization
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import DataFlow::PathGraph

class UnsafeDeserializationConfig extends TaintTracking::Configuration {
  UnsafeDeserializationConfig() { this = "UnsafeDeserializationConfig" }

  // Sources: user-controlled input such as a servlet request body
  override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Sinks: the stream or decoder on which readObject()/readUnshared() is called
  override predicate isSink(DataFlow::Node sink) {
    exists(MethodAccess deserialize |
      sink.asExpr() = deserialize.getQualifier() and
      (
        deserialize.getMethod().hasName(["readObject", "readUnshared"]) and
        deserialize.getMethod().getDeclaringType().hasQualifiedName("java.io", "ObjectInputStream")
        or
        deserialize.getMethod().hasName("readObject") and
        deserialize.getMethod().getDeclaringType().hasQualifiedName("java.beans", "XMLDecoder")
      )
    )
  }
}

from UnsafeDeserializationConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe deserialization detected"
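
For context on what the `readObject` sink looks like in application code, here is a Java sketch that deserializes through an allow-list filter. It assumes Java 9 or later for `ObjectInputFilter`; the class name and the filter pattern (only `java.lang.String` allowed) are hypothetical choices for illustration. The query above does not model such filters, so a call like this would still be reported unless a sanitizer were added.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical example class; the filter pattern is an assumed policy.
class FilteredDeserializationExample {

    // Allows java.lang.String and rejects every other class ("!*").
    private static final ObjectInputFilter FILTER =
        ObjectInputFilter.Config.createFilter("java.lang.String;!*");

    // Serializes a value to bytes so the example is self-contained.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Installs the filter before the first readObject() call.
    static Object readFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FILTER);
            return in.readObject();
        }
    }
}
```

Reading a serialized `String` succeeds, while reading a serialized `HashMap` fails with `InvalidClassException` because the filter rejects the class before it is instantiated.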

Example 9: Log Injection Detection

/**
 * @name Log injection vulnerability
 * @kind problem
 */

import java

from MethodAccess logCall, Expr logArgument
where
  // Find logging method calls
  logCall.getMethod().hasName(["info", "warn", "error", "debug", "trace"]) and
  logCall.getMethod().getDeclaringType().hasQualifiedName("org.slf4j", "Logger") and
  // Get the first argument (usually the message)
  logArgument = logCall.getArgument(0) and
  // Check whether the message reads a parameter of a servlet request handler
  exists(VarAccess userInput |
    userInput.getVariable().(Parameter).getCallable().hasName(["doGet", "doPost", "service"]) and
    logArgument.getAChildExpr*() = userInput
  )
select logCall, "Potential log injection vulnerability. User input used in log message without sanitization."
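
The sanitization that this message refers to can be as small as neutralizing line breaks. The Java sketch below is one possible approach under that assumption; the class and method names are hypothetical, and it prints to standard output instead of using SLF4J so that it runs with the standard library alone.

```java
// Hypothetical example class; a real application would pass the result to its logger.
class LogSanitizerExample {

    // Replaces CR and LF with visible escapes so one input cannot forge extra log lines.
    static String sanitizeForLog(String value) {
        if (value == null) {
            return "null";
        }
        return value.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String userName = "alice\nINFO admin logged in";
        // Unsanitized: the output spans two lines, the second one forged.
        System.out.println("Login failed for " + userName);
        // Sanitized: the output stays on a single line.
        System.out.println("Login failed for " + sanitizeForLog(userName));
    }
}
```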

Integrating CodeQL into CI/CD

GitHub Actions Workflow

name: "CodeQL Security Scan"

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * 0'  # Weekly scan

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed to upload results to code scanning
    strategy:
      fail-fast: false
      matrix:
        language: [ 'java' ]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-and-quality
      - name: Build project
        run: |
          mvn clean compile -q
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{matrix.language}}"

Custom CodeQL Workflow with Multiple Query Suites

name: "Advanced CodeQL Scan"

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  codeql-analysis:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed to upload results to code scanning
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: java
          # The leading "+" adds these suites to the queries listed in the
          # config file, which also references ./custom-queries/
          queries: +security-and-quality,security-extended
          config-file: ./.github/codeql/codeql-config.yml
      - name: Build project
        run: mvn clean compile -q -DskipTests
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3

CodeQL Configuration File

codeql-config.yml

name: "Custom Security Configuration"

queries:
  - uses: security-and-quality
  - uses: security-extended
  - uses: ./custom-queries/

paths:
  - src
  - lib

paths-ignore:
  - generated
  - test
  - target

query-filters:
  - exclude:
      id: java/inefficient-empty-string-test
  - exclude:
      problem.severity: warning

Custom Query Packs

Example 10: Creating Custom Query Packs

// custom-queries/MySecurityQueries.ql
/**
 * @name Custom security rules for our application
 * @kind problem
 */

import java

// Custom rule: Detect usage of deprecated crypto algorithms
from MethodAccess cryptoUsage
where
  cryptoUsage.getMethod().hasName("getInstance") and
  cryptoUsage.getMethod().getDeclaringType().hasQualifiedName("javax.crypto", "Cipher") and
  exists(StringLiteral algo |
    algo = cryptoUsage.getArgument(0) and
    // Parentheses keep the alternatives tied to this call's first argument
    (
      algo.getValue().matches("%DES%") or
      algo.getValue().matches("%RC4%") or
      algo.getValue().matches("%MD5%")
    )
  )
select cryptoUsage, "Use of deprecated cryptographic algorithm: " + cryptoUsage.getArgument(0).toString()
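
As a sketch of what this rule separates, the Java class below contains one `Cipher.getInstance` call with a literal that the `%DES%` pattern matches and one that none of the patterns match. The class name, key size, and the choice of AES-GCM as the alternative are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical example class; algorithm and key size choices are assumptions.
class CipherChoiceExample {

    // The literal contains "DES", so the rule's first pattern matches this call.
    static Cipher legacyCipher() throws GeneralSecurityException {
        return Cipher.getInstance("DES/ECB/PKCS5Padding");
    }

    // None of the rule's patterns match this literal.
    static Cipher modernCipher() throws GeneralSecurityException {
        return Cipher.getInstance("AES/GCM/NoPadding");
    }

    // Encrypts and decrypts with a fresh AES key and a random 12-byte nonce.
    static byte[] roundTrip(byte[] plaintext) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);

        Cipher encryptor = modernCipher();
        encryptor.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        byte[] ciphertext = encryptor.doFinal(plaintext);

        Cipher decryptor = modernCipher();
        decryptor.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        return decryptor.doFinal(ciphertext);
    }
}
```

Because the rule uses substring patterns, `%DES%` also matches `DESede` (Triple DES) and any other transformation string that happens to contain those letters.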

Running CodeQL Analysis

Command Line Usage

# Create database
codeql database create java-app-db --language=java --command="mvn clean compile"
# Run security queries
codeql database analyze java-app-db \
  codeql/java-queries:codeql-suites/java-security-extended.qls \
  --format=sarif-latest \
  --output=results.sarif
# Run custom queries
codeql database analyze java-app-db \
  custom-queries/ \
  --format=csv \
  --output=results.csv
# Generate CodeQL pack
codeql pack create my-security-queries

Interpreting Results

SARIF Output Example

{
  "runs": [
    {
      "results": [
        {
          "ruleId": "java/sql-injection",
          "message": {
            "text": "Potential SQL injection vulnerability"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "src/main/java/com/example/UserController.java"
                },
                "region": {
                  "startLine": 42,
                  "startColumn": 15,
                  "endLine": 42,
                  "endColumn": 45
                }
              }
            }
          ]
        }
      ]
    }
  ]
}

Best Practices for CodeQL

Start with Standard Suites: Begin with security-and-quality and security-extended
Customize for Your Codebase: Create custom queries for domain-specific patterns
Integrate Early: Run CodeQL in CI/CD pipelines
Regular Updates: Keep CodeQL and query packs updated
False Positive Management: Use query filters to exclude known false positives
Team Education: Train developers on interpreting and fixing CodeQL findings

Conclusion

CodeQL provides powerful capabilities for Java security analysis:

Key Strengths:

Comprehensive Analysis: Deep semantic understanding of code
Customizable: Write domain-specific security rules
Scalable: Handles large codebases efficiently
Integratable: Fits into existing CI/CD pipelines
Proactive: Finds vulnerabilities before they reach production

Common Java Security Issues Detected:

SQL Injection
Cross-Site Scripting (XSS)
Command Injection
Path Traversal
Insecure Deserialization
Hard-coded Credentials
Weak Cryptography

Getting Started:

Set up CodeQL CLI or GitHub Actions
Run standard security queries
Analyze results and fix critical issues
Develop custom queries for your specific needs
Integrate into development workflow

By leveraging CodeQL, development teams can systematically identify and remediate security vulnerabilities throughout the software development lifecycle, significantly improving application security posture.

Next Steps: Explore the CodeQL learning path, practice writing custom queries, and integrate CodeQL scanning into your development workflow for continuous security improvement.