Job Recommendation System - Mathematical Formulation
1. Core System Variables Definition
F = {f₁, f₂, ..., fₘ} # Set of all freelancers
J = {j₁, j₂, ..., jₙ} # Set of all active jobs
S(fᵢ) = [s₁, s₂, ..., sₖ] # Skill vector for freelancer fᵢ
D(jₖ) = title ⊕ desc ⊕ skills # Job document (title, description, and skills concatenated)
T = {t₁, t₂, ..., tₚ} # Vocabulary
2. Term Frequency (TF) Formula
TF(t, d) = count(t ∈ d) / ∑ₜ′ count(t′ ∈ d)
Example:
Document: "PHP PHP JavaScript"
TF("PHP") = 2/3 = 0.667
TF("JavaScript") = 1/3 = 0.333
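The TF example can be reproduced with a few lines of PHP. This is a minimal sketch, not the system's actual code: the function name termFrequencies and the whitespace tokenizer are assumptions made for illustration.

```php
<?php
// Term frequency per Section 2: count of each term divided by the total
// number of terms. Splitting on whitespace is an assumption of this sketch.
function termFrequencies(string $document): array
{
    $tokens = preg_split('/\s+/', trim($document), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($tokens);
    if ($total === 0) {
        return []; // empty document: no terms, no frequencies
    }
    $tf = [];
    foreach (array_count_values($tokens) as $term => $count) {
        $tf[$term] = $count / $total;
    }
    return $tf;
}

$tf = termFrequencies("PHP PHP JavaScript");
printf("%.3f %.3f\n", $tf["PHP"], $tf["JavaScript"]); // prints "0.667 0.333"
```

Terms are compared case-sensitively here; lowercasing tokens first would be a reasonable variation.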
3. Inverse Document Frequency (IDF) Formula
IDF(t) = log(N / (1 + DF(t))) + 1
Where:
- N = total documents in corpus
- DF(t) = number of documents containing term t
- log = natural logarithm (ln)
Example:
If "PHP" appears in 100 of 1000 documents:
IDF("PHP") = ln(1000/(1+100)) + 1 = ln(1000/101) + 1 ≈ 2.29 + 1 = 3.29
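As a quick check of the IDF arithmetic, the formula can be evaluated directly. This sketch assumes the natural logarithm (PHP's log()), in which case ln(1000/101) ≈ 2.29 and the trailing + 1 brings the result to about 3.29. The function name is hypothetical.

```php
<?php
// Smoothed IDF per Section 3: log(N / (1 + DF(t))) + 1.
// PHP's log() is the natural logarithm; using that base is an assumption.
function inverseDocumentFrequency(int $totalDocs, int $docFreq): float
{
    return log($totalDocs / (1 + $docFreq)) + 1;
}

printf("%.2f\n", inverseDocumentFrequency(1000, 100)); // prints "3.29"
```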
4. TF-IDF Weight Calculation
TFIDF(t, d) = TF(t, d) × IDF(t)
Example:
TF("PHP", d) = 0.667
IDF("PHP") = ln(1000/101) + 1 ≈ 3.29
TFIDF("PHP", d) = 0.667 × 3.29 ≈ 2.19
5. Document Vector Representation
V(d) = [TFIDF(t₁, d), TFIDF(t₂, d), ..., TFIDF(tₚ, d)]
Example vocabulary order: [PHP, JavaScript, MySQL, Python]
Document vector: [2.19, 0.873, 0, 0] # 0.873 = 0.333 × IDF("JavaScript"), which implies IDF("JavaScript") ≈ 2.62
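Building V(d) amounts to walking the vocabulary in a fixed order and multiplying TF by IDF for each term. The sketch below assumes TF and IDF have already been computed and are passed in as arrays keyed by term. The function name tfidfVector and the numbers in the usage example are made up for illustration and are not taken from the system.

```php
<?php
// Build a TF-IDF vector over a fixed vocabulary order (Sections 4-5).
// $tf maps term => TF(t, d); $idf maps term => IDF(t). Both are assumed
// to be precomputed. Terms missing from either map get weight 0.
function tfidfVector(array $vocabulary, array $tf, array $idf): array
{
    $vector = [];
    foreach ($vocabulary as $term) {
        $vector[] = ($tf[$term] ?? 0.0) * ($idf[$term] ?? 0.0);
    }
    return $vector;
}

// Illustrative values only.
$vocabulary = ["PHP", "JavaScript", "MySQL", "Python"];
$tf  = ["PHP" => 0.5, "JavaScript" => 0.5];
$idf = ["PHP" => 2.0, "JavaScript" => 3.0, "MySQL" => 2.0, "Python" => 2.0];

$vector = tfidfVector($vocabulary, $tf, $idf);
echo implode(", ", $vector), "\n"; // prints "1, 1.5, 0, 0"
```

Keeping the vocabulary order fixed matters: two vectors are only comparable if position i refers to the same term in both.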
6. Dot Product Calculation
For vectors A = [a₁, a₂, …, aₚ] and B = [b₁, b₂, …, bₚ]
A·B = ∑ᵢ₌₁ᵖ (aᵢ × bᵢ)
Example:
A = [1.5, 0.8, 0, 0]
B = [1.2, 0, 0.9, 0]
A·B = (1.5×1.2) + (0.8×0) + (0×0.9) + (0×0) = 1.8
7. Vector Magnitude Calculation
‖A‖ = √(∑ᵢ₌₁ᵖ aᵢ²)
Example:
A = [1.5, 0.8, 0, 0]
‖A‖ = √(1.5² + 0.8² + 0² + 0²) = √(2.25 + 0.64) = √2.89 = 1.7
8. Cosine Similarity Formula
cosine(A, B) = (A·B) / (‖A‖ × ‖B‖)
Example:
A·B = 1.8, ‖A‖ = 1.7, ‖B‖ = 1.5
cosine(A, B) = 1.8 / (1.7 × 1.5) = 1.8 / 2.55 ≈ 0.706
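Sections 6 to 8 combine into one small routine. The sketch below shows one way a cosineSimilarity function could be written; it is not necessarily how the existing code does it. Returning 0.0 when either vector has zero magnitude is a convention chosen here, since the formula is undefined in that case.

```php
<?php
// Dot product of two equal-length numeric arrays (Section 6).
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }
    return $sum;
}

// Euclidean magnitude (Section 7).
function magnitude(array $a): float
{
    return sqrt(dotProduct($a, $a));
}

// Cosine similarity (Section 8). A zero-magnitude vector yields 0.0,
// which is an assumption of this sketch rather than part of the formula.
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException("Vectors must have equal length");
    }
    $norms = magnitude($a) * magnitude($b);
    return $norms > 0.0 ? dotProduct($a, $b) / $norms : 0.0;
}

$a = [1.5, 0.8, 0, 0];
$b = [1.2, 0, 0.9, 0];
printf("%.3f\n", cosineSimilarity($a, $b)); // prints "0.706"
```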
9. Job Recommendation Score
For freelancer fᵢ and job jₖ
R(fᵢ, jₖ) = cosine(V(S(fᵢ)), V(D(jₖ)))
Where:
- V(S(fᵢ)) = TFIDF vector of freelancer's skills
- V(D(jₖ)) = TFIDF vector of job requirements
10. Match Classification Rules
Based on cosine similarity score R
High Match: R > 0.7 # Strong skill overlap
Medium Match: 0.4 ≤ R ≤ 0.7 # Partial match
Low Match: R < 0.4 # Minimal or no skill match
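The thresholds translate into a simple chain of comparisons. In this sketch the function name classifyMatch and the returned labels are assumptions. Note that a score of exactly 0.7 falls in the Medium band.

```php
<?php
// Match classification per Section 10, using the thresholds listed there.
function classifyMatch(float $score): string
{
    if ($score > 0.7) {
        return "High";
    }
    if ($score >= 0.4) {
        return "Medium";
    }
    return "Low";
}

echo classifyMatch(0.706), "\n"; // prints "High"
echo classifyMatch(0.7), "\n";   // prints "Medium"
echo classifyMatch(0.25), "\n";  // prints "Low"
```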
11. Budget Compatibility Score
B(fᵢ, jₖ) = 1 - |budget_fᵢ - budget_jₖ| / max(budget_fᵢ, budget_jₖ)
Range: 0 to 1 for positive budgets (1 = perfect budget match)
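A direct implementation of B needs a guard for the case where both budgets are zero, which the formula leaves undefined. The sketch below treats that case as a perfect match. That choice, the function name, and the example amounts are all assumptions.

```php
<?php
// Budget compatibility per Section 11. Budgets are assumed non-negative.
// Two zero budgets are treated as a perfect match to avoid dividing by
// zero; this is a choice made for the sketch, not part of the formula.
function budgetCompatibility(float $freelancerBudget, float $jobBudget): float
{
    $largest = max($freelancerBudget, $jobBudget);
    if ($largest <= 0.0) {
        return 1.0;
    }
    return 1.0 - abs($freelancerBudget - $jobBudget) / $largest;
}

printf("%.2f\n", budgetCompatibility(800, 1000)); // prints "0.80"
```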
12. Category Match Score
C(fᵢ, jₖ) =
1.0 if primary_category_fᵢ = category_jₖ
0.7 if secondary_category_fᵢ = category_jₖ
0.3 if similar_category_history
0.0 otherwise
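The category rules are evaluated top to bottom, so the first matching case wins. The sketch below assumes a freelancer record with hypothetical keys 'primary', 'secondary', and 'history'. It also approximates similar_category_history as "the freelancer has previously worked in this category", which may differ from how the system defines it.

```php
<?php
// Category match per Section 12. The array keys are hypothetical field
// names, and the 0.3 case is approximated as "job category appears in the
// freelancer's history"; the source does not define it precisely.
function categoryMatch(array $freelancer, string $jobCategory): float
{
    if (($freelancer['primary'] ?? null) === $jobCategory) {
        return 1.0;
    }
    if (($freelancer['secondary'] ?? null) === $jobCategory) {
        return 0.7;
    }
    if (in_array($jobCategory, $freelancer['history'] ?? [], true)) {
        return 0.3;
    }
    return 0.0;
}

$freelancer = [
    'primary'   => 'Web Development',
    'secondary' => 'Mobile Apps',
    'history'   => ['Design'],
];
echo categoryMatch($freelancer, 'Mobile Apps'), "\n"; // prints "0.7"
```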
13. Fallback Scoring Formula
Used when fewer than 3 jobs have R > 0.4
FallbackScore(fᵢ, jₖ) = 0.5×C(fᵢ, jₖ) + 0.3×B(fᵢ, jₖ) + 0.2×L(fᵢ, jₖ)
Where:
- C = category match score
- B = budget compatibility
- L = location match score (0.8 if same location, 0.5 if same region, 0 otherwise)
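The fallback has two parts: deciding whether it applies, and computing the weighted sum. A minimal sketch follows, assuming the C, B, and L scores are already available as numbers. Both function names are hypothetical.

```php
<?php
// True when fewer than 3 similarity scores exceed 0.4 (Section 13 trigger).
function needsFallback(array $similarityScores): bool
{
    $good = array_filter($similarityScores, fn($r) => $r > 0.4);
    return count($good) < 3;
}

// Weighted fallback score: 0.5 × category + 0.3 × budget + 0.2 × location.
function fallbackScore(float $category, float $budget, float $location): float
{
    return 0.5 * $category + 0.3 * $budget + 0.2 * $location;
}

var_dump(needsFallback([0.9, 0.5, 0.3]));         // bool(true): only two scores exceed 0.4
printf("%.2f\n", fallbackScore(1.0, 0.8, 0.5));   // prints "0.84"
```

Because L tops out at 0.8, the largest possible fallback score is 0.5 + 0.3 + 0.16 = 0.96.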
14. Top-10 Selection Algorithm
1. Calculate scores for all active jobs
2. Sort in descending order
3. Select first 10 jobs
SortedJobs = argsortₖ(R(fᵢ, jₖ)) [descending]
Top10(fᵢ) = {j₍₁₎, j₍₂₎, ..., j₍₁₀₎}
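In PHP the sort-and-select steps map onto arsort and array_slice. The sketch below assumes scores are held in an array keyed by job ID. The function name topJobs and the sample IDs are made up.

```php
<?php
// Top-N selection per Section 14. $scores maps job ID => R(f, j).
function topJobs(array $scores, int $limit = 10): array
{
    arsort($scores);                               // sort by score, descending, keys preserved
    return array_slice($scores, 0, $limit, true);  // keep the first $limit entries with their IDs
}

$scores = ["j1" => 0.35, "j2" => 0.82, "j3" => 0.61];
echo implode(", ", array_keys(topJobs($scores, 2))), "\n"; // prints "j2, j3"
```

When fewer than $limit jobs are available, the function simply returns all of them in sorted order.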
15. Performance Metrics
Precision: 78% match rate in testing
Precision = TP / (TP + FP) = 0.78 # this ratio is precision; accuracy would also count TN and FN
Relative improvement over the random baseline (42%):
Improvement = (0.78 - 0.42) / 0.42 × 100% ≈ 85.7%
Response time guarantee:
T_response = T_vectorize + T_compute + T_sort ≤ 3000ms
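The two ratios above are easy to recompute. The ratio TP / (TP + FP) is the quantity usually called precision, so the sketch names it that way. The counts 78 and 22 are made-up values chosen only to reproduce the 0.78 figure; they are not test data from the system.

```php
<?php
// Precision: share of recommended jobs that were true matches.
function precision(int $truePositives, int $falsePositives): float
{
    $recommended = $truePositives + $falsePositives;
    return $recommended > 0 ? $truePositives / $recommended : 0.0;
}

// Relative improvement of $value over $baseline, in percent.
function relativeImprovement(float $value, float $baseline): float
{
    return ($value - $baseline) / $baseline * 100.0;
}

// 78 and 22 are illustrative counts, not measured results.
printf("%.2f\n", precision(78, 22));                 // prints "0.78"
printf("%.1f%%\n", relativeImprovement(0.78, 0.42)); // prints "85.7%"
```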
16. Complete Pipeline Formula
End-to-end mathematical representation
Recommendations(fᵢ) = Top₁₀( Sort( { cosine(V(S(fᵢ)), V(D(jₖ))) | jₖ ∈ J } ) )
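This corresponds to PHP code along the following lines (tfidfVector() is a hypothetical helper implementing Sections 2-5, and the 'id' and 'document' job fields are assumed):
function recommendJobs($freelancer_id) {
    // Build the freelancer's TF-IDF vector from their skills.
    $profile = getFreelancerSkills($freelancer_id);
    $profile_vector = tfidfVector($profile); // hypothetical helper
    $scores = [];
    foreach (getActiveJobs() as $job) {
        // Score each active job against the freelancer's profile.
        $job_vector = tfidfVector($job['document']); // assumed field name
        $scores[$job['id']] = cosineSimilarity($profile_vector, $job_vector);
    }
    arsort($scores); // sort by similarity, descending, keeping job IDs as keys
    return array_slice($scores, 0, 10, true); // top 10 jobs by similarity
}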