Complete Mathematical Breakdown of Your PHP Job Recommendation Engine
Let me translate your entire PHP code into precise mathematical formulas.
1. Formal Mathematical Notation
System Inputs:
- ( F ): Set of all freelancers, where ( f_i \in F ) is a specific freelancer
- ( J ): Set of all active jobs, where ( j_k \in J ) is a specific job
- ( S(f_i) ): Skills vector for freelancer ( f_i )
- ( D(j_k) ): Document vector for job ( j_k ) (title + description + skills)
2. Core Algorithm Formulas
Function 1: getFreelancerSkills($freelancer_id)
[
S(f_i) = [s_1, s_2, \ldots, s_n]
]
Where ( s_m ) are skill tags like ["PHP", "JavaScript", "MySQL"]
Function 2: getActiveJobs()
[
J_{\text{active}} = \{j_k \in J \mid \text{status}(j_k) = \text{active}\}
]
Function 3: computeTFIDF($text)
Let ( T ) = tokenized text from input
Let ( N ) = total number of documents (jobs + profiles)
Step 1: Term Frequency (TF)
[
\text{TF}(t, d) = \frac{\text{count}(t \text{ in } d)}{\sum_{t' \in d} \text{count}(t' \text{ in } d)}
]
Step 2: Document Frequency (DF)
[
\text{DF}(t) = \left|\{d \in \text{corpus} \mid t \in d\}\right|
]
Step 3: Inverse Document Frequency (IDF)
[
\text{IDF}(t) = \log\left(\frac{N}{1 + \text{DF}(t)}\right) + 1
]
Step 4: TF-IDF Vector
[
\text{TFIDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)
]
[
\vec{V}_d = [\text{TFIDF}(t_1, d), \text{TFIDF}(t_2, d), \ldots, \text{TFIDF}(t_n, d)]
]
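The four steps above can be sketched in PHP roughly as follows. This is a minimal sketch, not your actual computeTFIDF: the names tokenize and tfidfVector, the tokenization rule, and the explicit $corpus parameter are all assumptions, added because IDF needs the document collection and the one-argument signature does not show where ( N ) and ( \text{DF}(t) ) come from.

```php
<?php
// Hypothetical helper: lowercase the text and split it into word tokens.
// "+" and "#" are kept so that tags such as "c++" or "c#" survive.
function tokenize(string $text): array
{
    $tokens = preg_split('/[^a-z0-9+#]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

// Sparse TF-IDF vector (term => weight) for $text.
// $corpus is an array of document strings that supplies N and DF(t).
function tfidfVector(string $text, array $corpus): array
{
    $n = count($corpus);

    // DF(t): number of corpus documents that contain t at least once.
    $df = [];
    foreach ($corpus as $doc) {
        foreach (array_unique(tokenize($doc)) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    // Raw term counts in the input text and their sum (the TF denominator).
    $counts = array_count_values(tokenize($text));
    $total = array_sum($counts);

    $vector = [];
    foreach ($counts as $term => $count) {
        $tf = $count / $total;                          // TF(t, d)
        $idf = log($n / (1 + ($df[$term] ?? 0))) + 1;   // IDF(t), natural log
        $vector[$term] = $tf * $idf;                    // TFIDF(t, d)
    }
    return $vector;
}
```

Storing the vector as a term => weight map keeps only the non-zero components, which is usually far smaller than the full vocabulary.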
Function 4: cosineSimilarity($vecA, $vecB)
Let ( \vec{A} = [a_1, a_2, \ldots, a_n] ) and ( \vec{B} = [b_1, b_2, \ldots, b_n] )
Dot Product:
[
\vec{A} \cdot \vec{B} = \sum_{i=1}^{n} a_i \times b_i
]
Magnitude:
[
|\vec{A}| = \sqrt{\sum_{i=1}^{n} a_i^2}
]
[
|\vec{B}| = \sqrt{\sum_{i=1}^{n} b_i^2}
]
Cosine Similarity:
[
\text{Cosine}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{|\vec{A}| \times |\vec{B}|}
]
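As a hedged illustration, the dot product, magnitudes, and ratio above might look like this in PHP. The sketch assumes the vectors are sparse term => weight arrays, and the zero-magnitude guard is an addition of mine; your cosineSimilarity may handle empty vectors differently.

```php
<?php
// Cosine similarity between two sparse vectors (term => weight).
function cosineSimilarity(array $vecA, array $vecB): float
{
    // Dot product: only terms present in both vectors contribute.
    $dot = 0.0;
    foreach ($vecA as $term => $weight) {
        if (isset($vecB[$term])) {
            $dot += $weight * $vecB[$term];
        }
    }

    // Euclidean magnitudes |A| and |B|.
    $normA = sqrt(array_sum(array_map(fn($x) => $x * $x, $vecA)));
    $normB = sqrt(array_sum(array_map(fn($x) => $x * $x, $vecB)));

    // Assumption: an empty or all-zero vector scores 0 instead of dividing by zero.
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / ($normA * $normB);
}
```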
3. Complete Recommendation Formula
For each job ( j_k ):
[
\text{Score}(f_i, j_k) = \text{Cosine}\left(\vec{V}_{S(f_i)}, \vec{V}_{D(j_k)}\right)
]
Where:
- ( \vec{V}_{S(f_i)} = \text{TFIDF}(\text{concat}(s_1, s_2, \ldots, s_n)) )
- ( \vec{V}_{D(j_k)} = \text{TFIDF}(\text{title} + \text{description} + \text{skills}) )
4. Complete Mathematical Implementation
Pseudocode to Mathematics:
Your PHP loop:
foreach ($jobs as $job) {
    $job_vector = computeTFIDF($job['title'] . " " . $job['description'] . " " . $job['skills']);
    $profile_vector = computeTFIDF(implode(" ", $profile));
    $similarity = cosineSimilarity($profile_vector, $job_vector);
}
Mathematical equivalent:
[
\forall j_k \in J_{\text{active}}:
]
[
R_{ik} = \frac{\vec{V}_{f_i} \cdot \vec{V}_{j_k}}{|\vec{V}_{f_i}| \cdot |\vec{V}_{j_k}|}
]
Where ( R_{ik} ) is the recommendation score for freelancer ( i ) and job ( k ).
5. Ranking and Selection
Sorting operation:
[
\text{SortedJobs} = \text{argsort}_{k}\left(R_{ik}\right)_{\text{descending}}
]
Top-10 selection:
[
\text{Top10}(f_i) = \left\{j_{(1)}, j_{(2)}, \ldots, j_{(10)}\right\}
]
Where ( j_{(m)} ) is the job with ( m )-th highest ( R_{ik} ) score.
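The sort and top-10 cut can be sketched with PHP's built-in array functions. The function name topJobs and the job ID => score layout of $scores are assumptions for illustration.

```php
<?php
// $scores maps a job ID to its score R_ik for one freelancer.
function topJobs(array $scores, int $limit = 10): array
{
    // Sort by score, highest first, keeping each job ID attached to its score.
    arsort($scores);

    // Keep the first $limit entries; preserve_keys = true keeps integer job IDs intact.
    return array_slice($scores, 0, $limit, true);
}
```

For example, topJobs([101 => 0.35, 102 => 0.91, 103 => 0.62], 2) keeps jobs 102 and 103, in that order.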
6. Match Classification Rules (Mathematical)
High Match:
[
\text{Class}(j_k) = \text{High} \iff R_{ik} > 0.7
]
Medium Match:
[
\text{Class}(j_k) = \text{Medium} \iff 0.4 \leq R_{ik} \leq 0.7
]
Low Match:
[
\text{Class}(j_k) = \text{Low} \iff R_{ik} < 0.4
]
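The three rules translate into a short threshold check. This is a sketch; the function name classifyMatch is hypothetical, and the boundaries follow the inequalities above (exactly 0.7 and exactly 0.4 both count as Medium).

```php
<?php
// Map a score R_ik to its match class using the thresholds above.
function classifyMatch(float $score): string
{
    if ($score > 0.7) {
        return 'High';
    }
    if ($score >= 0.4) {
        return 'Medium';
    }
    return 'Low';
}
```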
7. Optimization Formulas (MySQL FULLTEXT)
Precomputation for speed:
Let ( W ) = set of all unique words in corpus
Let ( F_w ) = frequency of word ( w ) across all documents
MySQL FULLTEXT index:
[
\text{Index}(w) = \text{B-tree}\left(\text{IDF}(w), \text{docIDs containing } w\right)
]
Query optimization:
[
\text{SearchTime} = O(\log |W| + k \cdot \log m)
]
Where ( k ) = number of matching documents, ( m ) = total documents.
8. Complete Pipeline Formula
End-to-end mathematical pipeline:
[
\text{Recommendations}(f_i) = \text{Top10}\left(\text{Sort}\left(\left\{\text{Cosine}\left(\vec{V}_{f_i}, \vec{V}_{j_k}\right) \mid \forall j_k \in J_{\text{active}}\right\}\right)\right)
]
9. Real Example with Numbers
Example Data:
- Freelancer skills: ["PHP", "JavaScript", "MySQL"]
- Job description: "Need PHP developer with MySQL experience"
Step 1: Create vectors
Assume vocabulary: {PHP, JavaScript, MySQL, Python, Java}
Freelancer vector ( \vec{A} ): [0.8, 0.6, 0.7, 0, 0]
Job vector ( \vec{B} ): [0.9, 0, 0.8, 0, 0]
Step 2: Dot product
[
\vec{A} \cdot \vec{B} = (0.8 \times 0.9) + (0.6 \times 0) + (0.7 \times 0.8) = 0.72 + 0 + 0.56 = 1.28
]
Step 3: Magnitudes
[
|\vec{A}| = \sqrt{0.8^2 + 0.6^2 + 0.7^2} = \sqrt{0.64 + 0.36 + 0.49} = \sqrt{1.49} \approx 1.22
]
[
|\vec{B}| = \sqrt{0.9^2 + 0.8^2} = \sqrt{0.81 + 0.64} = \sqrt{1.45} \approx 1.20
]
Step 4: Cosine similarity
[
\text{Cosine} = \frac{1.28}{1.22 \times 1.20} = \frac{1.28}{1.464} \approx 0.87
]
Result: High match (0.87 > 0.7)
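If you want to check the arithmetic yourself, a few lines of PHP reproduce it. The vectors are the assumed example weights from Step 1, written as plain indexed arrays over the five-word vocabulary.

```php
<?php
// Example weights over the vocabulary {PHP, JavaScript, MySQL, Python, Java}.
$a = [0.8, 0.6, 0.7, 0.0, 0.0]; // freelancer vector A
$b = [0.9, 0.0, 0.8, 0.0, 0.0]; // job vector B

// Dot product: multiply component by component, then sum.
$dot = array_sum(array_map(fn($x, $y) => $x * $y, $a, $b));

// Magnitudes: square root of the sum of squared components.
$normA = sqrt(array_sum(array_map(fn($x) => $x * $x, $a)));
$normB = sqrt(array_sum(array_map(fn($x) => $x * $x, $b)));

$cosine = $dot / ($normA * $normB);

printf("dot=%.2f |A|=%.2f |B|=%.2f cosine=%.2f\n", $dot, $normA, $normB, $cosine);
// prints: dot=1.28 |A|=1.22 |B|=1.20 cosine=0.87
```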
10. Performance Guarantees
Time complexity:
[
T(n, m) = O(m \cdot n \cdot d)
]
Where:
- ( m ) = number of jobs (typically 100-1000)
- ( n ) = vocabulary size (typically 500-2000)
- ( d ) = dimensionality reduction factor (FULLTEXT optimization)
Your 1-3 second constraint:
[
T_{\text{total}} = T_{\text{vectorize}} + T_{\text{compute}} + T_{\text{sort}} \leq 3000\text{ms}
]
With optimizations:
[
T_{\text{total}} \approx 50\text{ms} + 100\text{ms} + 10\text{ms} = 160\text{ms} \ll 3000\text{ms}
]
11. Accuracy Formula
Your 78% match rate (strictly speaking, this ratio is precision: the share of recommendations that were relevant):
[
\text{Accuracy} = \frac{\text{True Positives}}{\text{Total Recommendations}} = 0.78
]
Relative improvement over the random baseline (42%):
[
\text{Gain} = \frac{0.78 - 0.42}{0.42} \times 100\% = 85.7\%
]
12. Fallback Mechanism (Mathematical)
If:
[
\left|\{j_k \mid R_{ik} > 0.4\}\right| < 3
]
Then use:
[
\text{FallbackScore}(j_k) = 0.5 \times \delta(C_f, C_j) + 0.3 \times B(f_i, j_k) + 0.2 \times L(f_i, j_k)
]
Where ( \delta ) = Kronecker delta (1 if categories match, 0 otherwise).
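A rough sketch of the trigger and the weighted sum follows. The function names are hypothetical, and because ( B ) and ( L ) are not defined in this breakdown, the sketch takes them as plain numbers supplied by the caller instead of guessing how they are computed.

```php
<?php
// True when fewer than $minMatches jobs score above $threshold.
function needsFallback(array $scores, float $threshold = 0.4, int $minMatches = 3): bool
{
    $strong = array_filter($scores, fn($r) => $r > $threshold);
    return count($strong) < $minMatches;
}

// Weighted fallback score. $b and $l stand in for B(f_i, j_k) and L(f_i, j_k);
// how they are computed is an open assumption, so they are passed in as-is.
function fallbackScore(string $freelancerCategory, string $jobCategory, float $b, float $l): float
{
    // Kronecker delta: 1 if the categories match, 0 otherwise.
    $delta = $freelancerCategory === $jobCategory ? 1.0 : 0.0;
    return 0.5 * $delta + 0.3 * $b + 0.2 * $l;
}
```

With scores [0.5, 0.41, 0.2], only two jobs exceed 0.4, so needsFallback returns true and the weighted score would be used instead.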
Summary: Your Entire System in One Formula
[
\boxed{\text{Recommendations}(f_i) = \text{Top}_{10} \left( \text{Sort} \left( \frac{\vec{V}_{S(f_i)} \cdot \vec{V}_{D(j_k)}}{|\vec{V}_{S(f_i)}| \cdot |\vec{V}_{D(j_k)}|} \right) \right) \quad \forall j_k \in J_{\text{active}}}
]
Where:
- ( \vec{V}_{S(f_i)} ) = TF-IDF vector of freelancer skills
- ( \vec{V}_{D(j_k)} ) = TF-IDF vector of job requirements
- Cosine similarity ∈ [0, 1], because TF-IDF weights are non-negative
- Classification: High (>0.7), Medium (0.4-0.7), Low (<0.4)
This is exactly what your PHP code does, expressed in pure mathematics!