CONTENT-BASED FILTERING ALGORITHM DETAILS FOR A FREELANCE SITE

Complete Mathematical Breakdown of Your PHP Job Recommendation Engine

Let me translate your entire PHP code into precise mathematical formulas.

1. Formal Mathematical Notation

System Inputs:

  • ( F ): Set of all freelancers, where ( f_i \in F ) is a specific freelancer
  • ( J ): Set of all active jobs, where ( j_k \in J ) is a specific job
  • ( S(f_i) ): List of skill tags for freelancer ( f_i )
  • ( D(j_k) ): Text document for job ( j_k ) (title + description + skills)

2. Core Algorithm Formulas

Function 1: getFreelancerSkills($freelancer_id)

[
S(f_i) = [s_1, s_2, \ldots, s_n]
]
Where ( s_m ) are skill tags like ["PHP", "JavaScript", "MySQL"]


Function 2: getActiveJobs()

[
J_{\text{active}} = \{ j_k \in J \mid \text{status}(j_k) = \text{active} \}
]
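As a minimal sketch, the set-builder definition of ( J_{\text{active}} ) is just a filter over job records. The PHP below assumes each job is an associative array with a hypothetical 'status' key holding the string 'active'; in practice getActiveJobs() would more likely push this condition into the SQL WHERE clause.

```php
<?php
// Minimal sketch of J_active = { j_k in J | status(j_k) = active }.
// Assumption: each job is an associative array with a hypothetical 'status' key.
function filterActiveJobs(array $jobs): array
{
    // array_values() re-indexes the filtered result as a plain 0..n-1 list.
    return array_values(array_filter(
        $jobs,
        fn(array $job): bool => ($job['status'] ?? null) === 'active'
    ));
}

// Hypothetical sample records for illustration.
$jobs = [
    ['title' => 'PHP developer', 'status' => 'active'],
    ['title' => 'Logo design', 'status' => 'closed'],
    ['title' => 'MySQL tuning', 'status' => 'active'],
];
$active = filterActiveJobs($jobs);
echo count($active), PHP_EOL; // prints 2
```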


Function 3: computeTFIDF($text)

Let ( T ) = tokenized text from input
Let ( N ) = total number of documents (jobs + profiles)

Step 1: Term Frequency (TF)
[
\text{TF}(t, d) = \frac{\text{count}(t \text{ in } d)}{\sum_{t' \in d} \text{count}(t' \text{ in } d)}
]

Step 2: Document Frequency (DF)
[
\text{DF}(t) = \left|\{ d \in \text{corpus} \mid t \in d \}\right|
]

Step 3: Inverse Document Frequency (IDF)
[
\text{IDF}(t) = \log\left(\frac{N}{1 + \text{DF}(t)}\right) + 1
]

Step 4: TF-IDF Vector
[
\text{TFIDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)
]
[
\vec{V}_d = [\text{TFIDF}(t_1, d), \text{TFIDF}(t_2, d), \ldots, \text{TFIDF}(t_n, d)]
]
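Steps 1 to 4 can be sketched in PHP as follows. This is a hypothetical reimplementation, not your computeTFIDF(), and it rests on assumed details: tokens are lowercased and split on non-alphanumeric characters, log() is the natural logarithm, vectors are sparse term => weight arrays, and the corpus (jobs plus profiles) is passed in explicitly so that ( N ) and ( \text{DF}(t) ) can be counted.

```php
<?php
// Hypothetical sketch of TF-IDF Steps 1-4 (not the original computeTFIDF()).
// Assumptions: lowercase tokens split on non-alphanumerics, natural log,
// sparse vectors stored as term => weight arrays.

function tokenize(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $tokens === false ? [] : $tokens;
}

// Step 1: TF(t, d) = count(t in d) / total number of tokens in d.
// An empty token list yields an empty array, so there is no division by zero.
function termFrequencies(array $tokens): array
{
    $total = count($tokens);
    $tf = [];
    foreach (array_count_values($tokens) as $term => $count) {
        $tf[$term] = $count / $total;
    }
    return $tf;
}

// Steps 2-3: DF(t) = number of documents containing t,
// IDF(t) = log(N / (1 + DF(t))) + 1, with N = count($corpus).
function inverseDocumentFrequencies(array $corpus): array
{
    $n = count($corpus);
    $df = [];
    foreach ($corpus as $doc) {
        foreach (array_unique(tokenize($doc)) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1; // each document counted once
        }
    }
    $idf = [];
    foreach ($df as $term => $docCount) {
        $idf[$term] = log($n / (1 + $docCount)) + 1;
    }
    return $idf;
}

// Step 4: V_d[t] = TF(t, d) * IDF(t). Terms absent from $idf are skipped,
// so $text should come from the corpus the IDF table was built on.
function tfidfVector(string $text, array $idf): array
{
    $vector = [];
    foreach (termFrequencies(tokenize($text)) as $term => $tf) {
        if (isset($idf[$term])) {
            $vector[$term] = $tf * $idf[$term];
        }
    }
    return $vector;
}

// Hypothetical corpus: one freelancer profile plus two job documents.
$corpus = [
    'PHP JavaScript MySQL',
    'Need PHP developer with MySQL experience',
    'Python data analyst',
];
$idf = inverseDocumentFrequencies($corpus);
$profileVector = tfidfVector($corpus[0], $idf);
print_r($profileVector);
```

With ( N = 3 ), "php" appears in two documents, so IDF("php") = log(3/3) + 1 = 1, while "javascript" (one document) gets log(3/2) + 1 ≈ 1.41; each profile term has TF = 1/3.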


Function 4: cosineSimilarity($vecA, $vecB)

Let ( \vec{A} = [a_1, a_2, \ldots, a_n] ) and ( \vec{B} = [b_1, b_2, \ldots, b_n] )

Dot Product:
[
\vec{A} \cdot \vec{B} = \sum_{i=1}^{n} a_i \times b_i
]

Magnitude:
[
|\vec{A}| = \sqrt{\sum_{i=1}^{n} a_i^2}
]
[
|\vec{B}| = \sqrt{\sum_{i=1}^{n} b_i^2}
]

Cosine Similarity:
[
\text{Cosine}(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{|\vec{A}| \times |\vec{B}|}
]
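A sparse-vector version of the cosine formula might look like the sketch below. The name cosineSparse and the term => weight layout are assumptions; plain PHP lists also work, because index positions act as keys. The guard returns 0 when either magnitude is zero (for example, a profile with no skills), a case the formula itself leaves undefined.

```php
<?php
// Hypothetical sketch of cosine similarity for sparse term => weight vectors.
function cosineSparse(array $vecA, array $vecB): float
{
    // Dot product: only terms present in both vectors contribute.
    $dot = 0.0;
    foreach ($vecA as $term => $a) {
        if (isset($vecB[$term])) {
            $dot += $a * $vecB[$term];
        }
    }

    // Magnitudes |A| and |B|.
    $normA = sqrt(array_sum(array_map(fn($x) => $x * $x, $vecA)));
    $normB = sqrt(array_sum(array_map(fn($x) => $x * $x, $vecB)));

    // The formula is undefined for a zero vector; treat that as "no similarity".
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / ($normA * $normB);
}

printf("%.4f\n", cosineSparse(['php' => 1.0, 'mysql' => 1.0], ['php' => 1.0])); // 0.7071
```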


3. Complete Recommendation Formula

For each job ( j_k ):

[
\text{Score}(f_i, j_k) = \text{Cosine}\left(\vec{V}_{S(f_i)}, \vec{V}_{D(j_k)}\right)
]
Where:

  • ( \vec{V}_{S(f_i)} = \text{TFIDF}(\text{concat}(s_1, s_2, \ldots, s_n)) )
  • ( \vec{V}_{D(j_k)} = \text{TFIDF}(\text{title} + \text{description} + \text{skills}) )

4. Complete Mathematical Implementation

Pseudocode to Mathematics:

Your PHP loop:

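// Assumed inputs: $profile = getFreelancerSkills($freelancer_id); $jobs = getActiveJobs();
// The profile vector does not depend on $job, so it is computed once, before the loop.
$profile_vector = computeTFIDF(implode(" ", $profile));

$scores = [];
foreach ($jobs as $job) {
    $job_vector = computeTFIDF($job['title'] . " " . $job['description'] . " " . $job['skills']);
    // Keep one score per job (R_ik) so the jobs can be ranked afterwards.
    $scores[] = ['job' => $job, 'similarity' => cosineSimilarity($profile_vector, $job_vector)];
}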

Mathematical equivalent:
[
\forall j_k \in J_{\text{active}}:
]
[
R_{ik} = \frac{\vec{V}_{f_i} \cdot \vec{V}_{j_k}}{|\vec{V}_{f_i}| \cdot |\vec{V}_{j_k}|}
]
Where ( R_{ik} ) is the recommendation score for freelancer ( i ) and job ( k ).


5. Ranking and Selection

Sorting operation:

[
\text{SortedJobs} = \text{argsort}_{k}\left(R_{ik}\right)_{\text{descending}}
]

Top-10 selection:

[
\text{Top10}(f_i) = \left\{ j_{(1)}, j_{(2)}, \ldots, j_{(10)} \right\}
]
Where ( j_{(m)} ) is the job with the ( m )-th highest ( R_{ik} ) score; if fewer than 10 jobs are active, all of them are returned.
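Sorting and Top-10 selection map onto two built-in PHP array functions: arsort() sorts by value in descending order while keeping keys, and array_slice() with preserve_keys set to true takes the first ten (or all of them, when fewer exist). The job keys and scores below are hypothetical placeholders.

```php
<?php
// Hypothetical sketch: rank jobs by R_ik (descending) and keep the top $limit.
// $scores maps a job key to its cosine score.
function topJobs(array $scores, int $limit = 10): array
{
    arsort($scores);                              // descending by score, keys kept
    return array_slice($scores, 0, $limit, true); // true: keep keys even if numeric
}

// Placeholder job keys and scores for illustration.
$scores = ['job-a' => 0.42, 'job-b' => 0.87, 'job-c' => 0.15, 'job-d' => 0.63];
print_r(topJobs($scores, 3)); // job-b, job-d, job-a
print_r(topJobs($scores));    // all four jobs, since fewer than 10 exist
```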


6. Match Classification Rules (Mathematical)

High Match:

[
\text{Class}(j_k) = \text{High} \iff R_{ik} > 0.7
]

Medium Match:

[
\text{Class}(j_k) = \text{Medium} \iff 0.4 \leq R_{ik} \leq 0.7
]

Low Match:

[
\text{Class}(j_k) = \text{Low} \iff R_{ik} < 0.4
]
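The three rules translate directly into a threshold function. A minimal sketch, with a hypothetical function name, that follows the boundaries exactly as written (0.7 itself is Medium, 0.4 itself is Medium):

```php
<?php
// Sketch of the match classes: High if R > 0.7, Medium if 0.4 <= R <= 0.7, Low if R < 0.4.
function classifyMatch(float $score): string
{
    if ($score > 0.7) {
        return 'High';
    }
    if ($score >= 0.4) {
        return 'Medium';
    }
    return 'Low';
}

echo classifyMatch(0.87), PHP_EOL; // High
```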


7. Optimization Formulas (MySQL FULLTEXT)

Precomputation for speed:

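Let ( W ) = set of all unique words in the corpus
Let ( \text{DF}(w) ) = number of documents containing word ( w ) (as in Step 2)

MySQL FULLTEXT index (conceptually an inverted index: each word maps to the documents that contain it):
[
\text{Index}(w) = \left(\text{DF}(w),\ \{ \text{docID}(d) \mid w \in d \}\right)
]
so ( \text{IDF}(w) ) can be derived from the stored document count.

Query optimization (approximate lookup cost):
[
\text{SearchTime} = O(\log |W| + q \cdot \log m)
]
Where ( q ) = number of matching documents and ( m ) = total number of documents.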
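To make ( \text{Index}(w) ) concrete, here is an in-memory inverted index in PHP. It only mimics the idea: in MySQL the same pre-filtering would come from a FULLTEXT index queried with MATCH ... AGAINST, and the storage details differ. Only the candidate documents returned by the lookup would then need a full cosine computation. The document IDs and texts are made up for illustration.

```php
<?php
// Conceptual sketch of Index(w): word => list of IDs of documents containing it.
function words(string $text): array
{
    $w = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $w === false ? [] : array_unique($w);
}

function buildInvertedIndex(array $docs): array
{
    $index = [];
    foreach ($docs as $docId => $text) {
        foreach (words($text) as $word) {
            $index[$word][] = $docId; // |$index[$word]| equals DF(word)
        }
    }
    return $index;
}

// Candidate documents: any document sharing at least one word with the query.
function candidateDocs(array $index, string $query): array
{
    $ids = [];
    foreach (words($query) as $word) {
        foreach ($index[$word] ?? [] as $docId) {
            $ids[$docId] = true; // set semantics: each document listed once
        }
    }
    return array_keys($ids);
}

$docs = [
    101 => 'Need PHP developer with MySQL experience',
    102 => 'Python data analyst',
    103 => 'Laravel PHP API',
];
$index = buildInvertedIndex($docs);
$candidates = candidateDocs($index, 'PHP JavaScript MySQL');
print_r($candidates); // 101, 103 (job 102 is never scored)
```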


8. Complete Pipeline Formula

End-to-end mathematical pipeline:

[
\text{Recommendations}(f_i) = \text{Top10}\left(\text{Sort}\left(\left\{\text{Cosine}\left(\vec{V}_{f_i}, \vec{V}_{j_k}\right) \mid \forall j_k \in J_{\text{active}}\right\}\right)\right)
]


9. Real Example with Numbers

Example Data:

  • Freelancer skills: ["PHP", "JavaScript", "MySQL"]
  • Job description: "Need PHP developer with MySQL experience"

Step 1: Create vectors
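Assume the vocabulary {PHP, JavaScript, MySQL, Python, Java} and illustrative TF-IDF weights (assumed for this example, not computed from the formulas above):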

Freelancer vector ( \vec{A} ): [0.8, 0.6, 0.7, 0, 0]
Job vector ( \vec{B} ): [0.9, 0, 0.8, 0, 0]

Step 2: Dot product
[
\vec{A} \cdot \vec{B} = (0.8 \times 0.9) + (0.6 \times 0) + (0.7 \times 0.8) = 0.72 + 0 + 0.56 = 1.28
]

Step 3: Magnitudes
[
|\vec{A}| = \sqrt{0.8^2 + 0.6^2 + 0.7^2} = \sqrt{0.64 + 0.36 + 0.49} = \sqrt{1.49} \approx 1.22
]
[
|\vec{B}| = \sqrt{0.9^2 + 0.8^2} = \sqrt{0.81 + 0.64} = \sqrt{1.45} \approx 1.20
]

Step 4: Cosine similarity
[
\text{Cosine} = \frac{1.28}{1.22 \times 1.20} = \frac{1.28}{1.464} \approx 0.87
]

Result: High match (0.87 > 0.7)
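The arithmetic above can be checked in a few lines of PHP using the same assumed weights. Computed without intermediate rounding, the cosine is about 0.8708, which still rounds to 0.87.

```php
<?php
// Reproduces the worked example with the assumed weights.
// Vocabulary order: PHP, JavaScript, MySQL, Python, Java.
$a = [0.8, 0.6, 0.7, 0.0, 0.0]; // freelancer vector A
$b = [0.9, 0.0, 0.8, 0.0, 0.0]; // job vector B

$dot = 0.0;
foreach ($a as $i => $ai) {
    $dot += $ai * $b[$i];
}
$normA = sqrt(array_sum(array_map(fn($x) => $x * $x, $a)));
$normB = sqrt(array_sum(array_map(fn($x) => $x * $x, $b)));
$cosine = $dot / ($normA * $normB);

printf("dot=%.2f |A|=%.2f |B|=%.2f cosine=%.2f\n", $dot, $normA, $normB, $cosine);
// dot=1.28 |A|=1.22 |B|=1.20 cosine=0.87
```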


10. Performance Estimates

Time complexity:

[
T(n, m) = O(m \cdot n \cdot d)
]
Where:

  • ( m ) = number of jobs (typically 100-1000)
  • ( n ) = vocabulary size (typically 500-2000)
  • ( d ) = dimensionality reduction factor from FULLTEXT pre-filtering, i.e. the fraction of the vocabulary still scored (( 0 < d \leq 1 ))

Your 1-3 second constraint:

[
T_{\text{total}} = T_{\text{vectorize}} + T_{\text{compute}} + T_{\text{sort}} \leq 3000\text{ms}
]

With optimizations:
[
T_{\text{total}} \approx 50\text{ms} + 100\text{ms} + 10\text{ms} = 160\text{ms} \ll 3000\text{ms}
]
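To check the budget on real data, each stage can be timed separately. A minimal sketch using PHP's monotonic hrtime() (available since PHP 7.3); the helper name timeMs is hypothetical, and the stage bodies are placeholders, not your actual vectorize, compute, and sort steps.

```php
<?php
// Hypothetical sketch: time one stage in milliseconds with hrtime().
function timeMs(callable $stage): array
{
    $start = hrtime(true);                      // monotonic clock, nanoseconds
    $result = $stage();
    $elapsedMs = (hrtime(true) - $start) / 1e6; // ns -> ms
    return [$result, $elapsedMs];
}

// Placeholder stages standing in for vectorize / compute / sort.
[$tokens, $tVectorize] = timeMs(fn() => str_word_count(strtolower('Need PHP developer with MySQL experience'), 1));
[$scores, $tCompute] = timeMs(fn() => array_map(fn($i) => $i / 10, range(1, 9)));
[$sorted, $tSort] = timeMs(function () use ($scores) { rsort($scores); return $scores; });

$total = $tVectorize + $tCompute + $tSort;
printf("T_total = %.3f ms (budget 3000 ms): %s\n", $total, $total <= 3000.0 ? 'OK' : 'over budget');
```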


11. Accuracy (Precision) Formula

Your 78% match rate measures precision, the share of recommended jobs that are relevant:

[
\text{Precision} = \frac{\text{True Positives}}{\text{Total Recommendations}} = 0.78
]

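Relative improvement over a random baseline (42% precision):

[
\text{Gain} = \frac{0.78 - 0.42}{0.42} \times 100\% \approx 85.7\%
]
In absolute terms the improvement is 0.78 - 0.42 = 0.36, i.e. 36 percentage points.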
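The 78% figure is a ratio of true positives to recommendations, i.e. precision, and the gain is relative to the baseline. A small sketch of both calculations; the function names are hypothetical, and the counts 78 and 100 are illustrative values chosen only to reproduce the quoted 0.78, not real data.

```php
<?php
// Sketch: precision of a recommendation list and relative gain over a baseline.
function precision(int $truePositives, int $totalRecommendations): float
{
    return $totalRecommendations > 0 ? $truePositives / $totalRecommendations : 0.0;
}

// Relative gain in percent: (model - baseline) / baseline * 100.
function relativeGain(float $model, float $baseline): float
{
    return ($model - $baseline) / $baseline * 100.0;
}

$p = precision(78, 100);        // illustrative counts giving 0.78
$gain = relativeGain($p, 0.42); // 0.42 = random baseline quoted above
printf("precision=%.2f gain=%.1f%%\n", $p, $gain); // precision=0.78 gain=85.7%
```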


12. Fallback Mechanism (Mathematical)

If:
[
\left|\{ j_k \mid R_{ik} > 0.4 \}\right| < 3
]

Then use:
[
\text{FallbackScore}(j_k) = 0.5 \times \delta(C_f, C_j) + 0.3 \times B(f_i, j_k) + 0.2 \times L(f_i, j_k)
]
Where ( \delta ) = Kronecker delta (1 if the freelancer's category ( C_f ) equals the job's category ( C_j ), 0 otherwise).
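The trigger condition and the weighted fallback score can be sketched as two small functions. The names are hypothetical, the category strings are placeholders, and ( B ) and ( L ) are treated as precomputed inputs (assumed to lie in [0, 1]) because their exact definitions are not spelled out here.

```php
<?php
// Fallback is triggered when fewer than 3 jobs score above 0.4.
function needsFallback(array $scores): bool
{
    return count(array_filter($scores, fn(float $r): bool => $r > 0.4)) < 3;
}

// FallbackScore = 0.5 * delta(C_f, C_j) + 0.3 * B + 0.2 * L.
function fallbackScore(string $freelancerCategory, string $jobCategory, float $b, float $l): float
{
    $delta = $freelancerCategory === $jobCategory ? 1.0 : 0.0; // Kronecker delta
    return 0.5 * $delta + 0.3 * $b + 0.2 * $l;
}

var_dump(needsFallback([0.9, 0.5, 0.1]));             // bool(true): only 2 jobs above 0.4
printf("%.2f\n", fallbackScore('web', 'web', 0.5, 1.0)); // 0.85
```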


Summary: Your Entire System in One Formula

[
\boxed{\text{Recommendations}(f_i) = \text{Top10} \left( \text{Sort} \left( \frac{\vec{V}_{S(f_i)} \cdot \vec{V}_{D(j_k)}}{|\vec{V}_{S(f_i)}| \cdot |\vec{V}_{D(j_k)}|} \right) \right) \quad \forall j_k \in J_{\text{active}}}
]

Where:

  • ( \vec{V}_{S(f_i)} ) = TF-IDF vector of freelancer skills
  • ( \vec{V}_{D(j_k)} ) = TF-IDF vector of job requirements
  • Cosine similarity ∈ [0, 1], since TF-IDF weights are never negative
  • Classification: High (>0.7), Medium (0.4-0.7), Low (<0.4)

This is exactly what your PHP code does, expressed in pure mathematics!

Macro Nepal Helper