
Sage: Enabling Aging Workers to Excel in the Modern Job Market

A machine learning & natural language processing browser plug-in that enables aging workers to take control of their futures.

In this project, I built a BERT-based NLP model that detects age-biased terms targeting older job seekers in job postings. The primary objective of flagging ageist terms is to help older job seekers understand that the problem lies with the job postings rather than with their age. This awareness can help older job seekers navigate the job market more confidently and advocate for fair treatment based on their qualifications and skills rather than arbitrary age-related biases.


As the job market becomes increasingly digital, job websites have become a critical tool for job seekers of all ages. However, many of these websites are designed with a narrow focus on younger job seekers, ignoring the needs and preferences of older users. This lack of age inclusivity can create significant barriers for older job seekers, who may encounter ageist language and assumptions in job postings. Employers often use language that implies a preference for younger candidates or even excludes older workers entirely. This practice is not only unfair but also illegal, yet it persists on job sites across the internet.

In this project, we explored two prototypes aimed at promoting age inclusivity on job websites: (1) improving website design to better meet the needs of older users, and (2) mitigating the impact of ageist language in job descriptions. The second prototype is the focus of my work in this project.



Knowing our space

My team included two designers, a PM, a cybersecurity lead, and a data scientist (me). We worked alongside Annalee Saxenian, a professor at the Berkeley I School, who helped us with product problem scoping and contributed subject matter expertise. Additionally, we conducted 26 interviews and 6 usability tests in the recruiting space to develop our solution.


How it’s done

Our model works across different job search platforms to flag ageist language relating to personality, lifestyle, preferences, skills and, most importantly, age itself.

  • Shows the age-friendliness status of the job posting
  • Calls out ageist language and the intensity of ageism
  • Works concurrently with the age-friendly design features of the plug-in

Defining Ageism in the Workforce

  • Interview & Literature Review

    To build a model that detects ageist language, we first needed to clearly define ageism in the workforce. We conducted an extensive literature review and 26 qualitative interviews to summarize the types of ageist language that may occur in job descriptions.

    Type of discrimination    Example usage
    Personality               • You are a team player with an absolute winning mentality
                              • Looking for an energetic, ambitious self starter who’s not afraid to question the status quo
    Physical capability       • Must be willing to sit continuously for 8+ hours a day
    Lifestyle interests       • You’re a digital native with a keen interest in pop culture
    Skill sets                • Must be tech savvy and a digital native
                              • Deliver results under tight deadlines with limited oversight
    Age                       • Excellent degree from a top university with strong A- 260 levels (ideal 300+ UCAS points)

  • Data sources & Annotations

    We sampled approximately 200 job ads from the Employment Scam Aegean Dataset (EMSCAD) and a set of 10,000 US data science job postings. We annotated over 800 lines of text from the job descriptions and requirements in these datasets to create our training dataset, with an inter-annotator agreement of 0.896. Of the annotated data, 60% was used as the training set, 20% as the validation set, and the remaining 20% to test model performance.
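
    The 60/20/20 split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `split_dataset`, the seed, and the toy `(sentence, label)` pairs are all assumptions, and the write-up does not say whether the real split was stratified by label.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle a copy of the examples and cut it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 20%)
    return train, val, test


# Hypothetical toy data: 800 (sentence, label) pairs, 1 = ageist, 0 = not ageist.
examples = [(f"sentence {i}", int(i % 4 == 0)) for i in range(800)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

    With imbalanced labels, a stratified split (keeping the label ratio equal in each part) is a common alternative to the plain shuffle shown here.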


Digging into Model Details

  • Model structure

    Using the human-annotated labels, I trained a classifier that consists of a BERT-based representation network, an attention layer, one hidden layer, and a softmax layer. Each job description is first split into sentences using SpaCy, and special characters are removed. Each full sentence is then passed to BERT: the pre-trained “bert-base-uncased” model produces contextual embeddings for every sentence in the job description. These features are then passed to a logistic regression classifier to perform binary classification.
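
    To make the classifier head more concrete, here is a dependency-free sketch of an attention layer, one hidden layer, and a softmax output applied to token embeddings. It is an illustration under assumptions, not the trained model: the write-up does not specify the attention form, so dot-product attention against a single query vector is assumed, and all weights and the 3-dimensional toy "embeddings" are made up (real bert-base-uncased vectors have 768 dimensions). With two classes, a softmax output is equivalent to a logistic (sigmoid) output on the difference of the two logits.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(token_vectors, query):
    """Weight each token vector by softmax(token . query) and sum them."""
    weights = softmax([dot(vec, query) for vec in token_vectors])
    dim = len(token_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, token_vectors))
            for i in range(dim)]


def dense(vector, weight_rows, bias):
    """Affine layer: one output per weight row."""
    return [dot(row, vector) + b for row, b in zip(weight_rows, bias)]


def classify(token_vectors, params):
    """Attention pooling -> tanh hidden layer -> softmax over two classes."""
    pooled = attention_pool(token_vectors, params["query"])
    hidden = [math.tanh(h)
              for h in dense(pooled, params["w_hidden"], params["b_hidden"])]
    logits = dense(hidden, params["w_out"], params["b_out"])
    return softmax(logits)  # [P(not ageist), P(ageist)]


# Hypothetical toy inputs standing in for BERT token embeddings of one sentence.
tokens = [[0.2, -0.1, 0.4], [0.9, 0.3, -0.5]]
params = {
    "query": [1.0, 0.0, 0.0],
    "w_hidden": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]],
    "b_hidden": [0.0, 0.1],
    "w_out": [[1.0, -1.0], [-1.0, 1.0]],
    "b_out": [0.0, 0.0],
}
probs = classify(tokens, params)
print(probs)
```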

    A decision threshold converts the model's logistic output into a binary label. The threshold was chosen based on the precision-recall curve: I wanted high precision without sacrificing too much recall, and so settled on a decision threshold of 0.9995.
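
    One way to pick such a threshold is to sweep candidate values and keep the lowest one that still meets a precision target, since recall can only fall as the threshold rises. The sketch below illustrates that idea only: the function names, the 0.9 precision target, and the toy scores and labels are assumptions, not the project's validation data or selection procedure.

```python
def precision_recall_at(probs, labels, threshold):
    """Precision and recall for the positive (ageist) class at one threshold."""
    predicted = [p >= threshold for p in probs]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(probs, labels, min_precision):
    """Return (threshold, precision, recall) for the lowest threshold that
    meets the precision target, i.e. the qualifying one with highest recall."""
    for threshold in sorted(set(probs)):
        precision, recall = precision_recall_at(probs, labels, threshold)
        if precision >= min_precision:
            return threshold, precision, recall
    return None  # no candidate threshold reaches the target


# Hypothetical validation scores (model probability of "ageist") and true labels.
probs = [0.10, 0.40, 0.70, 0.90, 0.95, 0.9995, 0.9999]
labels = [0, 0, 1, 0, 1, 1, 1]
print(pick_threshold(probs, labels, min_precision=0.9))  # (0.95, 1.0, 0.75)
```

    In the browser plug-in setting, favouring precision means fewer sentences are flagged, but the ones that are flagged are more likely to be genuinely ageist.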

  • Model performance

    Because the labels are highly imbalanced, I chose precision, recall, and F1-score to evaluate the model. On the test set, the classifier achieved an F1-score of 0.8 for overall classification, 0.86 for non-ageist sentences, and 0.69 for ageist sentences.
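
    For reference, per-class precision, recall, and F1 can be computed as in the sketch below. The function name `f1_report` and the toy labels are assumptions for illustration; they do not reproduce the reported scores, and the write-up does not state how the overall figure was averaged across classes.

```python
def f1_report(y_true, y_pred, classes=(0, 1)):
    """Per-class precision, recall, F1, and support (number of true examples)."""
    report = {}
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cls] = {"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn}
    return report


# Hypothetical imbalanced toy labels: 0 = non-ageist, 1 = ageist.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
report = f1_report(y_true, y_pred)
for cls, scores in report.items():
    print(cls, scores)
```

    Reporting each class separately shows how the model performs on the rarer ageist class, which a single accuracy number would tend to hide.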



  • Vague definitions of age discrimination

    When my colleagues and I were annotating the data, we spent a lot of time discussing whether personality traits that are stereotypically associated with younger people (e.g., passionate, highly motivated, creative, and innovative) could be considered discriminatory. Using these terms doesn’t imply that older people can’t possess such traits, but we found that their excessive use may signal age discrimination. Even so, whether a given phrase counts as ageist still depends on individual perspective.

  • Limitations on data sources

    Currently, the model is trained on a static dataset annotated by two students. Therefore, any potential differences in the perception of discrimination by real-world users, as compared to what the model learned from the student researcher annotations, would not be captured at present. To address this and correct for any bias introduced by third-party annotators, the next step involves implementing a user feedback loop. This dynamic approach to training the model will be based on real user feedback, reducing dependence on bulk annotation by external annotators and minimizing bias.

  • Using other types of model

    The current model classifies individual sentences, which limits its ability to capture the broader context of entire paragraphs or job descriptions. Models such as SBERT could potentially improve its ability to detect implicit ageism in job advertisements. However, restricted computational resources made it difficult to implement more complex models. With more computational resources, it would become possible to use larger datasets and more demanding training setups, such as fine-tuning with a larger batch size. Additionally, multi-class classification models could replace the binary classifier to represent different levels of ageism and, to some extent, address the ambiguity mentioned above.


