The importance of proper data infrastructure to avoid hiring biases
A dive into the various requirements companies must fulfill when setting out to use data.
It is through the use of our perception that we gain access to the world. By way of our sense organs, we access different dimensions of the world, and as a result we are each left with our own cohesive experience. AI, on the other hand, must see the world through the lens of data.
Because AI has no sensory organs, we supply it with information by feeding it inputs, much as the world feeds us the various sights, sounds, and tastes it has to offer. We can therefore expect accurate work from AI only on the condition that its inputs correspond to the actual world.
This means that data must be properly parsed, be representative of what it describes, and be updated continuously so that the machine learning models in use can be improved and fine-tuned.
Let’s now dive into the various requirements companies must fulfill when setting out to use data, and demonstrate how the data at nugget.ai is solid in both its foundation and its use.
Ontology is the study of being, and it is a way in which we can remove our own personal biases from our perceptions. How is this the case? We consider the ontology of any particular object (e.g., the ontology of being the oldest sibling in a family) and we generate only the rules that capture the essence of what it would mean to be, e.g., the oldest sibling (more on this in the Logical Reasoning section). These rules describe the qualities that matter to get right, and they give us the proper tools for making decisions.
People tend to attach additional qualities to pieces of information; it is a way of “filling in the blanks,” and it seems quite natural to do so. In some instances, new qualities necessarily follow from a piece of information because of its structure (e.g., if you know John has an older brother, then John cannot be the oldest child). However, we often find ourselves adding qualities that are unwarranted and that reveal our own biases, shortcomings, or quirks. For example, someone who infers that an unmarried woman in her mid-forties must be a feminist is showing that they take feminism to mean something other than what it actually means. Gender, age, and marital status have no bearing on what it means to be a feminist; only a person’s political values bear on the matter.
This alone is a fascinating subject, and it is exactly how stereotypes are formed. It is quite easy to generate a bigoted or prejudiced sentiment merely by saying: “person x is a member of G, therefore person x has quality F,” where x is any arbitrary individual, G is any group being stereotyped, and F is a quality with which you wish to stereotype group G. This habit is deeply ingrained in our way of thinking, and while it has its everyday uses, in the workplace it often causes problems: we place people into groups and claim they have a quality well before we know the person in question. We at nugget.ai actively fight these sentiments, as illustrated by the following use case:
An applicant applies to a firm for an entry-level job. They have no work experience in an office, but they do have a master’s degree in hand. An HR worker knows these facts and deems the applicant unfit for the job, reasoning that a lack of work experience means they would not be able to deal with the organizational structure of a company. Simply put, the applicant would just be a disruption.
This line of reasoning backs up experience requirements on job postings, because screening is a necessary activity for keeping hiring at an efficient pace. However, nugget can circumvent this line of reasoning while keeping screening efficient, simply by testing applicants on their competencies in organizational structure, the umbrella skill being system fluency (click here to learn more about how we measure soft skills like system fluency).
Return now to the applicant. Instead of being rejected on a loose line of reasoning that may or may not hold true for them, they are now able to prove themselves by taking the challenge. The HR worker, now armed with better information about this applicant, can make the right choice and hire based on the applicant’s competencies, or in other words, the ontological merit of the person in question.
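The contrast in this use case can be put into a minimal Python sketch. Everything named here is a hypothetical stand-in: the `Applicant` fields, the 0 to 100 score scale, and both thresholds are assumptions made for illustration, not nugget's actual scoring model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Applicant:
    name: str
    years_office_experience: float
    # Hypothetical 0-100 result of a system fluency challenge; None if not taken.
    system_fluency_score: Optional[float] = None


def screen_by_experience(applicant: Applicant, min_years: float = 1.0) -> bool:
    """Proxy rule: group membership ("has office experience") stands in for the skill."""
    return applicant.years_office_experience >= min_years


def screen_by_competency(applicant: Applicant, min_score: float = 70.0) -> bool:
    """Direct rule: measure the skill the proxy was supposed to indicate."""
    if applicant.system_fluency_score is None:
        raise ValueError("applicant has not taken the challenge yet")
    return applicant.system_fluency_score >= min_score


# Made-up applicant matching the use case: no office experience, strong challenge result.
graduate = Applicant("Recent graduate", years_office_experience=0.0,
                     system_fluency_score=82.0)

print(screen_by_experience(graduate))  # False: rejected by the proxy rule
print(screen_by_competency(graduate))  # True: passes when the skill itself is tested
```

Both functions take the same time to run at screening, which is the point of the use case: the second one decides on the measured competency instead of on an assumption about a group.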
Through this example, we have demonstrated the various gifts of ontological analysis:
Finally, we are able to talk about taxonomy’s position in all this. A taxonomy provides a workable database of important variables, organized to be relevant to an ontological analysis. In nugget’s case, we have used the O*NET database extensively to determine which soft skills to test for, generate lists for occupations, determine proper skill alignment, and provide overall solutions to ontological problems. O*NET is a database that houses an extensive number of occupations complete with required skills, tasks, descriptions, and so on, giving nugget a wealth of data to work with. No taxonomy is perfect or complete, and O*NET is no exception, but human review of it leads to proper results because changes can be made along the way (e.g., in the accountant role, mathematics is a required skill, yet the definition includes not just arithmetic, algebra, and statistics, but also geometry, which is markedly not required for accounting).
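One way to picture this human review step is a small Python sketch in which a taxonomy entry stays untouched and the reviewer's corrections are stored alongside it. The `OccupationProfile` class and its fields are hypothetical and do not mirror O*NET's real schema or any nugget code; the skill components are only the four named in the accountant example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class OccupationProfile:
    """One occupation from a taxonomy, plus human-reviewed adjustments."""
    title: str
    # skill name -> components listed in the taxonomy's definition of that skill
    skills: Dict[str, Set[str]]
    # skill name -> components a reviewer marked as not required for this role
    exclusions: Dict[str, Set[str]] = field(default_factory=dict)

    def exclude(self, skill: str, component: str) -> None:
        """Record that one component of a skill is not needed for this occupation."""
        if component not in self.skills[skill]:
            raise ValueError(f"{component!r} is not part of {skill!r}")
        self.exclusions.setdefault(skill, set()).add(component)

    def effective_components(self, skill: str) -> Set[str]:
        """Components that remain after applying the reviewer's exclusions."""
        return self.skills[skill] - self.exclusions.get(skill, set())


accountant = OccupationProfile(
    title="Accountant",
    skills={"mathematics": {"arithmetic", "algebra", "statistics", "geometry"}},
)
accountant.exclude("mathematics", "geometry")

print(sorted(accountant.effective_components("mathematics")))
# ['algebra', 'arithmetic', 'statistics']
```

Keeping the exclusions separate from the taxonomy data means the original definition is never overwritten, so a reviewer's change can be audited or undone later.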
To provide justification, I will demonstrate how valid logical reasoning shows us we are in fact on the right track. Observe: (For those untrained in formal logic, look here, here, and here for basic rules/syntax, quantification syntax, and modal-specific rules, respectively)
I: Sound Ontological Argument
We utilize the fact that it is necessarily the case that if John is the oldest sibling, not one sibling will be older than John. However, we know John has an older sibling, and so John is not the oldest sibling. This argument works because its first premise truly captures the essence of being the oldest sibling.
We can see a deceptively similar argument fail because it does not capture the essence of the concept, despite relying on a truthful premise: consider being the youngest sibling. Here’s a fact about being the youngest sibling: if John is the youngest sibling, he would have to have at least one older sibling. John does have an older sibling, so he must be the youngest. This reasoning runs the conditional backwards (it affirms the consequent) and fails to consider that John may be a middle child, that is, he has at least one older sibling and at least one younger sibling. We would have to say instead: if John is the youngest sibling, then he would have to have no younger sibling. Of course, none of our premises can tell us whether John is in fact the youngest sibling, so we can only conclude that he is not the oldest one. We can see that picking out certain true facts is not sufficient to determine ontological merit; rather, we must do an ontological analysis and properly pick out the rules that leave no room for error.
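As a rough check of the two sibling arguments, the Python sketch below enumerates a handful of possible family arrangements and searches for one in which all premises are true and the conclusion is false. This is a brute-force validity test over a small finite model, not a proof in modal logic, and the definitions of `is_oldest` and `is_youngest` are assumptions chosen to match the prose above.

```python
from itertools import product

# A "world" is one possible family arrangement for John:
# (number of older siblings, number of younger siblings), each from 0 to 2.
WORLDS = list(product(range(3), range(3)))


def has_older(world):
    return world[0] > 0


def has_younger(world):
    return world[1] > 0


def is_oldest(world):
    # Oldest sibling: has siblings, and none of them is older.
    return not has_older(world) and has_younger(world)


def is_youngest(world):
    # Youngest sibling: has siblings, and none of them is younger.
    return has_older(world) and not has_younger(world)


def implies(p, q):
    return (not p) or q


def counterexamples(premises, conclusion):
    """Worlds where every premise is true but the conclusion is false."""
    return [w for w in WORLDS
            if all(p(w) for p in premises) and not conclusion(w)]


# First argument: oldest -> no older sibling; John has an older sibling;
# therefore John is not the oldest.
first = counterexamples(
    [lambda w: implies(is_oldest(w), not has_older(w)), has_older],
    lambda w: not is_oldest(w),
)

# Deceptive variant: youngest -> has an older sibling; John has an older
# sibling; therefore John is the youngest.
deceptive = counterexamples(
    [lambda w: implies(is_youngest(w), has_older(w)), has_older],
    is_youngest,
)

print(first)         # []: no arrangement breaks the first argument
print(deceptive[0])  # (1, 1): one older and one younger sibling, a middle child
```

The empty list for the first argument means no arrangement in this model makes its premises true and its conclusion false, while the deceptive variant is refuted by the middle-child case described above.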
II: Unsound Ontological Argument
The first premise assumes that all unmarried women in their forties are feminists. Such a premise is completely unfounded from an ontological point of view, and so it fails our test outright. Additionally, since there are unmarried women in their forties who aren’t feminists, the sentence cannot be true, much less match the essence in question. Rather, we must match qualities that track the essence; this is what leads to the correct basis for a premise (e.g., for feminism it would be a collection of beliefs about gender equality and so on).
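The failure of the universal premise can be sketched in Python: a claim of the form “every member of G has quality F” is refuted by a single member of G who lacks F. The three records below are made-up placeholders, not real data, and reducing feminism to one boolean field is a deliberate simplification for the sake of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    label: str
    gender: str
    age: int
    married: bool
    # Simplified stand-in for the beliefs that actually define the concept.
    supports_gender_equality: bool


def in_stereotyped_group(p: Person) -> bool:
    """Group G from the premise: unmarried women in their forties."""
    return p.gender == "woman" and not p.married and 40 <= p.age <= 49


def is_feminist(p: Person) -> bool:
    """Quality F, defined by the essence (beliefs), not by demographics."""
    return p.supports_gender_equality


def refuting_cases(people, group, quality):
    """Members of the group who lack the quality; any hit refutes 'all G are F'."""
    return [p.label for p in people if group(p) and not quality(p)]


# Made-up records for illustration only.
PEOPLE = [
    Person("P1", "woman", 45, False, True),
    Person("P2", "woman", 44, False, False),
    Person("P3", "man", 30, True, True),
]

print(refuting_cases(PEOPLE, in_stereotyped_group, is_feminist))  # ['P2']
print(refuting_cases(PEOPLE, is_feminist, in_stereotyped_group))  # ['P3']
```

The first result shows that group membership does not guarantee the quality, and the second shows that the quality also turns up outside the group, so the demographic facts track the essence in neither direction.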
So, we can form a table that demonstrates these ideas succinctly:
To cap off this post, we will list the qualities a good data set should have, all of which help keep AI effective, ethical, and smooth.
To learn more about Data Privacy, click here!