System Design and Self-Selection Bias

Not all populations are created equal. Blindly designing a system without thinking about the pressures that shape the data you collect (or the people who will participate) can easily result in harm to society.

As an example, in public online polls respondents are more likely to have strong opinions. Potential respondents who don’t have firm positions are less likely to see value in providing answers, and less likely to put effort into doing so. Drawing conclusions from an online poll anyone can respond to will incorrectly lead to the finding that people are deeply polarized on issues. Scientific polls have safeguards against this sort of bias.

Self-selection bias in system design isn’t always obvious, so I want to discuss a more nuanced case.

There’s a trend in the US of developing new financial instruments. Markets in the US are loosely regulated, so it is easy for entrepreneurs to develop new types of financial contracts as a means of making money. One such case is instruments mimicking reverse mortgages. For example, Point lets homeowners sell a percentage of their home in return for cash.

Homes and Liquidity

In economics, a liquid asset is something people can exchange without it losing value. In practice, this means anyone can easily determine the item’s value and exchange it quickly. Cash is a liquid asset: its value is literally printed on the bill or coin, and nearly anyone will accept it in exchange for goods immediately. An illiquid asset is one for which this is difficult. For example, selling a home can take weeks, and it takes a professional hours to determine a fair value.

Point’s selling point is the option for homeowners to trade a portion of an illiquid asset – ownership of their home – for a liquid one – cash. To its credit, Point is honest that it may offer less for the portion of the home it buys. It may offer only $90,000 for 20% of a home appraised at $500,000 – a share nominally worth $100,000. In these agreements the homeowner’s net worth immediately decreases significantly. (Just think of what would happen if the owner sold to Point, then immediately tried to buy back that 20% in the case above.)
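To make the arithmetic concrete, here is a quick sketch of that buyback thought experiment (the figures are taken from the example above and are purely illustrative, not Point’s actual pricing):

```python
home_value = 500_000    # appraised value of the home
share_sold = 0.20       # fraction of ownership sold
cash_received = 90_000  # cash offered for that share

# What the 20% share is worth at the appraised value.
share_value = home_value * share_sold  # 100,000

# Immediate change in net worth: cash gained minus equity given up.
# Buying the share back at appraised value would cost $10,000 more
# than the cash just received.
net_worth_change = cash_received - share_value  # -10,000
```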

Self-Selection Pressures

In designing any system, we have to consider the pressures which will influence the statistical properties of the decisions it makes.

Aside: Homo Economicus and Homo Psychologicus

Homo economicus is a simplification of humans for economic modeling. It presents a human-like agent that acts rationally in its own self interest according to available information.

Homo psychologicus is a perturbation of homo economicus that takes into account the psychological factors all humans fall prey to. These models are more complex since they require more variables, but they should be used in situations where humans are less likely to act in their rational self-interest.

Homeowner Selection

In this case, we must first ask: What sort of homeowner is likely to accept an offer from Point?

This is where self-selection bias comes into play. In the long term, it’s obvious that statistically the better option is to hold onto full ownership. (If it were not, Point would not have a business model.) So if we assume an owner has a base level of financial savviness, they will hold on to full ownership if they have the financial means to do so. Thus, we can expect many participating homeowners to feel some pressure to get cash quickly. We can then assume they are less likely to be financially stable – more likely to have a credit payment or a bill they need to pay off quickly. They are not reducible to homo economicus but must be modeled as homo psychologicus. They will be more prone to mistakes in reasoning and more impacted by biases.

Offer Selection

Next: What sorts of offers is Point likely to make?

This is a second source of selection pressure – it impacts the statistics of the offers Point is likely to make. Success for Point’s valuation algorithm is measured by the money Point makes, so an optimally functioning algorithm will offer the lowest amount the owner will agree to. As a business, Point is likely functioning close to homo economicus, so we can assume it will make this rational decision.

Aside: Weapon of Math Destruction

The logic behind this offer cannot be appealed, and there is no way for the owner to know how the value was calculated – it is hidden behind an unquestionable proprietary algorithm. This algorithm is unlikely to be “fair” to the homeowner – its purpose is to make money for Point. Simply by measuring success this way and not being auditable, the algorithm will be predatory (even if no human involved had that intention). This unaccountability makes it a Weapon of Math Destruction.

Agreement Selection

Combining these two selection pressures lets us answer: What sort of agreements are likely to be made and accepted?

Valuations

When dealing with an owner who is not acting rationally, Point can make offers below what a rational homeowner would accept.

Point is most likely to enter into agreements with owners who have undervalued their own home. If Point’s offer is sufficiently below what a homeowner believes their home is worth, the homeowner will always reject it; the greater the gap between the owner’s valuation and Point’s true estimate, the more incentive Point has to make a deal. An owner without a reasonable understanding of their home’s value is more likely to think a low offer is a good one. Additionally, if the owner’s rationality is compromised, they are more likely to enter into a deal that is not in their best interests. Between two homo economicus, we would expect deals only when both parties rationally perceive they will profit. Since many owners will not be acting purely rationally due to other pressures in their lives, we can’t make this assumption. Some owners will enter deals that hurt them.

The Sunk Cost Effect

By the time owners get to this stage, they have spent several hundred dollars getting their home appraised, plus hours of their time. Point, at most, has wasted some of its employees’ time – and it absorbs some of that cost by offering other homeowners lower prices. The owner, meanwhile, now has an appraisal they wouldn’t otherwise have. For many, the time, money, and effort already expended will make them more likely to accept an offer below what they are comfortable with – this is the sunk cost effect. And since many potential participants already felt pressure to get money quickly, the added expense makes their situation more dire and they are even less likely to act rationally.

Conclusion

Should you use Point? That depends on your measure of success. If success means a good return on investment, it depends on whether you can use the increased liquidity to earn more than what you (effectively) paid Point, and whether the deal beats similar financial arrangements (e.g. a reverse mortgage). If you are against systems with the potential to harm society, you have to decide whether you trust that Point has accounted for the damage it could do.

Point has the capacity to harm society if it isn’t careful. We can’t measure its impact on society since its valuation algorithms are hidden and Point is unlikely to share its data with researchers. Even if it cost them nothing to find out, it is likely they would choose not to know [1] whether their system caused harm. They would also be unlikely to share that knowledge if they had it.

By default, the homeowners who self-select to enter agreements with Point will not be in a good financial situation, and so will generally lose net worth in the agreement. We can expect Point’s algorithm will move net worth from people with lower wealth to people with higher wealth – exacerbating the current wealth inequality problems our society faces. While it is possible for individuals to recoup the loss they incurred by purchasing the fast liquidity, this is not the default. If a homeowner has need of liquidity urgently, they aren’t likely to be using it in a way that will gain value (like investments) – they are more likely to need it for a large high-interest debt or unexpected bill.

It is important to consider these factors in any system being designed. It is possible Point has mitigated the issues I’ve described, but that would require an active drive to ensure they aren’t preying on people in tough financial spots. This isn’t something done passively; it is a professional responsibility they would have to choose to take on. In the absence of evidence or any insight into their transactions and offer methodology, we simply can’t know.


  [1] *Economics for the Common Good*, by Jean Tirole, 2017, pp. 131–132

Overnormalization

Why is it hard for computers to understand language?

This question plagues many a developer of NLP (Natural Language Processing) systems. While there are certain aspects of language we don’t yet know how to process, we often oversimplify language to make it easier for computers at the expense of preserving meaning. This isn’t a problem of processing power, but a conceptual limitation of the humans designing these machines. There are many common mistakes that plague systems trying to interpret English, and they cause the sorts of problems that make users think computers will never really be able to interact on a human level.

An Aside: Normalization

Before processing text, most NLP systems normalize the text in some way to make it easier for the computer to understand. This may include steps like correcting obvious spelling mistakes, lowercasing all letters, and removing superfluous spacing. The idea is that none of these modifications really changes the meaning of the text, and there’s no need to develop a machine that can (and thus, has to) learn that when a sentence has extra spaces in the middle that it rarely means anything interesting.
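A minimal normalization pass along these lines might look like the following sketch (the exact steps and their order vary from system to system; spelling correction is omitted here):

```python
import re

def normalize(text: str) -> str:
    """Aggressively normalize text: lowercase everything and
    collapse superfluous whitespace."""
    text = text.lower()
    text = re.sub(r"\s+", " ", text)  # collapse runs of spaces/newlines
    return text.strip()

print(normalize("I  saw   IT."))  # "i saw it."
```

Note that normalize("I saw it.") and normalize("I saw IT.") now produce identical output – exactly the kind of collapse examined below.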

Overnormalization: Ignoring Capitalization

Making all text lowercase before processing seems sensible. For example, there’s obviously no significant difference between For (as it appears at the start of a sentence) and for in the middle. By default, a machine would treat For and for as completely different words and not know they are closely related. Lowercasing all text eliminates this class of mistake. It also nearly halves the number of words the machine has to learn – a massive help, since most words rarely appear capitalized. If the machine first sees the capitalized form of a word in the wild (rather than in training), it will immediately connect it to the word it already understands; there might simply not be enough training data for a computer to learn from context alone that Lanthanide and lanthanide are the same word. However, this simplification has unintended side effects.

Consider these three sentences and their intended meaning.

I saw it. = I saw [something referenced in another sentence].

I saw It. = I saw [the film It].

I saw IT. = I saw [the Information Technology department].

Any system that blindly lowercases everything will treat these sentences identically and appear comically inept. They have three completely different meanings and as humans we can see the distinction immediately. What’s worse is when low-level machine learning models are trained on them, and they are used to feed more sophisticated models.

A technique we use to teach computers relationships between words is word embedding: a type of model that can be thought of as a map containing every word, where words closer to each other have “more similar” meanings than those far apart. The model “learns” from a large set of sample sentences we feed it – usually hundreds of thousands. In this case, by lowercasing everything we told the machine that it, It, and IT all mean the same thing – they have the “same location”. This not only corrupts the computer’s understanding of those three words, but of anything even tangentially related to them. Words related to IT, like administrator and support, will incorrectly be considered similar to words near it, such as that and this. If that faulty word embedding is then used to train an even more complex model, the problems compound. Considering there are literally hundreds of examples where capitalization matters in English, there will be many bits of language the computer has trouble understanding.

Solution: Variable Granularity

We appear to have competing requirements:

  • We want our machine to ignore differences in capitalization when they don’t matter.
  • We want our machine to pay attention to differences in capitalization when they do matter.

I suggest adding a third requirement, one that suggests a solution.

  • We don’t want to have to tell our machine each case where capitalization matters.

We could literally enumerate all instances where people capitalized words in a non-standard way, but that isn’t practical and the system won’t automatically figure out new instances. If we could automatically detect when capitalization mattered, then the first two requirements become non-issues.

A word embedding needs about one hundred example usages to “learn” what a word means and use it as an anchor point for understanding similar words. Educated may appear in many hundreds of sentences while erudite appears in only a few, but erudite will appear in contexts similar enough to educated that the machine can figure out the words are very similar. We can leverage this limitation by declaring that if a word appears fewer than 100 times, we accept that the machine will sometimes make mistakes with it.

We can turn this threshold into a rule that determines whether to create a separate word-embedding entry for a given capitalization:

  • If the exact capitalization occurs 100 or more times, make an entry for it.
  • If the exact capitalization occurs fewer than 100 times, use the entry for the most common capitalization (or create one if it does not exist).

Word   Appearances   Entry
rest   5             rest
REST   3             rest
Rest   1             rest
reST   1             rest

Even though most of these do have significantly different meanings, there wouldn’t be enough information for the computer to figure out the difference. Now suppose we collect significantly more data.

Word   Appearances   Entry
rest   500           rest
REST   200           REST
Rest   10            rest
reST   5             rest


There is now ample data for the machine to see that rest and REST are used very differently. Both should have their own entries in the word embedding. Until Rest and reST have enough training examples, they will be grouped under a default – probably rest, as it is the most common. While this correctly treats Rest as identical to rest, it still incorrectly groups reST with them.

This method may initially seem to fail for highly common words:

Word   Appearances   Entry
the    10,000        the
The    1,000         The

In this case, the logic will unnecessarily create entries for both the and The (I would argue there are meaningful distinctions, but that discussion would be its own post). This behavior only affects incredibly frequent words, and because those words are so frequent, the machine will have enough information to learn that they are very similar. Processing time will be several percent slower since this increases vocabulary size by several thousand words, but when we’re dealing with hundreds of thousands of unique words this isn’t a major issue.

The main limitation of this algorithm is that if there is little data for a given capitalization, the machine will automatically assign it the meaning of the most common capitalization. However, this is obviously better than the behavior of most systems now which unconditionally assign all capitalizations the same meaning.
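As a sketch, the thresholding rule might be implemented like this (the function and names are my own invention for illustration; a real system would fold this into the embedding’s vocabulary-building step):

```python
from collections import Counter, defaultdict

def build_vocab(tokens, threshold=100):
    """Map each exact capitalization to a word-embedding entry.

    Forms seen at least `threshold` times get their own entry; rarer
    forms fall back to the most common capitalization of the same
    lowercased word.
    """
    counts = Counter(tokens)

    # Group capitalization variants of the same underlying word.
    variants = defaultdict(list)
    for form, n in counts.items():
        variants[form.lower()].append((form, n))

    vocab = {}
    for forms in variants.values():
        # The most common capitalization serves as the default entry.
        default = max(forms, key=lambda pair: pair[1])[0]
        for form, n in forms:
            vocab[form] = form if n >= threshold else default
    return vocab
```

With the larger data set above, build_vocab gives rest and REST their own entries, while Rest and reST fall back to rest.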

Conclusion

Understanding language is hard. Taking shortcuts can still produce cool results, but introduces additional limitations for anything that depends on them. Approximations make things easier – remember the spherical cow from physics – but they always produce imperfect models. In the case of capitalization and language processing, getting rid of this approximation is relatively straightforward, and we can immediately realize benefits while paying only a small computational cost.

In either case, as a layperson or as someone consuming a product that promises “natural language understanding”, be aware that these approximations (and their associated problems) exist, and consider the harm that neglecting them could cause.

Icebreaker

This is a speech I prepared as my Icebreaker speech for Toastmasters. Toastmasters is an organization that promotes public speaking skills, and the famous “Icebreaker” is the first speech a new member gives to introduce themselves to their club-mates. This is mine (with edits suggested from the feedback I received from other members and friends).


Hi, I’m Will Beason and I’m going to change how you think.

While preparing this speech I put a lot of thought into how it differs from introducing myself to someone or writing an “about me” article. I don’t normally introduce myself in speeches, so it was important for me to think about how the medium is different than the modes I’m comfortable communicating in. I can’t simply take what I would say in a conversation or bio and put that into a speech: I’d run into the same problems people have when adapting books to films – they are different media with different strengths and different weaknesses. I want to give you an impression of the care and thought I put into creating things, and why I decided to focus on public speaking now. Along the way, I want to help you understand what drives me.

When using a speech to introduce myself I can’t tailor my responses to an individual. I have to generalize and can’t personalize it as I can in a conversation. So, how do I highlight core components of myself in a speech rather than simply turning my side of an introductory conversation into a speech? In my case, metatext. Metatext is writing that references how writing is put together. In this case, I’ve written a speech that references its own structure and themes as an effort to highlight how carefully I construct things and think about how they go together.

One natural structure I could have followed is listing facts about myself – I attended West Point; I have worked for Google; my professional specializations are data analysis and natural language processing for building artificial intelligence. But these are just rote facts that could overshadow my point. Being able to regurgitate a list of statements about me isn’t really “knowing” me; to know me is to understand the common themes that I’ve used to structure my life and make decisions.

That suggests covering why I decided to focus on public speaking now. What can my answer to that question tell an audience about me? I believe I can have a positive impact on the world by sharing my ideas.

We are responsible for the effects of systems we’re a part of. What’s real – the truth – matters and is something we can discover. We have to be aware of the effects of using data to influence people’s lives. We’re going to have to figure out how to live, to learn, to build with the seven billion other people on the planet.

Being able to speak clearly and impactfully about these ideas gives me another avenue to communicate with others. Writing and one-on-one conversation can only take me so far; I have to be able to reach wider groups in order to be effective. I recognized this years ago, but until recently I lacked confidence in myself and my ideas.

I attribute much of this change to my move to San Francisco. Here, for the first time, I connected with the local gay community. I found people with similar life experiences and interests. I felt a connection to a place; that there was a network of people willing to support and guide me as I grow into my own.

I’m taking the time to learn about the influence I can have on the people around me. I want to become a better mentor. How do I introduce people to concepts they wouldn’t otherwise come across, and share the joy of figuring out how to change things for the better on a grand scale? I want people to not only understand and appreciate the thought I put into what I create, but inspire them to do the same in their own work. These aren’t easy tasks, but they’re deeply important to me.

For now it’s the struggle to find what’s next. I’ve finally learned that it’s the people I work with, so much more than the project I’m working on, who will help me grow and find actualization. I’ve worked on five different machine learning projects, each with its own immense potential benefit for the world, but without the right team for me. They’re good projects with smart people working on them, but I’ve realized that just being able to solve any problem isn’t the only requirement for greatness. It also takes a professional drive to make sure the right problem is being solved. A sense of responsibility to make sure we aren’t creating a system that will actively harm society – a path most often entered blindly.

It isn’t easy finding people of that caliber. I can’t force people to really accept that these things are important and make it a part of themselves. The best I can do is be an example that inspires people to follow my path. To quote one of my favorite books, The Clean Coder, “You can’t convince people to be craftsmen. Arguments are ineffective. Data is inconsequential. Case studies mean nothing. [It] is not so much a rational decision as an emotional one. This is a very human thing.”

It’s that human element that I’ve had the biggest struggle with. That we all struggle with. If you have the craftsman mindset, as Robert C. Martin calls it, then you know how it has changed how you see the world. If you don’t, all I can do is speak of a world beyond what you can see and show you what it looks like to live in it. One that glimmers for you in moments of introspection. But you alone can decide for yourself whether you will take the next step.

Thank you.

How to Make Your Resume Searchable

I previously covered some of the systemic problems that exist in recruiting today. In that post, I mentioned that one of the first steps in the recruiting process is a candidate search engine that analyzes many millions of candidate resumes and professional profiles to find ones matching a given set of criteria.

Whether you like it or not, any resume you send to a recruiter gets packaged with millions of others and sold. The same goes for any site that lets you build a professional profile. So it’s already likely you’re in many databases that recruiters pay to have access to and search, and someone is making money on what you’ve written. Here’s how you can make the best of it.

A Note on Text Search

(Figure: the three main audiences of your resume.)

Two of the (many, many) rules text retrieval systems usually follow when ranking documents are:

  1. How rare are the terms being searched?
  2. How often do the terms appear in the documents relative to their size?

The first means that terms appearing in fewer documents are weighted more heavily, so if your resume has rare terms and someone searches for those same rare terms, your resume will be boosted higher. If a searcher types “database” (more common) and “MongoDB” (less common), a resume only mentioning “MongoDB” will rank higher than one only mentioning “database”.

The second means that longer documents are penalized if they don’t use the term often. This is usually calculated with word count – so a 100-word resume mentioning “MongoDB” once will score higher than a 1,000-word one, but the same as a 1,000-word resume that mentions “MongoDB” ten times.

The combination of these two means you want to be as concise as possible in smaller text fields of your profile, like “degree” and “job title”. The more words you put in them, the lower your profile will be ranked.
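These two rules can be sketched as a toy TF-IDF-style scorer (real retrieval systems use more sophisticated formulas such as BM25, but the intuition is the same; the documents here are just lists of words):

```python
import math

def idf(term, docs):
    # Rule 1: terms appearing in fewer documents weigh more.
    doc_freq = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / doc_freq) if doc_freq else 0.0

def score(query_terms, doc, docs):
    # Rule 2: term frequency is taken relative to document length,
    # so padding a profile with extra words dilutes each mention.
    return sum(doc.count(t) / len(doc) * idf(t, docs) for t in query_terms)
```

Under this scorer, a 100-word resume mentioning “mongodb” once ties a 1,000-word resume mentioning it ten times, and both beat a 1,000-word resume mentioning it once.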

Recall that you are writing your resume or professional profile for three completely different readers:
1. search engines
2. sourcers
3. hiring managers

In this post I’m just focusing on (1). It may be a good follow-up article for me to muse on how to handle writing for groups (2) and (3).

Check Your Spelling

It should be obvious that computers aren’t great at guessing what you mean when you misspell things like names and titles. Until someone tells the search engine otherwise, “software” and “softward” are completely unrelated, even though a human understands immediately. If you don’t spell things correctly on your resume, it will not be searchable. I’ve encountered terrifyingly large numbers of misspellings in degrees – literally over 100 different ways of misspelling “bachelor”. I haven’t decided whether the ten common misspellings of “doctorate” are more worrying.

Degree

Use the full name of the degree – no abbreviations. Think “Bachelor of Science in Computer Science”, not “BSCS”. There are many uncommon abbreviations that recruiters simply will not know and won’t bother to look up. My favorite is “EDM”, which is a “Master of Education”, not “Electronic Dance Music”. Further, recruiters aren’t going to include abbreviations they don’t know in their profile searches. And do be sure to include the level of education; a resume simply listing “CS” as the earned degree could mean many different things – you won’t get the benefit of the doubt.

Avoid including explanatory text like “Earned a BA in Communications” or “BA in Communications and a Minor in Interdisciplinary Studies”. The “earned a” is superfluous, and if the minor really is important then it should be listed as a separate degree. For real though, only the hiring manager is likely to care about your minor, and if at all only a very slight amount.

If you are interested in working in an English-speaking country, don’t go for the fancier-sounding “baccalaureate” over “bachelor”. You are even more likely to misspell it. I’ve seen cases where applicants even went as far as including the accent, but in the wrong place. If the search engine isn’t using a text normalization technique (e.g. one that removes accents), you’re out of luck. Similarly, English-native recruiters rarely think to type “baccalaureate”, so if you go that route you are unlikely to appear in results anyway.

School Name

As a best practice, use the name of the university on the institution’s LinkedIn profile. Not “Cambridge”, but “Cambridge University”. In this case “university” distinguishes you from “Cambridge College” graduates. Not “MIT” but “Massachusetts Institute of Technology”. And, dear god, never use something like “U of M”; there’s no way to figure out what school you are actually claiming to have attended.

Avoid including the name of the specific college within your university that you attended. If you went to Berkeley and got a business degree, that is the same as saying you went to the Haas School of Business. Candidates are very inconsistent with how they include school names, with everything from “Haas Berkeley” to “University of California, at Berkeley, the Haas School of Business”. It’s just difficult for the search engine to know that your profile should be ranked the same as the person who typed “University of California at Berkeley”.

Don’t combine multiple schools you went to in one line or entry. If you write “Harvard, Stanford, UCLA” as the name of the school you went to, then you run the danger of not being found when someone searches for any of those schools. At best, your score will be one-third that of other candidates.

Company Name

Much of the advice from the School Name section applies here.

Use your employer’s name as listed on the organization’s LinkedIn profile. If you worked for a specific well-known division or product of your company (e.g. “Walmart Labs” or “YouTube”), use it instead. Otherwise, mention the division or product in your job description.

Again, avoid explanatory text. Mentioning “internship” is accurate, but will just cause you to be ranked lower. By all means include it in your job title, but the company name field is not the place. Likewise for “Contractor” – the best place for that is the job description.

Job Title

You can put whatever job title you want on your resume. Your resume isn’t something for former employers to check, it is how you are presenting yourself to future employers. Obviously don’t lie or misrepresent yourself, but feel free to choose a common synonymous title over the specific one you may have been assigned. I’ve seen too many cases like “Integrated Data Network Engineer Level IV” who will never be found among the deluge of “Network Engineer”s.

If you are a contractor, the most common practice (of many, many different practices) is to append “(Contractor)” to the job title. This is up-front and honest, but I feel it unfairly penalizes contractors in searches. Beginning your job description with “Contract work for …” is fine, and makes it more likely you’ll be seen.

Search O*NET OnLine to get ideas for common job titles in your field. If you’re willing to do the full legwork, look for the occupation whose description most closely matches your job functions in the Standard Occupational Classification, and either use one of the example titles directly or take it back to O*NET as inspiration for a search.

Don’t use meaningless job titles. Many people list things like “Specialist”, “Summer Intern”, or “Assistant” as their title and there’s no way to know what they mean. If you are a specialist, say what you specialized in. If you were an intern, be complete and say you were a “Software Developer Intern”.

Conclusion

Much of this isn’t obvious advice. We’re dealing with imperfect, general-purpose search systems that aren’t optimized for searching resumes. If you were writing only for humans proficient in your job functions, your resume would look very different. But this is the system we have for now, and you’re part of it whether you want to be or not.

Do remember that you can always send recruiters and hiring managers an updated resume when they contact you.

What’s Wrong with Recruiting?

Have you ever received an unsolicited message from a recruiter about a position you’re not interested in? Do you ever get passed up for positions you are qualified for before you’ve even interviewed? There are reasons for that, and they kinda suck.

I’ve worked in the HR automation field for about a year now, and this is what I’ve seen:

  1. Sourcers often aren’t familiar with the jargon of the positions they’re hiring for.
  2. Search engines aren’t good at ranking candidate profiles.
  3. Candidates don’t know how to write their profiles to make them easily searchable.

An entity relationship diagram of candidate sourcing, with the three relationships that cause the most pain in red.

Sourcers

Sourcers are the people who look for candidates who are both qualified for open positions and are interested in filling them. The people who do sourcing often have the job title “recruiter”.

The process of finding and hiring a new employee can roughly be broken down into the below steps. (Depending on the company, steps may be condensed or done by the same person.)

  1. A manager tells a hiring manager that they need someone with some set of qualifications/skills.
  2. The hiring manager writes a job requisition.
  3. A sourcer reads the job requisition and looks for candidates who meet the qualifications.
  4. The sourcer contacts the candidates, verifies their interest, and refers them to the hiring manager.
  5. The hiring manager / team / etc. interviews the candidate.
  6. The candidate is hired.

The issue here is with step 3. Sourcers generally have very little training, learning most of their trade on the job. They also hire for many, many different positions. The same sourcer may look for accountants, software developers, and mid-level managers. There’s so much jargon and so many different skills to juggle that sourcers rarely get much more than a superficial understanding of what they’re looking for. They don’t have the time to absorb the overwhelming amount of information in every profession. (Hiring managers, luckily, tend to specialize and pick up on the nuances of what they’re looking for.)

Because they juggle so many roles, a recruiter sourcing their first “Database Architect” may discard a candidate with a decade of “NoSQL” and “Data Modeling” experience simply because they aren’t familiar with the field. This causes problems for candidates whose jargon is too specific – they may be passed up because their resume or professional profile isn’t comprehensible to a layperson.

Candidate Search Engines

Sourcers usually go after passive candidates. As opposed to active candidates, who are currently looking for a new position and applying to openings, passive candidates wait for opportunities to come their way. The pool of potential passive candidates numbers in the many millions – it’s literally the entire workforce. There’s no way a human can look through this set for every opening.

Fortunately, sourcers have tools that make it easy to filter down the candidate pool and sort candidates by different criteria. LinkedIn Recruiter, for example, gives recruiters a search engine for everyone on LinkedIn (it’s one way LinkedIn makes money on your professional profile). Like many other such tools, it gives sourcers the option to look for candidates with specific job titles, skills listed, and degrees. On the surface, this sounds great.

In the real world, people are messy. For the most part, these are text searches of what candidates have typed. Did you type “Master of Business Administration” while the sourcer searched for “MBA”? Tough luck. Did you say “MySQL” when the sourcer searched “SQL”? No dice. Are you a CPA but the recruiter typed “Accountant”? Nope. Are you a “Software Engineer” but the sourcer typed “Software Developer”? Unless you type what a sourcer thinks to type in their search engine, your profile won’t appear.

A common response sourcers have to this is to construct terrifyingly elaborate boolean queries containing hundreds of variations on titles and skills. Sourcers sometimes share parts of their queries, and some sourcers don’t even know how parts of their own queries work. If a part of the query breaks, it may take hours or days to find and fix the problem.
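
As a sketch of why these queries balloon (the variant list below is hypothetical, not a real sourcer’s query), consider how many title variations a single role needs:

```python
# Sketch: OR-ing job-title variants into one boolean search string.
# The variant list is hypothetical; real sourcer queries can contain
# hundreds of terms, which is why they're so hard to debug.
variants = [
    "Software Engineer",
    "Software Developer",
    "SWE",
    "Programmer",
    "Backend Engineer",
]

def build_title_query(titles):
    """Quote each title and join with OR, the shape most candidate search tools expect."""
    return " OR ".join(f'"{t}"' for t in titles)

query = build_title_query(variants)
print(query)
```

Drop even one common variant and every candidate who used that wording silently disappears from the results.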

Candidate Profiles

The above problems are unintuitive. There’s no way for a candidate to know that sourcers are (1) passing them over for using jargon more specific than the sourcer knows or (2) just not seeing them because they don’t match their search queries.

Candidates are startlingly diverse in how they list their qualifications on resumes. Even seemingly limited fields like “degree” may have tens of thousands of variations. Once you get to a more varied field like “educational institution”, you can end up with dozens of variations for the name of a single university! This isn’t just misspellings: many people include the college within the university, their major, acronyms, and explanatory text. Job titles are an order of magnitude worse (I measured it).

The candidates who receive the most unsolicited messages about job openings are simply those who have typed what the average sourcer thinks to look for. I should do a follow-up article on advice for specific fields in a resume or job profile, but the gist is to keep in mind that your resume has to be general enough that someone unfamiliar with your role could do a search and find you, but specific enough (e.g. in job experience descriptions) that it piques the hiring manager’s interest.

AIDP: Choose the Right Level of Abstraction – Part 1

Continuing my AI Design Principles series. While this post doesn’t specifically reference AI design, it opens a discussion that will continue in Part 2.

There’s a common challenge in system design that applies especially strongly to AI design – making sure that your system solves the problem at the correct level of abstraction. In general, this can be posed as the question “does the system let users communicate their problem in the same way they think about it?”

Imagine the interface for driving a car:

  • the steering wheel
  • gas pedal
  • brake


This is exactly the correct level of abstraction – each component has distinct uses that are mostly orthogonal and the controls correspond to how we think of driving, e.g.:

  • “turn left” -> rotate steering wheel counter-clockwise
  • “go faster” -> press down on gas pedal
  • “make a sharp right turn” -> use brake to slow down to appropriate speed and turn steering wheel clockwise

Level of Abstraction too Low

This is the mistake I see most often in system design – making controls too granular.

Imagine if instead of one steering wheel you had four, one for each wheel. This would be madness and unnecessary for most people. While you would (technically) have the ability to steer more precisely than with just one steering wheel, most people would not intuitively know how to use the controls to steer the car. For example, you can’t just rotate all of the wheels the same way – that would slide the car sideways without actually changing the direction it’s pointing.

Designers make this mistake when they design too much for super-users. While super-users exist (and may be very loud), in most cases they aren’t the bulk of your users. Adding the ability to tune everything often comes at the expense of new users – it adds a steep learning curve and usually mandates setup before the tool can be used at all.

I recognize this one when I think “I know it’s possible to communicate what I want to the system, but I have no idea how.”

Level of Abstraction too High

This is the opposite case – where individual controls try to do so many things that common activities aren’t possible. I see this less often, but it’s no less a hindrance.

Now imagine if your steering wheel controlled both direction and acceleration – with the acceleration lower the more you turn the wheel. The designers were probably thinking “When I’m not turning I want to go faster, and when I’m turning I want to go slower, so let’s tie those things together!” They neglected the obvious (to the user) case of stoplights and traffic.

This mistake is more often made when designers don’t do the legwork of requirements development. I’ve had the experience more than once of sitting in a room where people are happy to hallucinate the problems of users rather than actually talking to them. If you only understand a subset of the problems users have, your solution will be incomplete.

As a user, this one is characterized by thoughts of “It isn’t possible for me to communicate what I want using the controls given to me.”

If Users Think at the Wrong Level of Abstraction

One danger arises if users have only worked with solutions at the wrong level of abstraction. They may have been trained to think at that level, and in requirements development your job is to divine that.

Suppose you are designing an app that lets people make artsy customized tables and all of the competing apps require users to create and upload .svg files of the shape and size of the table top. When you go to users, they talk about streamlining the .svg upload controls to make things like scaling easier. Another common complaint is that it is too hard to find exactly the coloration pattern they want – the selection other apps offer isn’t big enough. You ask several to get a sense of the sorts of tables they make, finding that most go with rectangular table tops but vary wildly on the coloration.

It strikes you that it should be easier to tell the system “I want a rectangular table top with these dimensions” since it is such a common use case. The bit of intuition here comes from looking for patterns in how users use the tool. When a user thinks “I want a rectangular table top” they aren’t thinking “I want a table shaped like this .svg file” even though that may be what they literally say. The users are thinking at too low a level.

On the other hand, you notice that the table coloration patterns users want varies so wildly that each requested pattern would essentially just be used by one person. As many of the users are artists themselves, they show you pictures they’ve drawn of their dream table coloration but have no way of telling the apps (or, for that matter, finding a close one in the thousands of different patterns available). In this case the users are thinking at too high a level of abstraction – there really needs to be a way for them to just upload their own designs as the coloration pattern.

Choosing the Right Level of Abstraction

This requires actually talking to the people whose problem you’re solving – requirements development. The point – too often overlooked – is to figure out (1) what problems your users have and (2) how your users think about those problems. They’re not the system designer – you are – so of course you won’t usually be able to literally use their suggestions. That doesn’t make what they say any less valuable.

When I go through this part of the design process, I ask myself these questions about the system I’m designing. Does the system:
  1. allow users to solve most of the problems they have?
  2. communicate possibilities in the same language users use to think about their problems?
  3. make it easy to solve problems that are easy to think about?
  4. allow for too many nonsensical inputs?

In the car example where the level of abstraction was too low, it is easy to think “I want to turn left” but difficult to communicate that to the system through the four steering wheels. This violates (2) and (3). It also lets the user do many things that don’t make sense – like turning the left wheels to the right and the right wheels to the left, violating (4).

These aren’t easy problems to solve. Choosing a good level of abstraction requires a mix of talking to people, thinking about how they perceive their problems, and being aware of the larger context influencing how people talk about what they need.

AI Design Principles: Choosing the Right Problem – Part 2

Part 2: Begin with a Decision that Many People Make Often, and Make Quickly

You’ll shoot yourself in the foot if you try to solve the sort of decision only a 10th-level wizard specializing in conjuration makes on the first Tuesday of every prime-numbered year. Decisions made rarely, or by few people, tend to be very difficult to automate or to have little generalizable utility.

  1. What is the ideal flux density for each individual magnet in a particle collider with maximum experimental resolution around the 125 GeV range?
  2. Is Dragon’s Egg or Mission of Gravity a better first novel to study in a Hard Science Fiction class?

Sure, it might be fun to build an AI that could actually solve these problems, but for now it’s much more efficient to leave rare problems to humans. Remember – we spend most of our time mired in decisions everyone makes.

Begin with a decision problem where

  • it takes less than a minute to make the decision
  • lots of people make this sort of decision, and
  • people who make this decision tend to do it often.

This is really a litmus test for deciding whether a problem meets the requirements mentioned in Part 1. A problem that passes this test will satisfy many of the requirements.

Choose a decision problem where it takes less than a minute for a person to make this decision.

If it takes a full minute to make a decision, you can only manually check about 500 samples per day, and beyond that point you lose the ability to reasonably verify the correctness of your results by hand (assuming your requirements call for over 90% accuracy). If you don’t already have a lot of labeled data, such a small return on time spent labeling usually isn’t worth it. I’d also ask: if it takes more than a minute to make the decision, is it really not reducible to a set of smaller decisions?
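
The ~500-samples figure follows from simple workday arithmetic (assuming an 8-hour day of nonstop labeling):

```python
# One decision per minute over an 8-hour day of uninterrupted labeling.
minutes_per_day = 8 * 60                 # 480 working minutes
seconds_per_decision = 60
samples_per_day = minutes_per_day * 60 // seconds_per_decision
print(samples_per_day)  # 480 – roughly the 500-per-day ceiling
```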

Say you’re looking to make an AI to help decide what expensive watch to buy. Things that might go through your head when making the decision might include:

  • Is it within my budget?
  • Is it the color I want?
  • Is it available in my size?

These are simple problems that an AI assistant could use to automatically discard watches that aren’t worth considering, leaving you to focus on:

  • Is it comfortable?
  • Do I like the style?

Further, the easily automatable pieces of the problem are generalizable. They aren’t just applicable to watches, but to shoes, shirts, and a variety of other clothing items and accessories.
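
A minimal sketch of that first filtering layer (field names and data here are hypothetical):

```python
# Sketch of the "discard obviously unsuitable items" layer of the
# watch-buying assistant. Fields and values are made up for illustration.
def passes_hard_filters(watch, budget, color, wrist_size):
    """Keep only watches worth a human's attention."""
    return (
        watch["price"] <= budget
        and watch["color"] == color
        and wrist_size in watch["sizes"]
    )

watches = [
    {"name": "A", "price": 900, "color": "silver", "sizes": [38, 42]},
    {"name": "B", "price": 4500, "color": "silver", "sizes": [42]},
    {"name": "C", "price": 700, "color": "gold", "sizes": [38]},
]

shortlist = [
    w for w in watches
    if passes_hard_filters(w, budget=1000, color="silver", wrist_size=38)
]
print([w["name"] for w in shortlist])  # only "A" survives all three checks
```

The human is left to judge comfort and style on the survivors, which is exactly the split the bullets above describe.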

Choose a decision problem where lots of people make this sort of decision.

If many people can make the decision, you can easily check your work by collaborating with them. You’ll know you’ve found one if there is an entire profession with people constantly making this decision.

The decision-makers are your source of requirements (hint: you’re making this algorithm to automate away the more tedious of the decisions they make; you’re building it for them). They’re a great source for labeled data. If labeled data isn’t easy to come by, you can usually

  • outsource data labeling to them,
  • consult them on tricky cases, and
  • literally use their daily work as training data.

Ideally, as the designer of the algorithm, YOU should know how to make this sort of decision. Learning how to make the decision, or at least building some intuition for it, saves you a lot of time when verifying system performance.

Choose a decision problem where people who make this decision tend to do it often.

This gets at the amount of data that may exist or be easily generated. If it’s a once-per-year decision, you aren’t likely to have much historical data to base your model on.

If your system is designed for users’ everyday lives, they will see its improvements constantly. Highlighting common potential errors in documents as they’re typed is a prime example: with the system catching easy mistakes, people are free to focus on what they actually want to say rather than on whether it’s really spelled “concured” or “concurred”.

Given the choice, at no cost, between doing away with an almost-mindless near-daily decision and one I’d have to spend hours contemplating but might make only once in my life, I’d do away with the former. What’s great is that the frequent decision usually ends up being simpler to automate as well – the cost is lower and the reward greater.

AI Design Principles: Choosing the Right Problem – Part 1

Part 1: Begin with a Simple, Easy Decision Problem

DecisionProblem

If there’s a big mistake people make when designing machine learning systems, it is deciding to tackle the wrong problem. Pick the wrong problem, and you can easily spend months bashing your head against one that would require a research team years to solve properly. Most companies don’t have that magnitude of resources to devote to solving an individual problem, and it isn’t cost effective anyway.

How do you decide what problem to solve? Begin with a simple, easy decision problem.

What is a “simple, easy decision problem”?

A simple, easy decision problem is one that

  • involves automating making a decision
  • has a well-defined set of mutually-exclusive possible decisions
  • has explainable decisions that reasonable people would agree on
  • is so simple that it cannot be decomposed into smaller problems worth solving
  • has a fast, easy solution

Why?

Choose a decision problem to automate. Decision problems automate thought processes that humans do. If you really just want to understand your data, then at this stage you really want data exploration and analysis (possibly using machine learning). In general you wouldn’t automate a process like k-means clustering – you’d do it with a specific purpose in mind. On the other hand, you would automate a process like “Should I switch the light in the north-south direction at this intersection to red?”

Choose a decision problem with a well-defined set of possible decisions. If the set of possible decisions the machine might make is unbounded, or it isn’t clear what a decision means, you’ll have problems. If there is an infinite number of possible decisions (or simply so many that a human couldn’t reasonably consider all of them), the simplest algorithm that can produce all possible answers is very complex, and you lose the ability to train on each possible response since gathering data on every one may be impractical.

On the other hand, deciding “Which one of these ten genres does this book fall in based on its title and text?” is well-defined. As a person making the decision, you list out the possible genres and pick the one the book best falls in. The process for the machine is analogous – it may calculate scores for each genre and pick the top one.
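
A minimal sketch of that score-and-pick-top process, with hypothetical keyword lists standing in for a trained model:

```python
# Minimal sketch of "score each genre, pick the top one". The keyword
# sets and the hit-count scoring rule are stand-ins for a real model.
GENRE_KEYWORDS = {
    "science fiction": {"starship", "alien", "orbit"},
    "fantasy": {"dragon", "wizard", "sword"},
    "mystery": {"detective", "murder", "clue"},
}

def classify(text):
    """Count keyword hits per genre and return the best-scoring genre."""
    words = set(text.lower().split())
    scores = {genre: len(words & kws) for genre, kws in GENRE_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(classify("The detective found a clue at the murder scene"))  # "mystery"
```

The decision space is the fixed set of genre names, which is exactly what makes the problem well-defined.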

Choose a decision problem with a set of mutually-exclusive decisions. If there are multiple correct decisions for the same set of inputs, you lose the ability to train the model on a specific target. If any combination of answers is permissible and you have more than, say, 20 of them, you really have an answer space with over a million possible choices. It also complicates training the model. Say you’re training a chat bot to answer natural language questions users pose. If the chat bot gives three of five essential pieces of information when responding to a particular question, “how correct” was it, and how should the interaction be counted in training or evaluation?

Choose a decision problem where reasonable people would make the same decisions. Suppose you’re building a system to pick a wall color palette for a room based on the furniture the owner already has. If you ask five interior designers you’ll get five reasonable, but different, palettes. Do you use all of these as positive training examples? How do you evaluate whether it made the right decision on a new room and set of furniture?

Choose a decision problem that is so simple that it cannot be decomposed into simpler decision problems. Let’s say you’re building an app that automatically generates a grocery list for users based on what they have in their refrigerator. Considering every possible shopping list simultaneously would be a nightmare. Instead, we can decompose this into one problem for each food item, and a larger problem that merges these decisions into a single grocery list.

The model for deciding whether to include each item on the list might be based on:

  1. Does the user currently have any of this item (or similar) in their refrigerator? Is it expired?
  2. How much does the user usually consume per day? Week? Month?
  3. Is the user’s past consumption of this item regular or spurious?
  4. Does the user already have items that can be used in many recipes with this item?
  5. Is this item easily available to the user?
  6. Has the user indicated they are allergic / don’t like this item?

The model that merges these sub-solutions might be based on:

  1. How wide a range of recipes does this list, plus the user’s at-home stock, allow? (Also, prioritize recipes the user has favorited or ones similar to them.)
  2. Does the user have enough available space to store everything on this list?
  3. What items can be eliminated that cause the smallest decrease in potential recipe variance?

At the point where we’re combining the outputs of many decision models, we’ve technically diverged from decision problems, but I think the idea of merging solutions this way is powerful. The point is that the smaller per-ingredient decision problems are individually good starting points.
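
The decomposition above can be sketched as one tiny decision function per item plus a merge step (all fields and thresholds here are hypothetical):

```python
# Sketch of the grocery-list decomposition: a per-item decision model,
# then a merge model that assembles the final list. Fields are made up.
def should_consider(item):
    """Per-item decision: is this item a candidate for the list?"""
    return (
        (item["stock"] == 0 or item["expired"])
        and item["weekly_use"] > 0
        and not item["disliked"]
    )

def merge(candidates, max_items):
    """Merge step: keep the most-used candidates that fit the slot budget."""
    ranked = sorted(candidates, key=lambda i: i["weekly_use"], reverse=True)
    return [i["name"] for i in ranked[:max_items]]

pantry = [
    {"name": "milk", "stock": 0, "expired": False, "weekly_use": 4, "disliked": False},
    {"name": "eggs", "stock": 12, "expired": False, "weekly_use": 6, "disliked": False},
    {"name": "yogurt", "stock": 1, "expired": True, "weekly_use": 2, "disliked": False},
    {"name": "liver", "stock": 0, "expired": False, "weekly_use": 1, "disliked": True},
]

candidates = [i for i in pantry if should_consider(i)]
print(merge(candidates, max_items=2))  # ['milk', 'yogurt']
```

Each `should_consider` call is a simple, easy decision problem on its own; the merge is where the combined reasoning lives.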

Choose a decision problem that has a fast, easy solution. Greedily speaking, the longer it takes to reach a working prototype or implementation, the longer before you see returns on the work that went into it. More importantly, the sooner you see how the model improved someone’s workflow or life, the faster you’ll be able to iterate on it to make it even better. You’ll get quick feedback on a simple solution, so you’ll be able to grab more low-hanging fruit if you want to improve the model. And if the solution is good enough for now, it frees you to go work on another problem!

The Revelation Principle and Salary Negotiations

Salary negotiations are hard. It’s an information-asymmetric game where the job candidate is trying to maximize their salary, and the recruiter is trying to hire them at the lowest price possible.

Both have remarkable incentive to misrepresent themselves. The recruiter’s strategy is to lowball the candidate – trying to get them to revise their self-worth downward. The candidate’s strategy is to highball – anchoring their perceived value to this higher value.

This is poisonous for a variety of reasons. The candidate may take a salary that’s less than they’re happy with, which is one of the most common reasons for attrition. It puts employers in the position of taking advantage of someone even before they’re hired. Most importantly, the game pits the prospective new hire against the very company that wants their loyalty!

We can use the revelation principle to turn this into a collaborative game where the optimal strategy for both the prospective candidate and the employer is to honestly say what they think. In fact, the 2007 Nobel Memorial Prize in Economics went to the economists who laid the foundations of mechanism design theory, which shows that the outcome of any game where parties hold private information can be replicated by a mechanism where the best strategy is to report honestly.

But, what do we need to do to get this?

Let’s start with the Vickrey auction as inspiration. Standard auctions generally operate in one of three ways: (1) bids increase until only one person is willing to pay the price, (2) the price decreases from a starting amount until someone is willing to pay, or (3) bidders secretly submit bids and the highest bid wins and is paid. All three encourage elaborate strategies that lead to suboptimal bidding. The Vickrey auction is just a slight variation on (3), where the highest bidder wins but pays the second-highest bid. See the Wikipedia page on Vickrey auctions for a sketch of the proof showing why it rewards honesty.

But that’s bidding. What about salary negotiations? All you need is two pens and two pieces of paper.

Write down how much you think you’re worth on a piece of paper. Have your prospective employer write down how much you’re worth to them. If they write down that you’re worth as much as or more than you think you are, go with your number. Otherwise, thank them for their time and go elsewhere.
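
The two-slips mechanism can be written as a tiny outcome function (a sketch, with salaries in whole dollars):

```python
# The two-slips salary mechanism: the candidate's number is always the
# salary, and the employer's number only decides whether a hire happens.
def negotiate(candidate_ask, employer_value):
    """Return (hired, salary) under the write-it-down mechanism."""
    if employer_value >= candidate_ask:
        return True, candidate_ask
    return False, 0

print(negotiate(80_000, 95_000))  # (True, 80000): hired at the ask
print(negotiate(80_000, 70_000))  # (False, 0): thank them and go elsewhere
```

Because the employer’s number never sets the price, inflating or deflating it can only change whether the deal happens, not what it costs.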

Strategies

Terms

Truthful – making an offer you think reflects your actual value
Untruthful – making an offer you think does not reflect your actual value
Underbidding – offering less than the candidate’s amount
Bidding – offering exactly the candidate’s amount
Overbidding – offering more than the candidate’s amount

Employer Strategies

The dominant strategy for both you and the prospective employer is to be honest. Sure, you may miss out on money, but you’re already at an amount you’ve agreed you’re happy with.

  • The employer is equally incentivized for truthful bidding and truthful overbidding. Their bid doesn’t change how much they pay you, so the strategies have equal value.
  • The employer is incentivized to truthfully underbid. If to them you’re not worth as much as you think you are, they should not hire you.
  • The employer is discouraged from untruthful overbidding – they end up paying you more than they think you’re worth.
  • The employer is discouraged from untruthful underbidding – they miss out on getting you at all.

Candidate Strategies

Underrequesting – untruthfully requesting less than you’re worth
Overrequesting – untruthfully requesting more than you think you’re worth

For candidates, as with employers, the dominant strategy is to provide a value that is an honest assessment of your worth.

  • You are discouraged from underrequesting. Of course you want to be paid what you’re worth! Being paid less than you think you’re worth greatly increases the chance you’ll soon be looking for another job – at a higher amount.
  • You are encouraged to overrequest only if you think the employer vastly overestimates your self-perceived worth. If you think you’re worth $80,000 and overbid $100,000, you must believe that, given that the employer will offer at least $80,000, they are over 80% likely to offer $100,000 or more. In most cases this is unlikely.
  • You are discouraged from overrequesting otherwise as it unnecessarily increases your risk of not being hired by an employer who values you at least as much as you’re worth.
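
A quick sanity check of the 80% figure above (the probabilities here are illustrative). Conditioning on the employer valuing you at least $80,000, compare expected pay from the honest ask and the overrequest:

```python
# Expected pay given P(offer >= ask), conditioned on the employer's
# value v being at least $80k (so the honest ask always succeeds).
# Overrequesting beats honesty only when P(v >= 100k | v >= 80k)
# exceeds 80_000 / 100_000 = 0.8.
def expected_pay(ask, p_offer_at_least_ask):
    return ask * p_offer_at_least_ask

honest = expected_pay(80_000, 1.0)
for p_hi in (0.7, 0.8, 0.9):
    print(p_hi, expected_pay(100_000, p_hi) > honest)
```

Only when the conditional probability is strictly above 0.8 does the $100,000 ask pay off in expectation, matching the bullet above.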

To counter candidate overrequesting, of course, the employer must get an accurate representation of your skills. Or, at least, convince you that they have.

Final Thoughts

The outcomes and strategies of this game probably change once you consider that (1) you probably interview for multiple positions and (2) companies usually interview multiple candidates for a single position. I’ll have to think through the implications.

Tribe of Mentors and Cybernetics


Tim Ferriss sent hundreds of successful people the same list of 11 interview questions and collated the 140 responses into a book, Tribe of Mentors. This is exactly the type of data a cybernetic approach is for. The interesting thing here is less what any individual interviewee said than the patterns in what they said collectively. What were their roadblocks? What was important to them? How did they disagree with each other?

One of my favorite quotes directly attacks the premise of the book: “[Advice] is almost always driven by anecdotal experience, and thus has limited value and relevance…. Ignore advice, especially early in one’s career. There is no universal path to success.” (John Arnold, page 374) It should say something that when retraversing the book to find this quote I stumbled on five similar ones. I agree with the sentiment, but John Arnold misses a broader point, succinctly said by Matsuo Basho as “Do not seek to follow in the footsteps of the wise; seek what they sought.” It is not that there is nothing to be learned by listening to advice; it is that advice is not transferable without understanding context or principles. The reasoning behind a conclusion is more useful than the conclusion, and patterns of reasoning across many mentors are even more so. From a single person’s reasoning you can follow their logic and determine how believable their conclusion is for yourself. From the reasoning of many people you have the opportunity to develop principles that you can apply to other contexts. So sure, there is no universal path or even a universal map. But in seeing how many others read the maps of their lives, maybe you can learn to read your own.

Conflicting advice is the best source of this sort of direction. Fortunately, Tribe of Mentors is full of conflicting advice. Tim did an excellent job positioning similar people with strong disagreements – at times I had the thought “didn’t they just say the opposite thing?”, but when I turned back a few pages I saw it was from a different interviewee. The myriad dissonant voices blur together into beautiful higher-order concepts.

For example, here’s a smattering of work life advice from the book:

  • “You should set up your life so that it is as comfortable and happy as possible.” (Susan Cain, 13)
  • “Ignore anyone who tells you to go for security over experience.” (Patton Oswalt, 106)
  • “Advice they should ignore: … Avoid risk. Play it safe.” (Josh Waitzkin, 197)
  • “I do not believe in work-life balance.” (Debbie Millman, 29)
  • “Burnout is not the price you pay for success.” (Arianna Huffington, 214)
  • “Growth and gains come from periods of rest.” (Amelia Boone, 130)

For this I’ll take the position that “The fact is that when two extreme opinions meet, the truth lies generally somewhere in the middle.” (Annie Duke, 172) I’d go a step further and claim that not only does the truth of this lie somewhere in the middle, but everyone has to figure out where they fall on the spectrum themselves. Sure, I could be deluding myself that a sense of security is required for my creative work just as someone else might incorrectly think they perform best in adversity. Who are we to question someone else’s self-experience? The best we can do is show them that there are other paths and give them the tools for assessing the one they’re on.