Kata Refactor - Show Me Your ID

I have been doing katas on Codewars for many years now, and I thought as a series of exercises, I would revisit my old and often bad solutions and refactor them as I might in a production system. So today, I will refactor Sir, showMe yourID.

The problem statement

There’s a new security company in Paris, and they decided to give their employees an algorithm to make first name recognition faster. In the blink of an eye, they can now detect if a string is a first name, no matter if it is a one-word name or an hyphenated name. They’re given this documentation with the algorithm:

In France, you’ll often find people with hyphenated first names. They’re called “prénoms composés”. There could be two, or even more words linked to form a new name, quite like jQuery function chaining ;). They’re linked using the - symbol, like Marie-Joelle, Jean-Michel, Jean-Mouloud. Thanks to this algorithm, you can now recognize hyphenated names quicker than Flash! (yeah, their employees know how to use jQuery. Don’t ask me why)

Your mission if you accept it, recreate the algorithm. Using the function showMe, which takes a yourID argument, you will check if the given argument is a name or not, by returning true or false. Note that

  • String will either be a one-word first name, or an hyphenated first name , its words being linked by “-”.
  • Words can only start with an uppercase letter, and then lowercase letters (from a to z)

Now is your time to help the guards !

My original solution (2016-06)

1
2
3
def show_me(name)
  !!(name =~ /\A[A-Z][a-z]+(-[A-Z][a-z]+)*\z/)
end

In a production system, this is unacceptably sparse. I would have serious concerns if I saw it in a code review.

  1. There’s no explanation of intent.
  2. There’s no “why” around or in the code.
  3. The name of the function sans module/class provides no context.

We have tests because of the kata framework provided by codewars, so we will rely on those. I will reproduce them at the end of the post. Having to read tests to understand a block of code is a smell.

My new solution (2020-07)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def show_me(your_id)
  FirstNameChecker.valid?(your_id)
end

# validate a string to check if it is a valid first name, which can 
# include hyphens.
# Example Usage:
#   FirstNameChecker.valid?("Francois")    => true
#   FirstNameChecker.valid?("Jean-Eluard") => true
#   FirstNameChecker.valid?("Mary Jean")   => false
module FirstNameChecker
  NAME_PART_REGEX = /\p{Upper}\p{Lower}+/
  NAME_REGEX      = /\A#{NAME_PART_REGEX}(?:-#{NAME_PART_REGEX})*\z/

  def self.valid?(str)
    NAME_REGEX.match?(str)
  end
end

So I pointed out 3 things in the original solution that were less than ideal. Let’s talk about how this solution solves them and some things that could be different here, too, depending on greater system context.

There’s no explanation of intent.

This new module FirstNameChecker now explains both how to use it and what is expected to come out of it, which is better than the nothing of the original solution. Code comments are tough in any system because code can drift from them. Making the comments be around an entire module forces future churn and changes to address those in a good engineering environment.

There’s no “why” around or in the code.

Often when you come to a codebase or a part of a codebase after a long time, the reasoning for something can be a mystery. Small commits and good commit messages are one part of the puzzle. Good naming and explanation are another. Using shared team patterns and established team code practices are yet another.

So to dig into the changes, the original method is called show_me and this solution assumes we have a callsite that can’t be changed, so we will use the original function as a proxy into the new method we’ve made. You can imagine in a larger system that there could be other Checkers with similar simple semantics. So now we can read the code in show_me as english. “Did we get a valid first name?” And I’ve fixed the variable passed to the method to be named as in the description.

Equally we’ve added a some readability to the regex through the use of Ruby’s Regexp Character Properties. Perhaps the trickiest part of the regex now is the non-capturing group (?:), which could be left off without harming this code. It can be good practice particularly for high-frequency calls given memory allocations to store the matches. This code probably isn’t that, but someday it could be.

The name of the function sans module/class provides no context.

By introducing a module, we now have much more context about the code’s intent. By using regexp character properties and well-named constants, we can read the regex in english with a little bit of regexp knowledge. You should be able to explain this regexp to a junior developer very quickly. The original perhaps, too, but it required having more loaded at once because of how it was written.

So to me, this solves the problems of the original and provides clarity around the code that wasn’t there before. Ultimately that’s what we want. Lower overhead for anyone in the future who has to maintain this, including ourselves! And we didn’t have to make any changes to the tests!

Changes we could make

One change we could make is to replace #match? with ===. #match? reads better to me, but they would be equivalent here - outside of perhaps performance differences we would have to look into if the method became a bottleneck. And your team may like === better, and that’s fair.

When we read the original blurb, it talks about “detecting” a first name, so perhaps FirstNameChecker.valid? would become FirstName.detected? or NameDetector.first_name? or FirstNameDetector.found?. Largely what the name of the module is is going to be very specific to the context of the entire application. If there are other detectors and our team likes the semantics we’ve used there, we will probably use some variant of Detector. If we view this function as a validator function more akin to traditional Rails, we would go with the .valid? check I’ve used.

Going on, when we see a proxy of this sort, it may be time for refactoring in other parts of the system to make things more uniform. But a proxy of this sort may be a useful place for adding tracing and monitoring and feature flag functions, so it’s not always possible or a good idea to remove a proxy. It is often worth looking, though.

Though it would be worse performance-wise, another solution might employ simple string tokenization and/or character-by-character string parsing. Depending on the team structure those might be easier to understand and maintain. It may not be a better long-term solution but in the short term it could be an advantage for everyone to understand a simpler solution, or a solution with no regexp if the team is very unfamiliar with those.

Downsides

There is now more code to maintain.

At least in the vacuum of this simple problem, we’ve introduced a very large concept in the form of a Checker system. In a real system, the impact would probably be less than here, since in this case the whole system originally was one meaningful line of code.

Regexp is not always easy to grasp for new developers and developers who’ve not been exposed to it.

Addendum: The Test Suite

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Test.describe("Basic tests") do
  Test.assert_equals(show_me("Francis"), true, "Francis is a name !")
  Test.assert_equals(show_me("Jean-Eluard"), true, "Jean-Eluard is a name!")
  Test.assert_equals(show_me("Le Mec"), false, "Le Mec is not a name!")
  Test.assert_equals(show_me("Bernard-Henry-Levy"), true, "Bernard-Henry-Levy is a name!")
  Test.assert_equals(show_me("Meme Gertrude"), false, "Meme Gertrude is not a name! It's my Grandma ")
  Test.assert_equals(show_me("A-a-a-a----a-a"), false)
  Test.assert_equals(show_me("Z-------------"), false)
  Test.assert_equals(show_me("Jean-luc"), false)
  Test.assert_equals(show_me("Jean--Luc"), false)
  Test.assert_equals(show_me("JeanLucPicard"), false)
  Test.assert_equals(show_me("-Jean-Luc"), false)
  Test.assert_equals(show_me("Jean-Luc-Picard-"), false)
end