Musings by Mrigank Pawagi

A Time Machine Of My Own

2024-10-16T00:00:00+00:00

Not long ago, I realized I have been holding onto a tiny time machine. I realized this when I once accidentally turned it on! I do not have the user manual for this time machine, but from several instances of such accidental time travel, I now have some idea of how it works.

This time machine is quite primitive — in the way that it does not allow traveling to arbitrary points in time (or at least I have not yet figured out how to do that). But what I have found is that it can take me back to certain checkpoints in time. These checkpoints are not really particular instances in time, but rather intervals that can span anything from an hour to a few weeks.

Inadvertently, I have created several such checkpoints in the past. In the summer of 2023, while working as a summer research intern at CiSTUP¹ in IISc, I would often play Forest² while working in my cubicle, walking back from the office to my hostel, and while working from my room. And when I say this, I mean that I would almost exclusively³ play Forest in that month of summer. This created a checkpoint in time, and now playing Forest activates my time machine and takes me back to that same summer — if I close my eyes, I can easily believe that I am still an intern at CiSTUP. Not just that, but I can also once again feel the emotions of the excitement of walking to my graduate student guide’s cubicle with a working solution, the solitude of an empty campus, the eeriness of not meeting most of my friends for weeks, and the eagerness to start my second year at IISc. This summer, I found myself playing a few selected songs from the album Clancy on a loop while preparing⁴ food in my apartment during my internship at UIUC. Playing those songs now brings back the image of my apartment, the makeshift taste of my makeshift food, the remoteness of being away from family and friends, and the diminishing uneasiness of being in a totally different place. Similarly, Can’t Help Falling In Love teleports me to my late-night walk around the IISc main building somewhere in the middle of my third semester — when I listened to the song for the first time after quite a while, with a cool night breeze and a dimly lit scenery around me. Ode to Sleep carries me back to the various times in early 2022 when I opened rejection letters from different undergraduate programs abroad⁵, while Cancer brings back the time when I was at the peak⁶ of my JEE preparation. There are many more examples, but I think I have made my point.

To be fair, this is not a discovery — I am certain that many of you have had similar nostalgic experiences. Further, music is not the only thing that can bring about travel through time — odors, tastes, textures, and even sights can do the same. The wonderful thing about music though is that I can choose to play any song of my liking and thus choose the time I want to travel to. I still have to better understand the mechanism of creating checkpoints, particularly how my context and the meaning of the music itself play a role in it. But for now, while I am glad to be able to occasionally travel back in time, I am partly afraid that if I do not diversify my playlist, I will soon run out of songs to create new checkpoints with.

Center for infrastructure, Sustainable Transportation & Urban Planning. ↩
By Twenty One Pilots, like the rest of the songs, covers, and albums mentioned in this post. I will henceforth omit mentioning this. I also want to mention here that they are the only band I listen to. ↩
My sister thinks that this is a weird thing to do. Even my parents are surprised that I do not get bored listening to the same songs again and again. I think I am too lazy to change my playlist, but in order to make this footnote longer, I will cite a poem from my 8th grade English textbook — Unfolding Bud by Naoshi Koriyama — which uses the metaphor of an unfolding bud to describe how the true meaning of a poem (or in my case, a song) can be derived only by reading (or in my case, listening) it multiple times. While I still may not have fully understood the essence of the songs I listen to, I do feel that by doing so I am able to find newer meanings in them. ↩
I am refraining from the use of the word “cooking” because frankly I mostly either prepared a taco or a burrito from ready-made materials or reheated the home-made Indian food I was fortunate to get home-delivered to me twice every week. Nonetheless, even this was a big deal for me as someone who has never gone past putting water to boil. ↩
Interesting story for another time! ↩
The peak of panic, by the way. ↩

Two Years at IISc — In Figures

2024-10-11T00:00:00+00:00

Two years ago on this day, I reported at the campus of the Indian Institute of Science to begin my undergraduate degree in Mathematics and Computing. With this, I have completed roughly half of my IISc time¹. There’s a lot to say about these two years, including the things I have learnt and the people I have met. But I will save my time² by summarizing some highlights in doodle-plots³.

I may or may not be implying that I am sitting on a time bomb⁴. ↩
With the excuse of trying to be creative. ↩
I am not sure if that is a thing, but what I mean is that I am presenting plots that by no means are accurate (besides the fact that they are not even informative). The axes are unlabelled on purpose, and the lines are supposed to be more qualitative than quantitative. Although I drew each of them in around a minute or two, this is the best I can do with my drawing skills. ↩
Along that tangent, I will mention that one of the most important things I have learnt in these two years is the extensive use of the Law of the Excluded Middle for the purpose of coming off as smart in conversations. ↩

We Called Off the Deadline

2024-08-17T00:00:00+00:00

Deadlines have some sort of magic to them. A project can continue to drag on for months, but as soon as we decide to make a deadline¹, usually one of two things happens.

The entire universe conspires to make sure we meet the deadline with everything we want to do.
We conspire to make sure we meet the deadline with everything we can (and should) do.

Either way, we do end up with much more than we would have without the deadline. That is the magic of the deadline - we suddenly know what the most important parts of our project are, what the optimal ways to implement them are, and what we are supposed to do if something goes wrong. We are able to read many more papers, write many more lines of code, write much more text, and also bear many more meetings!

Admittedly, even if we decide to not make the deadline, we are much better prepared for the next one². But this is also when the magic of the deadline is the most apparent. As soon as we decide to call off the deadline, all of the superpowers that were granted to us by the deadline are taken away. In fact, it may take a little while to even get back to the same level of productivity that we had before the deadline was set.

I was initially going to conclude by saying that deadlines seem to be like booster shots of productivity (or motivation). But now I think they are more like steroids. They allow us to gather our own strength and concentrate it for a short period of time³ (and when their effects wear off we are obviously temporarily weaker). They may also have some side-effects. For instance, people may begin to prioritize results over the process, and may (hopefully) unintentionally bias their study. They may also start caring more about making things work well, instead of making them well. Most importantly, people stop taking intellectual risks, which I feel are a very important part of research.

As only a junior researcher⁴, I cannot really comment on whether these side-effects are acceptable (or even desirable). Yet from my experience⁵ so far, I think deadlines are a powerful tool that we can harness to our advantage (of course, with terms and conditions applied).

Of course we set internal deadlines throughout the length of the project, but they are comfortably flexible. I am talking about a deadline that the universe has set for us, like a conference submission deadline. ↩
This is also more common than one might imagine. It is difficult to decide whether to work more and risk losing a deadline (and thus some feedback) or to claim victory and submit whatever there is. Thankfully in research we are allowed to use the phrase “future work” for “incomplete work”, which makes it easier to do the latter. ↩
Deadlines wouldn’t have any magic if we set them months in advance! ↩
Frankly I am not sure if I am even a junior researcher yet. Evidently, I am junior. Also I am trying to be a researcher. So I suppose I can take this creative liberty. ↩
Which is more about calling off deadlines than meeting them. ↩

Universities

2024-07-18T00:00:00+00:00

I am always very excited about meeting students from other universities, especially undergrad students. And whenever I do meet students from other universities, I am most interested in hearing about how things work at their school.

The undergrad program at IISc is very new. If you talk about the B.Tech. program in particular, that is as new as it can be - I am one of the students in the first batch of this program, and I am only a rising junior. This is amazing because we (i.e., the students of this first batch) get to carve out our paths and set benchmarks for years to come. But this is also horrifying because we get to carve out our paths and there is nobody to tell us “what” to do, “how” to do and “when” to do - in practice this is a gamble, but that is a story for another time¹. Also, we are very few - 46 people in my batch to be precise. This often makes B.Tech. an echo chamber. We see what is available, similar ideas and perspectives flow through all of our minds, and we think this is all that there is.

This is why meeting students from elsewhere is an opportunity for me. I ask about their coursework to see how things are different - sometimes different places follow very different approaches for delivering the same content, or simply deliver very different contents. I ask about their professors and other teaching faculty. I ask about their courseload. But most importantly, I ask what the students there are up to. Do they all want to grind day and night on Leetcode to get placed at some company as soon as possible? Are they easy going and mostly focus on coursework? Are they innovating with startup ideas? Are they interested in research and if they are, what are they doing to get research experience? What is the hottest topic in computer science among these students? What co-curricular activities like student clubs or competitions are they involved in? Of course I always have my side of the story to share too. Sometimes people are surprised by the diversity of things in IISc, and sometimes they are surprised by the narrowness of things in IISc. At the same time, I get to understand the challenges faced by students elsewhere, and how they overcome them. I get to see what gives them an edge, and how we can adopt some of those things in our student culture at IISc.

But there is more to student life than all these “nerdy” concerns. Hostel, mess, sports, fests, and the surrounding city are some of these. When I talk to my friends studying in Delhi, we also discuss how many people drink, smoke or take drugs at our schools². We may discuss how friendly, collaborative or competitive our peers are. Or maybe how beautiful the campus is, and how much of it is currently under construction. I can ask so many questions, but hearing about these things is one thing while actually living in that environment is another. I had the opportunity to do precisely that this summer at UIUC. Of course I could not get an authentic experience because the campus is in its best (or worst?) shape only during the semester. But I do see that living here is quite different from living inside the campus of an Indian university like IISc.

IISc’s campus is like a tiny town made just for its citizens. Students are guaranteed on-campus housing and are provided decent hostels where they have to practically do nothing for maintenance (besides maybe dusting their own room once in a while³) - at a very student-friendly price (i.e., nearly nothing). We are given decent food 4 times a day, and again at a very student-friendly price. The campus design makes it easy to walk and cycle, and there are essential stores and even affordable restaurants right inside. Of course we can go out into the city⁴ for more lavish options - but we can comfortably spend months and, in principle, even years without ever leaving the campus. The campus is walled and very safe. I don’t remember ever walking around the campus at 3 or 4 in the morning and not feeling safe⁵. But at UIUC things were a bit different. I feel that students, particularly grad students, are much more on their own. There is no hard boundary for the campus, and it practically just fades into the neighbourhoods of Urbana and Champaign. It still seems safe because of the large number of students in the town, but frankly not as safe as my campus - and especially so at night. Many students live in apartments where they have to deal with actual property “leasing”. A big fraction of students cook for themselves, and many often eat out. There are many stores and restaurants in the area, but they are not usually made for students from the pricing point of view. You can walk to many places but some places are far enough to make walking unfeasible. There is a good⁶ bus service, but I think it’s not enough for the lifestyle here because many students buy (and frequently use) cars. All in all, I feel that I have had an easy life back home. Everything is small, nearby and affordable - and it seems that the university has a role in making my lifestyle easier.
Of course there are caveats (again, let’s make that a discussion for another time) but I think it’s great to get a taste of both of these lifestyles. If you ask me “Why is that great?” - I don’t know; that just sounded like a nice sentence to conclude this essay with.

But till then, I would like to clarify that I am not complaining and while we do have some huge responsibility on our shoulders, the administration in IISc has made huge investments to make things work for us. ↩
I assure the reader that my answers to these questions are somewhere between “None that I know of” and “Maybe a few, but they stay to themselves”. ↩
I doubt many of us do that though. ↩
Thankfully IISc is actually surrounded by the city, unlike some other universities. ↩
Although I do vividly remember walking around the campus at 3, 4, 5 or even 6 in the morning. ↩
Although not as good in the summers. ↩

Not My Problem. Nevermind, My Bad.

2024-06-19T00:00:00+00:00

I have lately been working with GraalVM to use Polyglot for embedding Python code in Java. In doing so I sometimes require a translation of Java objects to Python objects and vice versa, while passing them between the two languages. My research advisor recently pointed out that my translation would fall apart if he passed a self-referential object - like a list that contains itself. This is because I translate objects like lists by first creating a fresh list in Python and then recursively translating the elements of the list and appending them to the Python list. Given a self-referential list, this process would never terminate on its own and would end in a RecursionError. However this is a classic problem and has a simple fix - we keep track of objects that have already been translated and reuse translations if the object is encountered again.

For the specific case of translating lists, the original program would look something like this (note that this is not actual code).

def translate(java_object):
    # ...handle other kinds of objects

    # handle lists
    if java_object.is_java_list():
        new_list = []
        for element in java_object:
            new_list.append(translate(element))
        return new_list

The fix would modify the code in the following way.

def translate(java_object, translated_objects=None):
    if translated_objects is None:
        translated_objects = {}
        
    if id(java_object) in translated_objects:
        return translated_objects[id(java_object)]

    # ...handle other kinds of objects

    # handle lists
    if java_object.is_java_list():
        new_list = []
        translated_objects[id(java_object)] = new_list
        for element in java_object:
            new_list.append(translate(element, translated_objects))
        return new_list

Simple, right? Note that java_object is of foreign type when passed to Python. Since I was in a “debug mode” my actual function printed java_object before proceeding to the rest of the function body.

def translate(java_object, translated_objects=None):
    print(f"translating {java_object} with id {id(java_object)}")

    # ...rest of the function

Everything was good, until I passed a recursive list to the function - and all I saw was a RecursionError. Polyglot doesn’t provide a very detailed stacktrace for errors arising from Python, and all I saw was that the error came from translate. But print was not executed, like it usually would while working with other objects. I confirmed that the statement before invoking translate was being executed, and inferred that the problem must be in the internals of Polyglot - maybe Polyglot did not know how to pass recursive structures to Python (possibly because it could not “wrap” it up in the foreign type). GraalVM has limited documentation, and a PhD student who shares the office with me suggested that I check GraalPython’s source code to find some implementation details. The code was understandably (small team of researchers, new project, etc.) not very well documented, and I could not find anything useful. I told my advisor that this is not my problem and maybe we should raise a bug report with the GraalVM team (which I already did informally).

Satisfied, I went back to my next task. But while scrolling through the translate function, I realized to my horror that my code actually looks like this.

def translate(java_object, translated_objects=None):
    if translated_objects is None:
        translated_objects = {}
        
    if id(java_object) in translated_objects:
        return translated_objects[id(java_object)]

    # ...handle other kinds of objects

    # handle lists
    if java_object.is_java_list():
        new_list = []
        for element in java_object:
            new_list.append(translate(element, translated_objects))

        # NOTICE THIS LINE ---------------------------v
        translated_objects[id(java_object)] = new_list
        # ---------------------------------------------
        
        return new_list

Yes, this would lead to a RecursionError because my inner objects would never see the translation of the outer list. I knew that even if this was the error, my print statement should have executed. But I nonetheless fixed the code and ran it again with mixed feelings of satisfaction and embarrassment. However the RecursionError persisted - and with the same observation. I was partly relieved that nobody would know about my bug, and that the actual error is still in the internals of Polyglot.
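The effect of the ordering can be reproduced without GraalVM. The following is only a sketch under assumptions: plain Python lists stand in for the foreign Java lists, and the `register_first` flag is a hypothetical switch added here purely to compare the two orderings.

```python
def translate(obj, translated_objects=None, register_first=True):
    """Copy nested lists, reusing the copy of any list that was already visited."""
    if translated_objects is None:
        translated_objects = {}

    if id(obj) in translated_objects:
        return translated_objects[id(obj)]

    if isinstance(obj, list):
        new_list = []
        if register_first:
            # Register before recursing, so inner references can find this copy.
            translated_objects[id(obj)] = new_list
        for element in obj:
            new_list.append(translate(element, translated_objects, register_first))
        if not register_first:
            # Registering here is too late for a list that contains itself.
            translated_objects[id(obj)] = new_list
        return new_list

    # Anything that is not a list is returned unchanged in this sketch.
    return obj


recursive = [1, 2]
recursive.append(recursive)

translated = translate(recursive)
print(translated[2] is translated)  # the self-reference survives the translation

try:
    translate(recursive, register_first=False)
except RecursionError:
    print("RecursionError when the list is registered after its elements")
```

With the late registration, the inner call never finds an entry for the outer list, so it starts translating the same list again and the recursion never bottoms out.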

I called it a day and before making a commit, removed the print that I put when I started working in the morning. I ran all tests again (because why not?) and to my absolute surprise, everything compiled and passed! It turned out that the RecursionError was due to the print call itself! Java can print recursive lists by replacing self-references with a placeholder (this Collection), while Python does the same with [...]. However the __format__ (and even __str__ and __repr__ - although I don’t think all three of these would even have been implemented) method on foreign does not properly handle recursive structures and that is where the RecursionError was coming from (evaluation of the f-string invokes the __format__ method).
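To see the difference between a guarded and an unguarded formatter, here is a small sketch using ordinary Python lists. Both `naive_format` and `guarded_format` are hypothetical helpers written for illustration; they are not how GraalPython or CPython actually implement formatting.

```python
nested = [1, 2]
nested.append(nested)

# The built-in list repr detects the cycle and prints a placeholder.
print(repr(nested))
print(f"{nested}")


def naive_format(obj):
    """Format nested lists with no protection against self-reference."""
    if isinstance(obj, list):
        return "[" + ", ".join(naive_format(item) for item in obj) + "]"
    return repr(obj)


def guarded_format(obj, active=None):
    """Format nested lists, printing [...] for a list that is already being formatted."""
    active = set() if active is None else active
    if isinstance(obj, list):
        if id(obj) in active:
            return "[...]"
        active.add(id(obj))
        try:
            return "[" + ", ".join(guarded_format(item, active) for item in obj) + "]"
        finally:
            active.discard(id(obj))
    return repr(obj)


print(guarded_format(nested))

try:
    naive_format(nested)
except RecursionError:
    print("naive_format hit a RecursionError")
```

The guard only needs to remember which containers are currently being formatted, which is the same bookkeeping idea as the translation fix.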

So I did find a bug in Polyglot, but not one that had anything to do (I am not going to print foreign Java lists in Python) with what I was doing. The bug in what I was doing was really mine, but I never really hit it (and nobody else would have known about it) - so was it really a bug? Let’s call it a minor typo. And of course I have fixed my bug report to the GraalVM team.

Intercepting Attribute Accesses in Python

2024-06-16T00:00:00+00:00

Python is a very flexible language, and provides a lot of “magic” for developers to do nearly anything they want with the language and also the objects in the language. One interesting feature is the ability to override the so-called “dunder” methods (methods with double underscores on either side) on classes in order to provide custom behaviour.

One such method is __getattribute__ which is called whenever an attribute is accessed on an object. The method has the following signature

def __getattribute__(obj: object, name: str) -> Any:

where obj is the object on which the attribute is being accessed, and name is the string containing the name of the attribute being accessed. Technically we can pass any object as the first argument when accessing this method statically, but when called on an instance of a class, the instance is passed as the first argument automatically. There are more details here - for example, if the lookup fails with an AttributeError and the class defines a __getattr__ method, that method is called as a fallback.

It may not be very common to override this method, but it can be useful in some cases. For example, to provide a “proxy” object that forwards attribute accesses to another object, or to provide a “lazy” object that computes the value of an attribute only when it is accessed, or to log accesses to attributes on an object, and so on. One way to do this is to override the __getattribute__ method in the following way:

class SomeClass:
    ...
    def __getattribute__(self, name):
        if some_condition:
            do_something()
        else:
            # default behaviour
            return object.__getattribute__(self, name)

Notice that we do not invoke __getattribute__ on self - that would cause infinite recursion. Instead we call it on object and pass self explicitly as the first argument. We could technically use any class (which has not overridden its __getattribute__ method!) in place of object in the above code since all Python classes inherit from object.
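As a concrete instance of the logging use case mentioned above, here is a minimal runnable sketch. The class name `Logged` and its `accessed` list are hypothetical names chosen for this example.

```python
class Logged:
    """Record the name of every public attribute read on an instance."""

    def __init__(self):
        self.accessed = []
        self.value = 42

    def __getattribute__(self, name):
        # Skip private names and the log itself to keep the record readable.
        if not name.startswith("_") and name != "accessed":
            # Go through object to fetch the log without re-entering this method.
            object.__getattribute__(self, "accessed").append(name)
        # Default behaviour: delegate the actual lookup to object.
        return object.__getattribute__(self, name)


item = Logged()
print(item.value)
print(item.accessed)
```

Fetching the log through `object.__getattribute__` rather than `self.accessed` is what avoids the infinite recursion.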

However, there is a catch (otherwise why this post?).

Static Attributes

__getattribute__ is called only when an attribute is accessed from an instance of the class, but not if the attribute is accessed statically from the class itself. This means that we somehow need to invoke some __getattribute__ method when an attribute is accessed statically. A natural solution would be to make the class an instance of another class, and this other class would have the __getattribute__ method. This other class is called a metaclass in Python. This is very different from inheritance or subclassing, and is an entire topic in itself - and of course metaclasses have many other uses.

We will first define a ‘metaclass’ that has the __getattribute__ method.

class MyMetaClass(type):
    def __getattribute__(self, name):
        if some_condition:
            do_something()
        else:
            # default behaviour
            # here self is the class; type's lookup also searches its base classes
            return type.__getattribute__(self, name)

We make the metaclass inherit from type, which is the most common way to define a metaclass. However this is not necessary and for some exotic use cases, one may have a different construction. Frankly, I will have to read more about metaclasses to understand and explain this better.

Either way, we now make our class an instance of this metaclass.

class SomeClass(metaclass=MyMetaClass):
    ...

This was the first time I was introduced to the concept and the need for metaclasses. What I was exactly trying to do is a theme for another post.

A Bit On Research

2024-06-15T00:00:00+00:00

I have been hearing from many experienced researchers that research is not like coursework and is by its very nature open-ended. There is no fixed path, and even your supervisor may not have concrete answers. In fact most likely, nobody in the entire world may have the answers to your questions. If you’re doing “cutting-edge” research, then this will happen at some point or the other - but it will happen. I have listened to all these insights and have wondered how exciting research must be - come on, we are talking of discovering new things here!

But my oh my, actually doing these things is a whole different thing. For the first time in my experience with research, I have found actual gaps in the literature. And it turns out that while research is hard - sometimes very hard - researchers have the liberty to sometimes take the easy way out. And I think this incremental nature of research is one of the hardest concepts to fully absorb for people who are new to research. Of course we build upon the work of others and leave our work for others to build upon - but what is not clear as a beginner is how much to build upon, how much to actually build, and how much to leave for others.

Research work is usually of a well-defined size. To me, this just seems like an “industry-standard” - possibly regulated by conference deadlines. So called “full papers” (generally in the research track of a conference) are long and detailed, but of course they are not all of the knowledge on the topic. Deciding how much to include in a paper seems to me like an art. Trying to go for the low-hanging fruits may be one strategy, but then your work will not be deep enough and your paper will likely be rejected (or you might submit to a smaller track, which is not necessarily a bad thing - but it’s also not the same thing and as a graduate student or a professor, it turns out that you cannot live off short papers). But at the same time, you can easily hop on to flashy new ideas and there is not much to lose if you fail because you only put so much effort into it. On the other hand, supposedly real research means “deep” results - but how deep is deep enough? To me this sometimes looks like a rabbit hole. I can keep digging and digging and at every step it will seem as if just a little more is needed to complete the picture. When do I exit and call the rest “future work”? I don’t want to stop too early and be called shallow, particularly if I know what the next step is. But I also don’t want to keep digging and digging and never get to the end of the tunnel. Of course there will be an end - but I don’t think it will be aligned to “deadlines”. There might also be challenges that may involve me developing deeper engineering skills or other auxiliary knowledge - but how much before I become a jack of all trades, master of none?

I don’t think there are clear answers to these questions (in the spirit of this post, I should say that these questions are just as open-ended) and I suspect that these tread into the “meta-level” of research which only the experienced researchers have a taste of. In fact I have come to realise that a research supervisor is not just a great source of technical knowledge, but also of this meta-level wisdom.

Note that my research is in software engineering and so my perspective is limited to this field.

Our Narrow Slice of Reality

2024-05-24T00:00:00+00:00

While preparing for my visit to the University of Illinois at Urbana-Champaign - my first time travelling to the United States - I remembered the first image of America that I had in my head when I was around four years old. The image was of a vast hilly landscape with lush greenery and neat concrete roads. These roads would be lined with several vendors selling clothes, toys and chocolates on wooden carts. A bunch of tall and well-built children wearing oversized sweaters and jackets would be the customers at these stalls.

This description sounds weird if not funny, but it was accurate to the best of my knowledge at that time - it was a fusion of the things I had seen around me, the pieces of information I had gathered from my mother, and the ways in which unknown lands were portrayed in the stories I had listened to.

I remember my mother telling me tid-bits about the country whenever my relatives visited from there. These relatives would often bring me clothes, toys, and chocolates among other items. The toys would be fun and the chocolates would be delicious, but the clothes, albeit labelled with my correct age, would sometimes be a size too big - and I would have to grow into them. My mother would tell me that people in the United States were bigger and so American kids my age would actually wear bigger clothes. These were often winter clothes, and I had heard from my mother that it snowed in the United States.

In some sense, I had created my own America. This America never existed and (at least in the foreseeable future) never will. But then what really is America? No description is complete or truly accurate - so are they simply different Americas? In all fairness, every person has a different perspective that may or may not be adequately expressible in words. It seems that reality is just an agreement between all of our different perspectives - it is what most of us think it is. This may be easy to imagine for concrete objects like places and people, but for many abstract concepts we call these perspectives opinions.

Implementing the Multiclass SVM with CVXOPT

2024-05-02T00:00:00+00:00

CVXOPT is a popular Python library for convex optimization, which forms the basis of Support Vector Machines (SVMs). However, converting mathematical expressions into the format required by CVXOPT can be non-trivial even for a binary-class SVM. Recently, I had to implement a multiclass SVM using CVXOPT for my Machine Learning course at the Indian Institute of Science. In this post, I will share the formulation and code that I used to implement the multiclass SVM. I am aware of alternative techniques such as one-vs-all and one-vs-one which utilize binary-class SVMs for multiclass classification, but I was required to implement the multiclass SVM directly during my course and that is what we will be discussing here. The direct formulation is said to perform better and to be less sensitive to issues related to class imbalance, but I haven’t yet looked into whether it is really a better option in practice. The reason for me to write this post is that there are very few resources available online for the direct formulation, because it seems that most people prefer the one-vs-all or one-vs-one approaches.

This formulation is based on “On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines” by Koby Crammer and Yoram Singer (2002) in the Journal of Machine Learning Research. The first section of this post might seem a bit mathematical, but I have tried to keep it as simple and explanatory as possible. The implementation is in the second section.

Problem Formulation

The formulation of the multiclass SVM problem is very similar to the binary-class SVM, except that we now have multiple hyperplanes given by the vectors $w_1, w_2, \ldots, w_K$ - one for each class. The classifier $f$ then assigns an input $x$ to the class whose hyperplane gives it the largest score $w_k^T x$.

\[f(x) = \arg \max_{k \in [K]} w_k^T x\]
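The decision rule above can be sketched in a few lines of plain Python. This is only an illustration: the function name `predict` and the toy weight vectors are made up, and classes are indexed from 0 here rather than from 1 as in $[K]$.

```python
def predict(weights, x):
    """Return the index k that maximizes the score w_k^T x."""
    scores = [sum(w_t * x_t for w_t, x_t in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Three hypothetical weight vectors in two dimensions.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
print(predict(weights, [0.2, 0.9]))
```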

Suppose that our training data is given by $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$. Then the multiclass SVM problem can be formulated as follows.

\[\min_{w_1, \ldots, w_K, \xi} \frac{1}{2} \sum_{k=1}^K \|w_k\|^2 + C \sum_{i=1}^n \xi_i\]

Here $\xi_i$ are slack variables, and $C$ is a hyperparameter. Let $\delta$ denote the Kronecker delta function. Then the constraints $w_{y_i}^T x_i - w_k^T x_i \ge 1 - \xi_i$ for all $i \in [n]$ and $k \in [K] \setminus \{y_i\}$ and $\xi_i \ge 0$ for all $i \in [n]$ can be written succinctly as

\[w_{y_i}^T x_i - w_k^T x_i + \delta_{y_i,k} \ge 1 - \xi_i \quad \forall i \in [n], k \in [K]\]
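To make the primal problem concrete, the following sketch evaluates the objective for a fixed set of weights, using the smallest slack values that satisfy the combined constraint. The function name `primal_objective`, the 0-based labels, and the toy data are assumptions made for this example.

```python
import numpy as np

def primal_objective(W, X, y, C):
    """Return (objective, xi) for fixed weights W, using the smallest feasible slacks."""
    n = X.shape[0]
    scores = X @ W.T                        # scores[i, k] = w_k^T x_i
    true_scores = scores[np.arange(n), y]   # w_{y_i}^T x_i
    delta = np.zeros_like(scores)
    delta[np.arange(n), y] = 1.0            # delta[i, k] = 1 if k == y_i else 0
    # Rearranged constraint: xi_i >= 1 - delta_{y_i,k} - w_{y_i}^T x_i + w_k^T x_i
    lower_bounds = 1.0 - delta - true_scores[:, None] + scores
    xi = lower_bounds.max(axis=1)           # the k == y_i column is 0, so xi >= 0
    objective = 0.5 * np.sum(W ** 2) + C * xi.sum()
    return objective, xi

# Toy data: two classes, two points (hypothetical values).
W = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.array([[2.0, 0.0], [0.5, 0.4]])
y = np.array([0, 1])
objective, xi = primal_objective(W, X, y, C=1.0)
print(objective, xi)
```

The first point is classified with a margin of at least 1, so its slack is zero. The second point is misclassified and needs a positive slack.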

We first write the Lagrangian for the primal problem by introducing the dual variables $\alpha = (\alpha_1, \ldots, \alpha_n)$, where each $\alpha_i = (\alpha_{i1}, \ldots, \alpha_{iK}) \in \mathbb{R}^K$. Let $w$ denote $(w_1, \ldots, w_K)$.

\[\begin{align*} \mathcal{L}(w, \xi, \alpha) &= \frac{1}{2} \sum_{k=1}^K \|w_k\|^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (1 - \xi_i + (w_k - w_{y_i})^T x_i - \delta_{y_i,k}) \end{align*}\]

Here, $\alpha_{ij} \ge 0$ for all $i \in [n]$ and $j \in [K]$.

Taking the partial derivative with respect to $\xi_i$ and setting it to zero, we get that for all $i \in [n]$,

\[\begin{align*} \frac{\partial \mathcal{L}}{\partial \xi_i} &= C - \sum_{k=1}^K \alpha_{ik} = 0 \\ C &= \sum_{k=1}^K \alpha_{ik} \end{align*}\]

Now, taking the gradient with respect to $w_j$ and setting it to zero, we get that for any $j \in [K]$,

\[\begin{align*} \nabla_{w_j} \mathcal{L} &= w_j - \sum_{y_i = j} \sum_{\substack{k=1 \\ k\neq j}}^K \alpha_{ik} x_i + \sum_{y_i \neq j} \alpha_{ij} x_i = 0 \\ 0 &= w_j - \sum_{y_i = j} \sum_{k=1}^K \alpha_{ik} x_i + \sum_{i = 1}^n \alpha_{ij} x_i \end{align*}\] \[\begin{align*} w_j &= \sum_{y_i = j} \sum_{k=1}^K \alpha_{ik} x_i - \sum_{i=1}^n \alpha_{ij} x_i = \sum_{y_i = j} x_i \left( \sum_{k=1}^K \alpha_{ik}\right) - \sum_{i=1}^n \alpha_{ij} x_i \\ &= C \sum_{y_i = j} x_i - \sum_{i=1}^n \alpha_{ij} x_i = C \sum_{i=1}^n \delta_{y_i,j} x_i - \sum_{i=1}^n \alpha_{ij} x_i \\ w_j &= \sum_{i=1}^n (C \delta_{y_i,j} - \alpha_{ij}) x_i \end{align*}\]

Substituting these values back into the Lagrangian, we get

\[\begin{align*} \mathcal{L} &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + \sum_{i=1}^n \xi_i\left(C - \sum_{k=1}^K \alpha_{ik}\right) \\ &+ \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (w_k - w_{y_i})^T x_i - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \\ &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (w_k - w_{y_i})^T x_i \\ &- \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \end{align*}\]

Note that

\[\begin{align*} \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} w_k^T x_i &= \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \left( \sum_{j=1}^n (C \delta_{y_j,k} - \alpha_{jk}) x_j \right)^T x_i \\ &= \sum_{i, j = 1}^n x_j^T x_i \left( \sum_{k=1}^K \alpha_{ik} (C \delta_{y_j,k} - \alpha_{jk}) \right) \\ \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} w_{y_i}^T x_i &= \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \left( \sum_{j=1}^n (C \delta_{y_j,y_i} - \alpha_{jy_i}) x_j \right)^T x_i \\ &= \sum_{i, j = 1}^n x_j^T x_i \left( \sum_{k=1}^K \alpha_{ik} (C \delta_{y_j,y_i} - \alpha_{jy_i}) \right) \\ &= \sum_{i, j=1}^n x_j^T x_i \, C (C \delta_{y_j,y_i} - \alpha_{jy_i}) \\ &= \sum_{i, j=1}^n x_j^T x_i \left( \sum_{k=1}^K C\delta_{y_i,k}(C\delta_{y_j,k} - \alpha_{jk}) \right) \end{align*}\]

and therefore,

\[\begin{align*} \mathcal{L} &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC \\ &- \sum_{i, j=1}^n x_i^T x_j \left( \sum_{k=1}^K (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) \right) - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \\ &= -\frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \end{align*}\]

Let $e_i = (\delta_{i,1}, \ldots, \delta_{i,K})$ for any $i \in [K]$, and let $e = (1, 1, \ldots, 1) \in \mathbb{R}^K$. Then the above equation can be written as

\[\begin{align*} \mathcal{L} &= -\frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j + nC - \sum_{i=1}^n \alpha_i^T e_{y_i} \end{align*}\]

Therefore, the dual problem is

\[\begin{align*} \max_{\alpha} &\quad -\frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j - \sum_{i=1}^n \alpha_i^T e_{y_i} + nC\\ \text{or} \quad \min_{\alpha} &\quad \frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j + \sum_{i=1}^n \alpha_i^T e_{y_i} - nC \\ \text{such that} &\quad \alpha_{ij} \ge 0 \quad \forall i \in [n], j \in [K], \quad \alpha_i^T e = C \quad \forall i \in [n] \end{align*}\]

Let $\lambda_i = Ce_{y_i} - \alpha_i$ for all $i \in [n]$. Note that

\[\begin{align*} \sum_{i=1}^n \alpha_i^T e_{y_i} &= \sum_{i=1}^n (Ce_{y_i} - \lambda_i)^T e_{y_i} = \sum_{i=1}^n Ce_{y_i}^T e_{y_i} - \sum_{i=1}^n \lambda_i^T e_{y_i} \\ &= nC - \sum_{i=1}^n \lambda_i^T e_{y_i} \end{align*}\]

Since $\alpha_i \ge 0$ for all $i \in [n]$, we have that $\lambda_i \le Ce_{y_i}$. Similarly, since $\alpha_i^T e = C$ for all $i \in [n]$, we have that $\lambda_i^T e = 0$. Substituting the values of $\lambda_i$, the dual problem can be written as

\[\begin{align*} \min_{\lambda} &\quad \frac{1}{2} \sum_{i, j = 1}^n (\lambda_i^T \lambda_j) (x_i^T x_j) - \sum_{i=1}^n \lambda_i^T e_{y_i} \\ \text{such that} &\quad \lambda_i \le Ce_{y_i}, \quad \lambda_i^T e = 0 \quad \forall i \in [n] \end{align*}\]
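The dual objective and its constraints are easy to evaluate numerically, which is handy for sanity-checking a solver's output. Below is a minimal sketch under a few assumptions: `Lam` is an $n \times K$ array whose rows are the $\lambda_i$, `K_gram` holds the inner products $x_i^T x_j$, labels are 0-based, and the function names are my own.

```python
import numpy as np

def dual_objective(Lam, K_gram, y):
    """0.5 * sum_ij (lam_i^T lam_j)(x_i^T x_j) - sum_i lam_i^T e_{y_i}."""
    n = Lam.shape[0]
    quadratic = 0.5 * np.sum((Lam @ Lam.T) * K_gram)
    linear = Lam[np.arange(n), y].sum()   # lam_i^T e_{y_i} picks entry y_i of lam_i
    return quadratic - linear

def is_feasible(Lam, y, C, tol=1e-9):
    """Check lam_i <= C e_{y_i} (elementwise) and lam_i^T e = 0 for every i."""
    n, K = Lam.shape
    upper = np.zeros((n, K))
    upper[np.arange(n), y] = C
    within_bounds = np.all(Lam <= upper + tol)
    rows_sum_to_zero = np.allclose(Lam.sum(axis=1), 0.0, atol=tol)
    return bool(within_bounds and rows_sum_to_zero)

# Toy problem: two orthogonal points, one per class (hypothetical values).
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])
K_gram = X @ X.T
Lam = np.array([[0.5, -0.5], [-0.5, 0.5]])
print(is_feasible(Lam, y, C=1.0), dual_objective(Lam, K_gram, y))
```

Note that $\lambda = 0$ is always feasible with objective value 0, so any candidate returned by a solver should have a non-positive objective.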

Note that on solving the dual problem, we get the values of $w_j$ as

\[\begin{align*} w_j &= \sum_{i=1}^n \lambda_{ij} x_i \end{align*}\]

for all $j \in [K]$.

This final problem is much simpler than the original primal problem, and arguably simpler than most formulations of the binary-class SVM.
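For a linear kernel, the weight vectors can be recovered from the dual solution with a single matrix product. The sketch below assumes `Lam` stores the $\lambda_i$ as rows and `X` stores the training points as rows. The names and the toy values are illustrative and are not the output of an actual solver.

```python
import numpy as np

def recover_weights(Lam, X):
    """Row j of the result is w_j = sum_i lambda_ij x_i."""
    return Lam.T @ X

Lam = np.array([[0.5, -0.5], [-0.5, 0.5]])   # hypothetical dual solution
X = np.array([[1.0, 0.0], [0.0, 1.0]])       # training points as rows
W = recover_weights(Lam, X)

x_new = np.array([2.0, 1.0])
scores_primal = W @ x_new                # w_j^T x computed from explicit weights
scores_dual = Lam.T @ (X @ x_new)        # sum_i lambda_ij (x_i^T x), no explicit weights
print(scores_primal, scores_dual)
```

Both ways of scoring agree. The second form only needs inner products with the training points, which is what makes it possible to substitute a kernel later.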

Implementation with CVXOPT

CVXOPT provides a simple interface for quadratic programming. We will convert the dual problem into the standard form required by CVXOPT and feed it to the solver. We want a problem of the form

\[\begin{align*} \min_{x} &\quad \frac{1}{2} x^T P x + q^T x \\ \text{such that} &\quad Gx \preccurlyeq h, \quad Ax = b \end{align*}\]

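I am providing the $P, q, G, h, A, b$ matrices below, for the optimization variable $x = (\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^{nK}$ obtained by stacking the vectors $\lambda_i$. I will leave the proof of correctness as an exercise for the reader (it is a simple exercise in linear algebra). Note that I have replaced the inner product $x_i^T x_j$ with the kernel $k(x_i, x_j)$.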

\[\begin{align*} P &= \begin{bmatrix} k(x_1, x_1) I_{K} & \cdots & k(x_1, x_n) I_{K} \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) I_{K} & \cdots & k(x_n, x_n) I_{K} \end{bmatrix} \\ q &= \begin{bmatrix} -e_{y_1} \\ \vdots \\ -e_{y_n} \end{bmatrix} \\ G &= I_{nK} \\ h &= C \begin{bmatrix} e_{y_1} \\ \vdots \\ e_{y_n} \end{bmatrix} \\ A &= \begin{bmatrix} 1_{1 \times K} & 0_{1 \times K} & \cdots & 0_{1 \times K} \\ 0_{1 \times K} & 1_{1 \times K} & \cdots & 0_{1 \times K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{1 \times K} & 0_{1 \times K} & \cdots & 1_{1 \times K} \end{bmatrix} \\ b &= 0_{n \times 1} \end{align*}\]
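A numerical check is a reasonable substitute for the proof. The sketch below builds the matrices with `np.kron` and confirms, for a random $\lambda$, that $\frac{1}{2} x^T P x + q^T x$ equals the dual objective and that $Ax$ returns the row sums $\lambda_i^T e$. The inequality $\lambda_i \le Ce_{y_i}$ is encoded with $G = I_{nK}$. The helper name `build_qp`, the 0-based labels, and the random data are assumptions made for this example.

```python
import numpy as np

def build_qp(K_gram, y, num_classes, C):
    """Build P, q, G, h, A, b for the stacked variable x = (lambda_1, ..., lambda_n)."""
    n = K_gram.shape[0]
    E = np.zeros((n, num_classes))
    E[np.arange(n), y] = 1.0                           # row i is e_{y_i}
    P = np.kron(K_gram, np.eye(num_classes))           # block (i, j) is k(x_i, x_j) * I_K
    q = -E.ravel()
    G = np.eye(n * num_classes)                        # G x <= h means lambda_i <= C e_{y_i}
    h = C * E.ravel()
    A = np.kron(np.eye(n), np.ones((1, num_classes)))  # row i sums the entries of lambda_i
    b = np.zeros(n)
    return P, q, G, h, A, b

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
y = np.array([0, 2, 1, 0])
K_gram = X @ X.T                                       # linear kernel
P, q, G, h, A, b = build_qp(K_gram, y, num_classes=3, C=1.0)

Lam = rng.normal(size=(4, 3))                          # arbitrary test point, rows are lambda_i
x = Lam.ravel()                                        # x[i * K + j] = lambda_ij
qp_value = 0.5 * x @ P @ x + q @ x
dual_value = 0.5 * np.sum((Lam @ Lam.T) * K_gram) - Lam[np.arange(4), y].sum()
print(np.isclose(qp_value, dual_value), np.allclose(A @ x, Lam.sum(axis=1)))
```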

The implementation of our multiclass SVM is as follows.

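from typing import Callable, Optional

import numpy as np
from cvxopt import matrix, solvers

class MultiClassSVM:
    """
    A Support Vector Machine for Multi-Class Classification
    """

    def __init__(self, labels: np.ndarray, train_x: np.ndarray, train_y: np.ndarray):
        """
        Constructor for the MultiClassSVM class that initializes the training data and labels
        :param labels: the labels of the classes
        :param train_x: the training data
        :param train_y: the training labels
        """
        self.labels = labels
        self.train_x = train_x
        self.train_y = train_y

    def train(
        self,
        kernel: Optional[Callable] = None,
        slack_weight: float = 1.0,
    ) -> "Classifier":
        """
        Train the Multi-Class SVM
        :param kernel: the kernel function to use (defaults to the linear kernel)
        :param slack_weight: the weight C of the slack variables; it must be positive,
            because C = 0 forces every lambda_i to zero (larger values approach a hard margin)
        :return: the trained classifier
        """
        if kernel is None:
            # resolved here rather than in the signature, so that the Kernels
            # class may be defined later in the same file
            kernel = Kernels.linear()

        n = self.train_x.shape[0]
        k = len(self.labels)
        labels = list(self.labels)

        # e stacks the one-hot vectors e_{y_1}, ..., e_{y_n}
        e = np.zeros(n * k)
        for i in range(n):
            e[i * k + labels.index(self.train_y[i])] = 1

        # block (i, j) of P is (k(x_i, x_j) + 1) * I_K; adding 1 to the kernel is
        # equivalent to appending a constant feature to every input, which gives
        # each class a bias term
        P = np.zeros((n * k, n * k))
        for i in range(n):
            for j in range(n):
                P[i * k : (i + 1) * k, j * k : (j + 1) * k] = (
                    kernel(self.train_x[i], self.train_x[j]) + 1
                ) * np.eye(k)

        q = -e

        # G x <= h encodes lambda_i <= C e_{y_i}
        G = np.eye(n * k)
        h = slack_weight * e

        # A x = b encodes lambda_i^T e = 0 for every training point
        A = np.zeros((n, n * k))
        for i in range(n):
            A[i, i * k : (i + 1) * k] = 1

        b = np.zeros(n)

        sol = solvers.qp(
            matrix(P),
            matrix(q),
            matrix(G),
            matrix(h),
            matrix(A, (n, n * k), "d"),
            matrix(b),
        )

        # lam[i, j] = lambda_ij, unstacked from the solution vector
        lam = np.array(sol["x"]).reshape(n, k)

        # construct the classifier
        def classifier(x):
            # kernel values between x and every training point (same +1 as in P)
            kx = np.array([kernel(x, self.train_x[i]) + 1 for i in range(n)])
            # scores[j] = sum_i lambda_ij * (k(x, x_i) + 1)
            scores = lam.T @ kx
            return self.labels[int(np.argmax(scores))]

        return Classifier(classifier)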

The kernel functions can be easily swapped in and out by passing them as the kernel argument of the train function. An example implementation of the Kernels class is as follows.

class Kernels:
    """
    A class for different kernel functions
    """

    @staticmethod
    def rbf(sigma: float) -> Callable[[np.ndarray, np.ndarray], float]:
        """
        Radial Basis Function (RBF) kernel
        :param sigma: the width of the kernel
        :return: the kernel function
        """
        return lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma**2))

    @staticmethod
    def linear() -> Callable[[np.ndarray, np.ndarray], float]:
        """
        Linear kernel
        :return: the kernel function
        """
        return lambda x, y: x @ y
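The training code evaluates the kernel one pair at a time, which is simple but slow for larger datasets. As a rough sketch of an alternative, the whole RBF Gram matrix can be computed at once from squared norms. The function name `rbf_gram` is my own, and the comparison against a pairwise lambda (written in the same form as the RBF kernel above) is only there to check that the two agree.

```python
import numpy as np

def rbf_gram(X: np.ndarray, sigma: float) -> np.ndarray:
    """Gram matrix with entries exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i^T x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)   # guard against tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Compare with a pairwise evaluation on random data.
sigma = 1.5
pairwise_rbf = lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma**2))
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
gram = rbf_gram(X, sigma)
pairwise = np.array([[pairwise_rbf(a, b) for b in X] for a in X])
print(np.allclose(gram, pairwise))
```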

The Classifier class simply provides a wrapper around the classifier function.

class Classifier:
    """
    A class for a classifier generated by the SVM
    """

    def __init__(self, classifier: Callable):
        """
        Constructor for the Classifier class
        :param classifier: the classifier function
        """
        self.classifier = classifier

    def predict(self, x: np.ndarray) -> float:
        """
        Predict the class of a data point
        :param x: the data point
        :return: the predicted class
        """
        return self.classifier(x)

    def test(self, test_x: np.ndarray, test_y: np.ndarray) -> float:
        """
        Test the classifier on a test set
        :param test_x: the test data
        :param test_y: the test labels
        :return: the accuracy of the classifier
        """
        correct = 0
        for i in range(test_x.shape[0]):
            if self.predict(test_x[i]) == test_y[i]:
                correct += 1
        return correct / test_x.shape[0]

Mirrors

2024-03-15T00:00:00+00:00

I had been staring into the mirror
and I didn’t seem quite right.
I supposed I need to move nearer
or maybe just turn on the tubelight.

I realised the mirror is stained
and so I better wipe it clean.
Those spots however seemed ingrained
and so I replaced it with one that was pristine.

But now the mirror appeared warped
as the reflection was mine but not really.
“My nose looks a bit too long,” I harped
and replaced it again, though now wearily.

And just like that, it so happened
that every mirror I brought was flawed.
Mirror after mirror, I was rather maddened
but who knew I was hitting the wrong chord.

The distortions are my own
from the stains to the contortions.
After all, mirrors are just shiny stone
that occasionally teach me a little precaution.