The undergrad program at IISc is very new. The B.Tech. program in particular is as new as it can be - I am one of the students in the first batch of this program, and I am only a rising junior. This is amazing because we (i.e., the students of this first batch) get to carve out our paths and set benchmarks for years to come. But this is also horrifying because we get to carve out our paths and there is nobody to tell us “what”, “how” and “when” to do things - in practice this is a gamble, but that is a story for another time^{1}. Also, we are very few - 46 people in my batch, to be precise. This often makes B.Tech. an echo chamber. We see what is available, similar ideas and perspectives flow through all of our minds, and we think this is all that there is.
This is why meeting students from elsewhere is an opportunity for me. I ask about their coursework to see how things are different - sometimes different places follow very different approaches for delivering the same content, or simply deliver very different content. I ask about their professors and other teaching faculty. I ask about their courseload. But most importantly, I ask what the students there are up to. Do they all want to grind day and night on Leetcode to get placed at some company as soon as possible? Are they easygoing and mostly focused on coursework? Are they innovating with startup ideas? Are they interested in research, and if they are, what are they doing to get research experience? What is the hottest topic in computer science among these students? What co-curricular activities like student clubs or competitions are they involved in? Of course I always have my side of the story to share too. Sometimes people are surprised by the diversity of things in IISc, and sometimes they are surprised by the narrowness of things in IISc. At the same time, I get to understand the challenges faced by students elsewhere, and how they overcome them. I get to see what gives them an edge, and how we can adopt some of those things in our student culture at IISc.
But there is more to student life than all these “nerdy” concerns. Hostel, mess, sports, fests, and the surrounding city are some of these. When I talk to my friends studying in Delhi, we also discuss how many people drink, smoke or take drugs at our schools^{2}. We may discuss how friendly, collaborative or competitive our peers are. Or maybe how beautiful the campus is, and how much of it is currently under construction. I can ask so many questions, but hearing about these things is one thing while actually living in that environment is another. I had the opportunity to do precisely that this summer at UIUC. Of course I could not get an authentic experience because the campus is in its best (or worst?) shape only during the semester. But I do see that living here is quite different from living inside the campus of an Indian university like IISc.
IISc’s campus is like a tiny town made just for its citizens. Students are guaranteed on-campus housing and are provided decent hostels where they have to do practically nothing for maintenance (besides maybe dusting their own room once in a while^{3}) - at a very student-friendly price (i.e., nearly nothing). We are given decent food 4 times a day, and again at a very student-friendly price. The campus design makes it easy to walk and cycle, and there are essential stores and even affordable restaurants right inside. Of course we can go out into the city^{4} for more lavish options - but we can comfortably spend months and, in principle, even years without ever leaving the campus. The campus is walled and very safe. I don’t remember walking around the campus at 3 or 4 in the morning and not feeling safe^{5}. But at UIUC things were a bit different. I feel that students, particularly grad students, are much more on their own. There is no hard boundary for the campus, and it practically just fades into the neighbourhoods of Urbana and Champaign. It still seems safe because of the large number of students in the town, but frankly not as safe as my campus - and especially so at night. Many students live in apartments where they have to deal with actual property “leasing”. A big fraction of students cook for themselves, and many often eat out. There are many stores and restaurants in the area, but they are not usually made for students from the pricing point of view. You can walk to many places, but some places are far enough to make walking unfeasible. There is a good^{6} bus service, but I think it’s not enough for the lifestyle here, because many students buy (and frequently use) cars. All in all, I feel that I have had an easy life back home. Everything is small, nearby and affordable - and it seems that the university has a role in making my lifestyle easier.
Of course there are caveats (again, let’s make that a discussion for another time) but I think it’s great to get a taste of both of these lifestyles. If you ask me “Why is that great?” - I don’t know; that just sounded like a nice sentence to conclude this essay with.
1. But till then, I would like to clarify that I am not complaining - and while we do have some huge responsibility on our shoulders, the administration at IISc has made huge investments to make things work for us.
2. I assure the reader that my answers to these questions are somewhere between “None that I know of” and “Maybe a few, but they stay to themselves”.
3. I doubt many of us do that though.
4. Thankfully IISc is actually surrounded by the city, unlike some other universities.
5. Although I do vividly remember walking around the campus at 3, 4, 5 or even 6 in the morning.
6. Although not as good in the summers.
RecursionError. However, this is a classic problem with a simple fix - we keep track of objects that have already been translated and reuse the translation if an object is encountered again.
For the specific case of translating lists, the original program would look something like this (note that this is not actual code).
def translate(java_object):
    # ...handle other kinds of objects
    # handle lists
    if java_object.is_java_list():
        new_list = []
        for element in java_object:
            new_list.append(translate(element))
        return new_list
The fix would modify the code in the following way.
def translate(java_object, translated_objects=None):
    if translated_objects is None:
        translated_objects = {}
    if id(java_object) in translated_objects:
        return translated_objects[id(java_object)]
    # ...handle other kinds of objects
    # handle lists
    if java_object.is_java_list():
        new_list = []
        translated_objects[id(java_object)] = new_list
        for element in java_object:
            new_list.append(translate(element, translated_objects))
        return new_list
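This memoisation trick is the same one Python's own copy.deepcopy uses (via its memo dictionary) to survive self-referential structures - a quick way to convince yourself the fix is sound:

```python
import copy

# a list that contains itself
a = [1, 2]
a.append(a)

# deepcopy keeps a memo of already-copied objects, so the
# self-reference is preserved instead of recursing forever
b = copy.deepcopy(a)

print(b[:2])      # [1, 2]
print(b[2] is b)  # True: the copy refers to itself, not to `a`
```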
Simple, right? Note that java_object is of foreign type when passed to Python. Since I was in a “debug mode”, my actual function printed java_object before proceeding to the rest of the function body.
def translate(java_object, translated_objects=None):
    print(f"translating {java_object} with id {id(java_object)}")
    # ...rest of the function
Everything was good, until I passed a recursive list to the function - and all I saw was a RecursionError. Polyglot doesn’t provide a very detailed stacktrace for errors arising from Python, and all I saw was that the error came from translate. But print was not executed, like it usually would be while working with other objects. I confirmed that the statement before the invocation of translate was being executed, and inferred that the problem must be in the internals of Polyglot - maybe Polyglot did not know how to pass recursive structures to Python (possibly because it could not “wrap” them up in the foreign type). GraalVM has limited documentation, and a PhD student who shares the office with me suggested that I check GraalPython’s source code for some implementation details. The code was understandably (small team of researchers, new project, etc.) not very well documented, and I could not find anything useful. I told my advisor that this was not my problem and that maybe we should raise a bug report with the GraalVM team (which I had already done informally).
Satisfied, I went back to my next task. But while scrolling through the translate function, I realized to my horror that my code actually looked like this.
def translate(java_object, translated_objects=None):
    if translated_objects is None:
        translated_objects = {}
    if id(java_object) in translated_objects:
        return translated_objects[id(java_object)]
    # ...handle other kinds of objects
    # handle lists
    if java_object.is_java_list():
        new_list = []
        for element in java_object:
            new_list.append(translate(element, translated_objects))
        # NOTICE THIS LINE ---------------------------v
        translated_objects[id(java_object)] = new_list
        # ---------------------------------------------
        return new_list
Yes, this would lead to a RecursionError because my inner objects would never see the translation of the outer list. I knew that even if this was the error, my print statement should still have executed. But I nonetheless fixed the code and ran it again with mixed feelings of satisfaction and embarrassment. However, the RecursionError persisted - and with the same observation. I was partly relieved that nobody would know about my bug, and that the actual error was still in the internals of Polyglot.
I called it a day and, before making a commit, removed the print that I had put in when I started working in the morning. I ran all tests again (because why not?) and to my absolute surprise, everything compiled and passed! It turned out that the RecursionError was due to the print call itself! Java can print recursive lists by replacing self-references with a placeholder ((this Collection)), while Python does the same with [...]. However, the __format__ (and even __str__ and __repr__ - although I don’t think all three of these would even have been implemented) method on foreign does not properly handle recursive structures, and that is where the RecursionError was coming from (evaluation of the f-string invokes the __format__ method).
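For comparison, ordinary Python containers guard against exactly this: the repr machinery detects the cycle and substitutes a placeholder instead of recursing. A quick check in plain CPython (independent of GraalPython):

```python
# a plain Python list that contains itself
a = []
a.append(a)

# CPython detects the self-reference and prints a placeholder
print(repr(a))  # [[...]]

# f-strings take the same path (via __format__/__str__), so no recursion either
print(f"{a}")   # [[...]]
```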
So I did find a bug in Polyglot, but not one that had anything to do with what I was doing (I am not going to print foreign Java lists in Python). The bug in what I was doing was really mine, but I never really hit it (and nobody else would have known about it) - so was it really a bug? Let’s call it a minor typo. And of course, I have fixed my bug report to the GraalVM team.
One such method is __getattribute__, which is called whenever an attribute is accessed on an object. The method has the following signature
def __getattribute__(obj: object, name: str) -> Any:
where obj is the object on which the attribute is being accessed, and name is the string containing the name of the attribute being accessed. Technically, we can pass any object as the first argument when accessing this method statically, but when it is called on an instance of a class, the instance is passed as the first argument automatically. There are more details here - for example, if an attribute is not found on an instance, the __getattr__ method is called.
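A minimal illustration of that fallback order (the class and attribute names here are made up for the example):

```python
class Demo:
    x = 1

    def __getattr__(self, name):
        # invoked only after normal lookup (via __getattribute__) fails
        return f"no attribute {name!r}"

d = Demo()
print(d.x)        # 1: found by normal lookup, __getattr__ never runs
print(d.missing)  # no attribute 'missing': normal lookup failed
```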
It may not be very common to override this method, but it can be useful in some cases. For example, to provide a “proxy” object that forwards attribute accesses to another object, or to provide a “lazy” object that computes the value of an attribute only when it is accessed, or to log accesses to attributes on an object, and so on. One way to do this is to override the __getattribute__ method in the following way:
class SomeClass:
    ...
    def __getattribute__(self, name):
        if some_condition:
            do_something()
        else:
            # default behaviour
            return object.__getattribute__(self, name)
Notice that we do not invoke __getattribute__ on self - that would cause infinite recursion. We could technically use any class (which has not overridden its __getattribute__ method!) in place of object in the above code, since all Python classes inherit from object.
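As a concrete sketch of the logging use case mentioned above (the class and attribute names are invented for illustration):

```python
class Logged:
    def __init__(self):
        self.accesses = []  # hypothetical log of accessed attribute names
        self.value = 42

    def __getattribute__(self, name):
        # go through object (not self) to avoid infinite recursion,
        # and skip logging accesses to the log itself
        if name != "accesses":
            try:
                object.__getattribute__(self, "accesses").append(name)
            except AttributeError:
                pass  # log not created yet (during __init__)
        return object.__getattribute__(self, name)

obj = Logged()
print(obj.value)     # 42
print(obj.accesses)  # every attribute accessed so far was recorded
```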
However, there is a catch (otherwise why this post?).
__getattribute__ is called only when an attribute is accessed from an instance of the class, but not if the attribute is accessed statically from the class itself. This means that we somehow need to invoke some __getattribute__ method when an attribute is accessed statically. A natural solution would be to make the class an instance of another class, and have this other class define the __getattribute__ method. This other class is called a metaclass in Python. This is very different from inheritance or subclassing, and is an entire topic in itself - and of course, metaclasses have many other uses.
We will first define a ‘metaclass’ that has the __getattribute__ method.
class MyMetaClass(type):
    def __getattribute__(self, name):
        if some_condition:
            do_something()
        else:
            # default behaviour (type rather than object, so that
            # attributes inherited from base classes are still found)
            return type.__getattribute__(self, name)
We make the metaclass inherit from type, which is the most common way to define a metaclass. However, this is not necessary, and for some exotic use cases one may have a different construction. Frankly, I will have to read more about metaclasses to understand and explain this better.
Either way, we now make our class an instance of this metaclass.
class SomeClass(metaclass=MyMetaClass):
    ...
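A self-contained demonstration that class-level attribute access now goes through the metaclass (all names here are invented for the sketch):

```python
accessed = []

class TracingMeta(type):
    def __getattribute__(cls, name):
        # runs for attribute access on the class itself, e.g. Traced.answer
        accessed.append(name)
        return type.__getattribute__(cls, name)

class Traced(metaclass=TracingMeta):
    answer = 42

print(Traced.answer)      # 42
print("answer" in accessed)  # True: the metaclass saw the class-level access
```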
This was the first time I was introduced to the concept and the need for metaclasses. What I was exactly trying to do is a theme for another post.
But my oh my, actually doing these things is a whole different thing. For the first time in my experience with research, I have found actual gaps in the literature. And it turns out that while research is hard - sometimes very hard - researchers have the liberty to sometimes take the easy way out. And I think this incremental nature of research is one of the hardest concepts to fully absorb for people who are new to research. Of course we build upon the work of others and leave our work for others to build upon - but what is not clear as a beginner is how much to build upon, how much to actually build, and how much to leave for others.
Research work is usually of a well-defined size. To me, this just seems like an “industry standard” - possibly regulated by conference deadlines. So-called “full papers” (generally in the research track of a conference) are long and detailed, but of course they are not all of the knowledge on the topic. Deciding how much to include in a paper seems to me like an art. Going for the low-hanging fruit may be one strategy, but then your work will not be deep enough and your paper will likely be rejected (or you might submit to a smaller track, which is not necessarily a bad thing - but it’s also not the same thing, and as a graduate student or a professor, it turns out that you cannot live off short papers). But at the same time, you can easily hop on to flashy new ideas, and there is not much to lose if you fail because you only put so much effort into it. On the other hand, supposedly real research means “deep” results - but how deep is deep enough? To me this sometimes looks like a rabbit hole. I can keep digging and digging, and at every step it will seem as if just a little more will complete the picture. When do I exit and call the rest “future work”? I don’t want to stop too early and be called shallow, particularly if I know what the next step is. But I also don’t want to keep digging and digging and never get to the end of the tunnel. Of course there will be an end - but I don’t think it will be aligned with “deadlines”. There might also be challenges that involve me developing deeper engineering skills or other auxiliary knowledge - but how much before I become a jack of all trades, master of none?
I don’t think there are clear answers to these questions (in the spirit of this post, I should say that these questions are just as open-ended) and I suspect that these tread into the “meta-level” of research which only the experienced researchers have a taste of. In fact I have come to realise that a research supervisor is not just a great source of technical knowledge, but also of this meta-level wisdom.
Note that my research is in software engineering, and so my perspective is limited to this field.
This description sounds weird if not funny, but it was accurate to the best of my knowledge at that time - it was a fusion of the things I had seen around me, the pieces of information I had gathered from my mother, and the ways in which unknown lands were portrayed in the stories I had listened to.
I remember my mother telling me tidbits about the country whenever my relatives visited from there. These relatives would often bring me clothes, toys, and chocolates among other items. The toys would be fun and the chocolates would be delicious, but the clothes, albeit labelled with my correct age, would sometimes be a size too big - and I would have to grow into them. My mother would tell me that people in the United States were bigger, and so American kids my age would actually wear bigger clothes. These were often winter clothes, and I had heard from my mother that it snowed in the United States.
In some sense, I had created my own America. This America never existed and (at least in the foreseeable future) never will. But then what really is America? No description is complete or truly accurate - so are they simply different Americas? In all fairness, every person has a different perspective that may or may not be adequately expressible in words. It seems that reality is just an agreement between all of our different perspectives - it is what most of us think it is. This may be easy to imagine for concrete objects like places and people, but for many abstract concepts we call these perspectives opinions.
This formulation is based on On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines by Koby Crammer and Yoram Singer (2002) in the Journal of Machine Learning Research. The first section of this post might seem a bit mathematical, but I have tried to keep it as simple and explanatory as possible. The implementation is in the second section.
The formulation of the multiclass SVM problem is very similar to the binary-class SVM, except that we now have multiple hyperplanes given by the vectors \(w_1, w_2, \ldots, w_K\) - one for each class. The classifier, \(f\) is then given by the hyperplane that maximizes the positive margin for a given input \(x\).
\[f(x) = \arg \max_{k \in [K]} w_k^T x\]
Suppose that our training data is given by \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n\). Then the multiclass SVM problem can be formulated as follows.
\[\min_{w_1, \ldots, w_K, \xi} \frac{1}{2} \sum_{k=1}^K ||w_k||^2 + C \sum_{i=1}^n \xi_i\]
Here \(\xi_i\) are slack variables, and \(C\) is a hyperparameter. Let \(\delta\) denote the Kronecker delta function. Then the constraints \(w_{y_i}^T x_i - w_k^T x_i \ge 1 - \xi_i\) for all \(i \in [n]\) and \(k \in [K] \setminus \{y_i\}\), and \(\xi_i \ge 0\) for all \(i \in [n]\), can be written succinctly as
\[w_{y_i}^T x_i - w_k^T x_i + \delta_{y_i,k} \ge 1 - \xi_i \quad \forall i \in [n], k \in [K]\]
We first write the Lagrangian for the primal problem by introducing the dual variables $\alpha = (\alpha_1, \ldots, \alpha_n)$. Let $w$ denote $(w_1, \ldots, w_K)$.
\[\begin{align*} \mathcal{L}(w, \xi, \alpha) &= \frac{1}{2} \sum_{k=1}^K \|w_k\|^2 + C \sum_{i=1}^n \xi_i + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (1 - \xi_i + (w_k - w_{y_i})^T x_i - \delta_{y_i,k}) \end{align*}\]
Here, $\alpha_{ij} \ge 0$ for all $i \in [n]$ and $j \in [K]$.
Taking the partial derivative with respect to $\xi_i$ and setting it to zero, we get that for all $i \in [n]$,
\[\begin{align*} \frac{\partial \mathcal{L}}{\partial \xi_i} &= C - \sum_{k=1}^K \alpha_{ik} = 0 \\ C &= \sum_{k=1}^K \alpha_{ik} \end{align*}\]
Now, taking the gradient with respect to $w_j$ and setting it to zero, we get that for any $j \in [K]$,
\[\begin{align*} \nabla_{w_j} \mathcal{L} &= w_j - \sum_{y_i = j} \sum_{\substack{k=1 \\ k\neq j}}^K \alpha_{ik} x_i + \sum_{y_i \neq j} \alpha_{ij} x_i = 0 \\ 0 &= w_j - \sum_{y_i = j} \sum_{k=1}^K \alpha_{ik} x_i + \sum_{i = 1}^n \alpha_{ij} x_i \end{align*}\]
\[\begin{align*} w_j &= \sum_{y_i = j} \sum_{k=1}^K \alpha_{ik} x_i - \sum_{i=1}^n \alpha_{ij} x_i = \sum_{y_i = j} x_i \left( \sum_{k=1}^K \alpha_{ik}\right) - \sum_{i=1}^n \alpha_{ij} x_i \\ &= C \sum_{y_i = j} x_i - \sum_{i=1}^n \alpha_{ij} x_i = C \sum_{i=1}^n \delta_{y_i,j} x_i - \sum_{i=1}^n \alpha_{ij} x_i \\ w_j &= \sum_{i=1}^n (C \delta_{y_i,j} - \alpha_{ij}) x_i \end{align*}\]
Substituting these values back into the Lagrangian, we get
\[\begin{align*} \mathcal{L} &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + \sum_{i=1}^n \xi_i\left(C - \sum_{k=1}^K \alpha_{ik}\right) \\ &+ \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (w_k - w_{y_i})^T x_i - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \\ &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC + \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} (w_k - w_{y_i})^T x_i \\ &- \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \end{align*}\]
Note that
\[\begin{align*} \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} w_k^T x_i &= \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \left( \sum_{j=1}^n (C \delta_{y_j,k} - \alpha_{jk}) x_j \right)^T x_i \\ &= \sum_{i, j = 1}^n x_j^T x_i \left( \sum_{k=1}^K \alpha_{ik} (C \delta_{y_j,k} - \alpha_{jk}) \right) \\ \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} w_{y_i}^T x_i &= \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \left( \sum_{j=1}^n (C \delta_{y_j,y_i} - \alpha_{jy_i}) x_j \right)^T x_i \\ &= \sum_{i, j = 1}^n x_j^T x_i \left( \sum_{k=1}^K \alpha_{ik} (C \delta_{y_j,y_i} - \alpha_{jy_i}) \right) \\ &= \sum_{i, j=1}^n x_j^T x_i \left( C (C \delta_{y_j,y_i} - \alpha_{jy_i}) \right) \\ &= \sum_{i, j=1}^n x_j^T x_i \left( \sum_{k=1}^K C\delta_{y_i,k}(C\delta_{y_j,k} - \alpha_{jk}) \right) \end{align*}\]
and therefore,
\[\begin{align*} \mathcal{L} &= \frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC \\ &- \sum_{i, j=1}^n x_i^T x_j \left( \sum_{k=1}^K (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) \right) - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \\ &= -\frac{1}{2} \sum_{k=1}^K \sum_{i, j = 1}^n (C \delta_{y_i,k} - \alpha_{ik}) (C \delta_{y_j,k} - \alpha_{jk}) x_i^T x_j + nC - \sum_{i=1}^n \sum_{k=1}^K \alpha_{ik} \delta_{y_i,k} \end{align*}\]
Let $e_i = (\delta_{i,1}, \ldots, \delta_{i,K})$ for any $i \in [K]$, and let $e = (1, 1, \ldots, 1) \in \mathbb{R}^K$. Then the above equation can be written as
\[\begin{align*} \mathcal{L} &= -\frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j + nC - \sum_{i=1}^n \alpha_i^T e_{y_i} \end{align*}\]
Therefore, the dual problem is
\[\begin{align*} \max_{\alpha} &\quad -\frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j - \sum_{i=1}^n \alpha_i^T e_{y_i} + nC\\ \text{or} \quad \min_{\alpha} &\quad \frac{1}{2} \sum_{i, j = 1}^n (Ce_{y_i} - \alpha_i)^T (Ce_{y_j} - \alpha_j) x_i^T x_j + \sum_{i=1}^n \alpha_i^T e_{y_i} - nC \\ \text{such that} &\quad \alpha_{ij} \ge 0 \quad \forall i \in [n], j \in [K], \quad \alpha_i^T e = C \quad \forall i \in [n] \end{align*}\]
Let $\lambda_i = Ce_{y_i} - \alpha_i$ for all $i \in [n]$. Note that
\[\begin{align*} \sum_{i=1}^n \alpha_i^T e_{y_i} &= \sum_{i=1}^n (Ce_{y_i} - \lambda_i)^T e_{y_i} = \sum_{i=1}^n Ce_{y_i}^T e_{y_i} - \sum_{i=1}^n \lambda_i^T e_{y_i} \\ &= nC - \sum_{i=1}^n \lambda_i^T e_{y_i} \end{align*}\]
Since $\alpha_i \ge 0$ for all $i \in [n]$, we have that $\lambda_i \le Ce_{y_i}$. Similarly, since $\alpha_i^T e = C$ for all $i \in [n]$, we have that $\lambda_i^T e = 0$. Substituting the values of $\lambda_i$, the dual problem can be written as
\[\begin{align*} \min_{\lambda} &\quad \frac{1}{2} \sum_{i, j = 1}^n (\lambda_i^T \lambda_j) (x_i^T x_j) - \sum_{i=1}^n \lambda_i^T e_{y_i} \\ \text{such that} &\quad \lambda_i \le Ce_{y_i}, \quad \lambda_i^T e = 0 \quad \forall i \in [n] \end{align*}\]
Note that on solving the dual problem, we get the values of $w_j$ as
\[\begin{align*} w_j &= \sum_{i=1}^n \lambda_{ij} x_i \end{align*}\]
for all $j \in [K]$.
This final expression is much simpler than not only the original primal problem, but also most formulations of the binary-class SVM.
CVXOPT provides a simple interface for quadratic programming. We will convert the dual problem into the standard form required by CVXOPT and feed it to the solver. We want a problem of the form
\[\begin{align*} \min_{x} &\quad \frac{1}{2} x^T P x + q^T x \\ \text{such that} &\quad Gx \preccurlyeq h, \quad Ax = b \end{align*}\]
I am providing the $P, q, G, h, A, b$ matrices below and will leave the proof of correctness as an exercise for the reader (it is a simple exercise in linear algebra). Note that I have replaced the inner product $x_i^T x_j$ with the kernel $k(x_i, x_j)$.
\[\begin{align*} P &= \begin{bmatrix} k(x_1, x_1) I_{K} & \cdots & k(x_1, x_n) I_{K} \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) I_{K} & \cdots & k(x_n, x_n) I_{K} \end{bmatrix} \\ q &= \begin{bmatrix} -e_{y_1} \\ \vdots \\ -e_{y_n} \end{bmatrix} \\ G &= I_{nK} \\ h &= C \begin{bmatrix} e_{y_1} \\ \vdots \\ e_{y_n} \end{bmatrix} \\ A &= \begin{bmatrix} 1_{1 \times K} & 0_{1 \times K} & \cdots & 0_{1 \times K} \\ 0_{1 \times K} & 1_{1 \times K} & \cdots & 0_{1 \times K} \\ \vdots & \vdots & \ddots & \vdots \\ 0_{1 \times K} & 0_{1 \times K} & \cdots & 1_{1 \times K} \end{bmatrix} \\ b &= 0_{n \times 1} \end{align*}\]
The implementation of our Multiclass SVM is as follows.
from typing import Callable

import numpy as np
from cvxopt import matrix, solvers

class MultiClassSVM:
    """
    A Support Vector Machine for Multi-Class Classification
    """

    def __init__(self, labels: np.ndarray, train_x: np.ndarray, train_y: np.ndarray):
        """
        Constructor for the MultiClassSVM class that initializes the training data and labels
        :param labels: the labels of the classes
        :param train_x: the training data
        :param train_y: the training labels
        """
        self.labels = labels
        self.train_x = train_x
        self.train_y = train_y

    def train(
        self,
        kernel: Callable = Kernels.linear(),
        slack_weight: float = 0,
    ) -> "Classifier":
        """
        Train the Multi-Class SVM and return the resulting classifier
        :param kernel: the kernel function to use if provided
        :param slack_weight: the weight of the slack variables (0 for hard margin)
        :return: the trained classifier
        """
        n = self.train_x.shape[0]
        k = len(self.labels)
        e = np.zeros(n * k)
        for i in range(n):
            e[i * k + list(self.labels).index(self.train_y[i])] = 1
        P = np.zeros((n * k, n * k))
        for i in range(n):
            for j in range(n):
                P[i * k : (i + 1) * k, j * k : (j + 1) * k] = (
                    kernel(self.train_x[i], self.train_x[j]) + 1
                ) * np.eye(k)
        q = -e
        G = np.eye(n * k)
        h = slack_weight * e
        A = np.zeros((n, n * k))
        for i in range(n):
            A[i, i * k : (i + 1) * k] = 1
        b = np.zeros(n)
        sol = solvers.qp(
            matrix(P),
            matrix(q),
            matrix(G),
            matrix(h),
            matrix(A, (n, n * k), "d"),
            matrix(b),
        )
        # construct the weight vectors
        W = [None for _ in range(k)]
        for j in range(k):
            W[j] = lambda x, j=j: sum(
                sol["x"][i * k + j] * (kernel(x, self.train_x[i]) + 1)
                for i in range(n)
            )

        # construct the classifier
        def classifier(x):
            res = [W[j](x) for j in range(k)]
            return self.labels[res.index(max(res))]

        return Classifier(classifier)
The kernel functions can be easily swapped in and out by passing them as arguments to the train function. An example implementation of the Kernels class is as follows.
class Kernels:
    """
    A class for different kernel functions
    """

    @staticmethod
    def rbf(sigma: float) -> Callable:
        """
        Radial Basis Function (RBF) kernel
        :param sigma: the width of the kernel
        :return: the kernel function
        """
        return lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma**2))

    @staticmethod
    def linear() -> Callable:
        """
        Linear kernel
        :return: the kernel function
        """
        return lambda x, y: x @ y
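A quick sanity check on the two kernels, with the lambdas re-created inline so the snippet runs on its own (only NumPy is assumed):

```python
import numpy as np

# same kernels as above, reproduced so this snippet is standalone
rbf = lambda sigma: (lambda x, y: np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma**2)))
linear = lambda x, y: x @ y

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(linear(x, y))    # 0.0: orthogonal vectors have zero linear similarity
print(rbf(1.0)(x, x))  # 1.0: any point is maximally similar to itself
print(rbf(1.0)(x, y))  # exp(-1), since ||x - y||^2 = 2 and 2*sigma^2 = 2
```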
The Classifier class simply provides a wrapper around the classifier function.
class Classifier:
    """
    A class for a classifier generated by the SVM
    """

    def __init__(self, classifier: Callable):
        """
        Constructor for the Classifier class
        :param classifier: the classifier function
        """
        self.classifier = classifier

    def predict(self, x: np.ndarray) -> float:
        """
        Predict the class of a data point
        :param x: the data point
        :return: the predicted class
        """
        return self.classifier(x)

    def test(self, test_x: np.ndarray, test_y: np.ndarray) -> float:
        """
        Test the classifier on a test set
        :param test_x: the test data
        :param test_y: the test labels
        :return: the accuracy of the classifier
        """
        correct = 0
        for i in range(test_x.shape[0]):
            if self.predict(test_x[i]) == test_y[i]:
                correct += 1
        return correct / test_x.shape[0]
I realised the mirror is stained
and so I better wipe it clean.
Those spots however seemed ingrained
and so I replaced it with one that was pristine.
But now the mirror appeared warped
as the reflection was mine but not really.
“My nose looks a bit too long,” I harped
and replaced it again, though now wearily.
And just like that, it so happened
that every mirror I brought was flawed.
Mirror after mirror, I was rather maddened
but who knew I was hitting the wrong chord.
The distortions are my own
from the stains to the contortions.
After all, mirrors are just shiny stone
that occasionally teach me a little precaution.