<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://mrigank.in/musings/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mrigank.in/musings/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-01-05T10:33:09+00:00</updated><id>https://mrigank.in/musings/feed.xml</id><title type="html">Musings by Mrigank Pawagi</title><subtitle>Mrigank Pawagi&apos;s blog
</subtitle><author><name>Mrigank Pawagi</name></author><entry><title type="html">Nethi Nethi—On 2025</title><link href="https://mrigank.in/musings/2026/01/01/on-2025.html" rel="alternate" type="text/html" title="Nethi Nethi—On 2025" /><published>2026-01-01T00:00:00+00:00</published><updated>2026-01-01T00:00:00+00:00</updated><id>https://mrigank.in/musings/2026/01/01/on-2025</id><content type="html" xml:base="https://mrigank.in/musings/2026/01/01/on-2025.html"><![CDATA[<p>A few weeks ago, I came across the phrase “nethi nethi” and was amazed by how succinctly it captured what I was planning to say about the year 2025. <em>Nethi</em> is a Sanskrit word for “not this,” (coming from the words <em>na</em> meaning “not” and <em>ithi</em> meaning “this”). <em>Nethi nethi</em> is a simple phrase, but it has been used in Vedic philosophy to denote the process of uncovering the meaning of a concept, like <em>self</em> or <em>existence</em>, by understanding what it is not. But what has this got to do with this blog post?</p>

<p>When I was in 10th grade, I thought that happiness was clearing board exams with good scores. In 11th and 12th grades, I thought that happiness was getting into a good college. In college, I often thought that happiness was getting a good grade or landing a good internship. And in seeking happiness in the future, I frequently forgot about the happiness I could find in the present: spending time with friends and family, pursuing creative hobbies, traveling and exploring new places, or taking care of my body through exercise and sports.</p>

<p>2025 was a year of reflection, as I began asking myself what happiness means to me. My observation is that the pursuit of happiness has so far been an endless tunnel. Not only does it push me to sacrifice the joy of the present, it also slowly drains the fun out of my work. Moreover, I realized that this pursuit will possibly never end. Today I think that happiness is getting admitted to a good doctoral program. Tomorrow, it will be publishing some number of papers in top conferences. The day after, it will be landing a faculty position at a reputed university. Interestingly, it does not stop there either. Then I might think that happiness is getting tenure, or winning a prestigious award, or receiving a lot of citations on my work. <em>Who knows?</em> I attended a talk at a <em>New Faculty Symposium</em><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> and was almost shocked to learn that even professors<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> face stress, insecurity, and anxiety in their careers.</p>

<p>And this is what <em>nethi nethi</em> beautifully captures. <em>Not this, not that.</em> I don’t know what happiness is. But it seems that happiness is not in its pursuit, it is not in punishing myself, and it is not in the future. It is somewhere in the present, around me—maybe in my work, or in my relationships, or in my hobbies. In 2025, I redefined my understanding of happiness. My new manifesto is simple. I will enjoy my work and will not work if I am not enjoying it. I will read and learn different topics as I did when I was a kid. I will work toward some regular physical activity. I will not feel guilty about spending time with friends and family. I will explore new places. And I will learn new hobbies, like playing a musical instrument. I am not quite there yet, but I am on my way. I have started running whenever I can<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup> and have been learning to play the ukulele<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>.</p>

<p>There is another specific thing that I felt is definitely not happiness: hyperconnectivity through social media. While I had already <a href="https://mrigank.in/musings/2025/04/20/whatsapp.html">stopped using WhatsApp</a> towards the end of 2024, it was over several months in 2025 that I started to feel the benefits of <em>disconnecting</em>. Just as I had with WhatsApp, I began evaluating the role of each social media platform that I used: Twitter, Instagram, LinkedIn, and Facebook. I realized that the noise-to-signal ratio on these platforms was extremely high, most of the content was rage-bait or slop served to me just to fry my brain, there was unnecessary pressure to brag about accomplishments, and the fear of falling out of touch was overblown. Consequently, I deleted these accounts one by one over the year, and I really enjoy the peace of mind that comes with it. It will be a while before I can completely understand the pros and cons of leaving social media, but I am optimistic that I will look back at 2025 and be glad to have made this decision.</p>

<p>I still do not know what happiness is. Maybe I will never know! But perhaps that is the point. <em>Not this, not that.</em> <em>Nethi nethi.</em> I will not mistake happiness for something it is not.</p>

<p>Of course, all the ideas I am throwing around in this post would have been impossible without the great people around me. As in all the years before, I have been lucky to find mentors and collaborators who supported me in my career goals and pushed me to think more clearly and honestly. I have also been fortunate to have friends and family who are always ready to deal with my quirks and eccentricities<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>, even when I am still figuring things out.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>I was attending ASE 2025 in November, and there was one particular slot where I did not have anything better to attend and walked into the “New Faculty Symposium” session. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>I think it is hard to imagine this as an undergraduate student, but professors are indeed <em>humans</em> like everyone else! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Thankfully I stay in Bengaluru, where the weather is pleasant and the air quality is decent most of the year. I also have access to the facilities of my institute. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>Ever since I started listening to Twenty One Pilots, I have found the sound of the ukulele mesmerizing. It was my immediate choice when I decided to learn a musical instrument. I am learning on my own using online resources, and I have been enjoying it so far. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>Special shoutout to my parents and close friends who have adapted to stay in touch with me when I have been going off the grid. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Essay" /><summary type="html"><![CDATA[A few weeks ago, I came across the phrase “nethi nethi” and was amazed by how succinctly it captured what I was planning to say about the year 2025. Nethi is a Sanskrit word for “not this,” (coming from the words na meaning “not” and ithi meaning “this”). Nethi nethi is a simple phrase, but it has been used in Vedic philosophy to denote the process of uncovering the meaning of a concept, like self or existence, by understanding what it is not. But what has this got to do with this blog post? When I was in 10th grade, I thought that happiness was clearing board exams with good scores. In 11th and 12th grades, I thought that happiness was getting into a good college. In college, I often thought that happiness was getting a good grade or landing a good internship. And in finding happiness from the future, I frequently forgot about the happiness I could find in the present—spending time with friends and family, pursuing creative hobbies, traveling and exploring new places, or taking care of my body through exercise and sports. 2025 was a year of reflection, as I began asking myself what happiness means to me. My observation is that the pursuit of happiness has so far been an endless tunnel. Not only does it inspire me to sacrifice the joy of the present, it also slowly drains out all the fun from my work. Moreover, I realized that this pursuit will possibly never end. Today I think that happiness is getting admitted to a good doctoral program. Tomorrow, it will be publishing some number of papers in top conferences. The day after, it will be landing a faculty position at a reputed university. Interestingly, it does not stop there either. Then I might think that happiness is getting tenure, or winning a prestigious award, or receiving a lot of citations on my work. Who knows? 
I attended a talk in a New Faculty Symposium1 and was almost shocked to learn that even professors2 face stress, insecurity, and anxiety in their careers. And this is what nethi nethi beautifully captures. Not this, not that. I don’t know what happiness is. But it seems that happiness is not in its pursuit, it is not in punishing myself, and it is not in the future. It is somewhere in the present, around me—maybe in my work, or in my relationships, or in my hobbies. In 2025, I redefined my understanding of happiness. My new manifesto is simple. I will enjoy my work and will not work if I am not enjoying it. I will read and learn different topics as I did when I was a kid. I will work toward some regular physical activity. I will not feel guilty about spending time with friends and family. I will explore new places. And I will learn new hobbies, like playing a musical instrument. I am not quite there yet, but I am on my way. I have started running whenever I can3 and have been learning to play the ukulele4. There is another specific thing that I felt is definitely not happiness—hyperconnectivity through social media. While I had already stopped using WhatsApp towards the end of 2024, it was over several months in 2025 that I started to feel the benefits of disconnecting. Just like WhatsApp, I began evaluating the role of each social media platform that I used—Twitter, Instagram, LinkedIn, and Facebook. I realized that the noise-to-signal ratio on these platforms was extremely high, most of the content was rage-bait or slop served to me just to fry my brain, there was unnecessary pressure to brag about accomplishments, and the fear of falling out of touch was overblown. Consequently, I deleted these accounts one by one over the year, and I really enjoy the peace of mind that comes with it. 
It will be a while before I can completely understand the pros and cons of leaving social media, but I am optimistic that I will look back at 2025 and be glad to have made this decision. I still do not know what happiness is. Maybe I will never know! But perhaps that is the point. Not this, not that. Nethi nethi. I will not mistake happiness for something it is not. Of course, all the ideas I am throwing around in this post would have been impossible without the great people around me. Like in all the years before, I have been lucky to find mentors and collaborators who supported me in my career goals and pushed me to think more clearly and honestly. I have also been fortunate to have friends and family who are always ready to deal with my quirks and eccentricities5, even when I am still figuring things out. I was attending ASE 2025 in November, and there was one particular slot where I did not have anything better to attend and walked into the “New Faculty Symposium” session. &#8617; I think it is hard to imagine this as an undergraduate student, but professors indeed are humans like everyone else! &#8617; Thankfully I stay in Bengaluru, where the weather is pleasant and the air quality is decent most of the year. I also have access to the facilities of my institute. &#8617; Ever since I started listening to Twenty One Pilots, I have found the sound of the ukulele very mesmerizing. It was my immediate choice when I decided to learn a musical instrument. I am learning on my own using online resources, and I have been enjoying it so far. &#8617; Special shoutout to my parents and close friends who have adapted to stay in touch with me when I have been going off the grid. &#8617;]]></summary></entry><entry><title type="html">Our ambiguity detection work is appearing in ASE 2025! 
But why are we detecting ambiguities?</title><link href="https://mrigank.in/musings/2025/10/05/rfcscope.html" rel="alternate" type="text/html" title="Our ambiguity detection work is appearing in ASE 2025! But why are we detecting ambiguities?" /><published>2025-10-05T00:00:00+00:00</published><updated>2025-10-05T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/10/05/rfcscope</id><content type="html" xml:base="https://mrigank.in/musings/2025/10/05/rfcscope.html"><![CDATA[<p>My work on detecting ambiguities in Internet Protocol specifications, <em>RFCScope: Detecting Logical Ambiguities in Internet Protocol Specifications</em>, has been <a href="https://conf.researchr.org/details/ase-2025/ase-2025-papers/166/RFCScope-Detecting-Logical-Ambiguities-in-Internet-Protocol-Specifications">accepted to appear</a> at the <a href="https://conf.researchr.org/home/ase-2025">ASE 2025</a> conference in the <em>Research Papers</em> track. This work is from my 6th semester<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> when I was working with Prof. <a href="https://wenxiwang.github.io/">Wenxi Wang</a> from the University of Virginia, along with Lize Shao, who is now a PhD student with Prof. Wenxi; Prof. <a href="https://www.cs.virginia.edu/~ys3kz">Yixin Sun</a> from the University of Virginia; and <a href="https://hyeonmin-lee.github.io">Hyeonmin Lee</a>, who is a postdoctoral researcher working with Prof. Yixin.</p>

<p>Our work presents the <em>first</em> systematic study of technical errata reported in Internet Protocol specifications. These specifications are published by the Internet Engineering Task Force (IETF) as RFCs (Requests for Comments), which are natural language documents written by experts in the field. RFCs are widely used as authoritative references for implementing Internet protocols, and ambiguities in these documents can lead to incorrect software implementations. Each RFC goes through a long drafting and review process before it is published, where many experts scrutinize the document over several iterations, and yet the final published RFCs still contain ambiguities. While many of these may not be a problem for domain experts, they can be a significant hurdle for people<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> writing software based on these documents. RFCs, once published, are immutable, but the IETF provides a mechanism to report errata to keep track of any issues found in the documents. We analyzed 273 verified technical errata reported in RFCs published over the last 11 years and classified them into 7 categories spanning inconsistencies and underspecifications.</p>

<p>Drawing from our analysis, we developed an LLM-based system, <em>RFCScope</em>, to automatically detect such ambiguities in RFCs. Our system gathers cross-document context and slices the RFCs into smaller chunks to make them easier for LLMs to process, before querying the LLM with prompts based on our taxonomy to identify ambiguities. Our system also self-evaluates its findings, removing a significant number of false positives while rarely missing true issues. We evaluated our system on the 20 latest RFCs related to the Domain Name System (DNS) and found 31 previously unreported ambiguities in 14 of them. We were able to confirm 8 of these issues with the authors of the respective RFCs, and 3 of these have been accepted as official errata<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>.</p>

<div style="text-align: center;"><!--
    --><img src="/musings/assets/images/2025-10-05-rfcscope/paper.jpg" alt="RFCScope Paper" />
    Our data and tool are available on <a href="https://github.com/HIPREL-Group/RFCScope">GitHub</a>.<!--
--></div>

<p>It was a pleasure working with my collaborators on this project. Lize helped a lot with the manual analysis of errata for both our taxonomy and evaluation, and also supported the literature review for our paper. Prof. Yixin and Hyeonmin were very helpful in providing relevant insights from the networking domain, and Prof. Yixin also made it possible for us to get in touch with the authors of several RFCs to validate our findings. Hyeonmin also created beautiful illustrations for our paper, like the one in the image above. And of course, Prof. Wenxi was a wonderful mentor throughout the project, providing me with a lot of autonomy while also always being available for discussions and feedback. I am very happy to be a part of the first paper from her <a href="https://wenxiwang.github.io/group.html">group</a> and look forward to more collaborations in the future.</p>

<div style="width: fit-content;margin: auto;"><!--
--><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Thrilled to share that our RFCScope paper, the very first paper authored by my intern and PhD student under my supervision, has been accepted to ASE 2025! I’m so proud of their hard work and excited to see it presented at ASE. <a href="https://t.co/8bQ5WfQqm1">pic.twitter.com/8bQ5WfQqm1</a></p>&mdash; Wenxi Wang (@WenxiWang4) <a href="https://twitter.com/WenxiWang4/status/1971607143838269829">September 26, 2025</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><!--
--></div>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>I only had a few courses to take that semester and nothing to do for my upcoming summer internship at Microsoft Research, so I was very happy when Prof. Wenxi offered me this opportunity. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Or lately, large language models (LLMs) that are being increasingly used to automatically generate code based on natural language specifications. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>Errata <a href="https://www.rfc-editor.org/errata/eid8431">8431</a>, <a href="https://www.rfc-editor.org/errata/eid8426">8426</a>, and <a href="https://www.rfc-editor.org/errata/eid8590">8590</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="News" /><category term="Research" /><summary type="html"><![CDATA[My work on detecting ambiguities in Internet Protocol specifications, RFCScope: Detecting Logical Ambiguities in Internet Protocol Specifications, has been accepted to appear at the ASE 2025 conference in the Research Papers track. This work is from my 6th semester1 when I was working with Prof. Wenxi Wang from the University of Virginia, along with Lize Shao, who is now a PhD student with Prof. Wenxi; Prof. Yixin Sun from the University of Virginia; and Hyeonmin Lee, who is a postdoctoral researcher working with Prof. Yixin. Our work presents the first systematic study of technical errata reported in Internet Protocol specifications. These specifications are published by the Internet Engineering Task Force (IETF) as RFCs (Request for Comments), which are natural language documents written by experts in the field. RFCs are widely used as authoritative references for implementing Internet protocols, and ambiguities in these documents can lead to incorrect software implementations. Each RFC goes through a long drafting and review process before it is published, where many experts scrutinize the document over several iterations — and yet it so happens that the final published RFCs contain ambiguities. While many of these may not be a problem for domain experts, they can be a significant hurdle for people2 writing software based on these documents. RFCs, once published, are immutable, but the IETF provides a mechanism to report errata to keep track of any issues found in the documents. We analyzed 273 verified technical errata reported in RFCs published over the last 11 years and classified them into 7 categories spanning inconsistencies and underspecifications. Drawing from our analysis, we developed an LLM-based system, RFCScope, to automatically detect such ambiguities in RFCs. 
Our system gathers cross-document context and slices the RFCs into smaller chunks to make them easier for LLMs to process, before prompting the LLM with prompts based on our taxonomy to identify ambiguities. Our system also self-evaluates its findings, removing a significant number of false positives while rarely missing out on true issues. We evaluated our system on the 20 latest RFCs related to Domain Name System (DNS) and found 31 previously unreported ambiguities in 14 of them. We were able to confirm 8 of these issues with the authors of the respective RFCs, and 3 of these have been accepted as official errata3.     Our data and tool are available on GitHub. It was a pleasure working with my collaborators on this project. Lize helped a lot with the manual analysis of errata for both our taxonomy and evaluation, and also supported the literature review for our paper. Prof. Yixin and Hyeonmin were very helpful in providing relevant insights from the networking domain, and Prof. Yixin also made it possible for us to get in touch with the authors of several RFCs to validate our findings. Hyeonmin also created beautiful illustrations for our paper, like the one in the image above. And of course, Prof. Wenxi was a wonderful mentor throughout the project, providing me with a lot of autonomy while also always being available for discussions and feedback. I am very happy to be a part of the first paper from her group and look forward to more collaborations in the future. Thrilled to share that our RFCScope paper, the very first paper authored by my intern and PhD student under my supervision, has been accepted to ASE 2025! I’m so proud of their hard work and excited to see it presented at ASE. pic.twitter.com/8bQ5WfQqm1&mdash; Wenxi Wang (@WenxiWang4) September 26, 2025 I only had a few courses to take that semester and nothing to do for my upcoming summer internship at Microsoft Research, so I was very happy when Prof. Wenxi offered me this opportunity. 
&#8617; Or lately, large language models (LLMs) that are being increasingly used to automatically generate code based on natural language specifications. &#8617; Errata 8431, 8426, and 8590. &#8617;]]></summary></entry><entry><title type="html">Real as in big, and big as in a million lines of code</title><link href="https://mrigank.in/musings/2025/09/13/real-as-in-big-as-in-million-lines-of-code.html" rel="alternate" type="text/html" title="Real as in big, and big as in a million lines of code" /><published>2025-09-13T00:00:00+00:00</published><updated>2025-09-13T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/09/13/real-as-in-big-as-in-million-lines-of-code</id><content type="html" xml:base="https://mrigank.in/musings/2025/09/13/real-as-in-big-as-in-million-lines-of-code.html"><![CDATA[<p>Some problems in software engineering do not naturally manifest until the software in question is sufficiently big, which is the case for a lot of <em>real</em> software. By <em>big</em>, I don’t mean just a few thousand lines of code, but rather hundreds of thousands or even millions of lines. This is software that supports large parts of modern life<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, and it is often critical to maintain and evolve it over many years, most likely with many contributors who will come and go. Usability of tools and techniques takes on a distinct meaning in such settings, where pragmatism is preferred over <em>perfection</em>. In this post, I will sketch some of the unique challenges that arise while solving one particular software engineering problem at scale: finding and fixing security vulnerabilities.</p>

<p><strong>Security vulnerabilities</strong> are essentially bugs, with the distinguishing property that they can lead to unexpected behavior that compromises the integrity of the system. This integrity may be defined in terms of confidentiality of user data, availability of service to clients, or prevention of unauthorized transactions. Since these are not usually bugs in the <em>functionality</em> of the software, they are not always preventable simply by adhering to specifications and cannot be easily detected by standard testing due to the atypical and sophisticated ways in which they lead to exploits. With big software, this problem is exacerbated for several reasons. First of all, a large codebase is most likely written by many people over a long period of time, because of which no single person can have a complete mental model of the entire codebase. Secondly, the complexity of the data and control flows in a large codebase, including its interactions with external libraries and systems, makes it very difficult to analyze the code manually. Together, these make manual code reviews or ad hoc analysis by individual developers impractical. Moreover, it is not easy to patch parts of the code arbitrarily without a deep understanding of the behavior of the system, because a seemingly innocuous change can have unintended consequences elsewhere in the code, which makes development teams reluctant to accept such changes.</p>

<p><strong>Static analysis</strong> has long been a common choice for finding vulnerabilities in codebases. Static analysis literally means “analysis without execution,” and it involves looking at just the source code to find patterns that may indicate bugs. These can be simple analyses like type checking or linting, which are often provided as part of many compilers and development environments. But these can also be more complicated analyses related to the data and control flows in the program, like tracking the flow of untrusted data to sensitive operations. The advantage of these tools is that they generally do not require any special setup and can be run on any codebase without any specific configuration. In the context of security vulnerabilities, static analysis tools can take advantage of the fact that software is written by humans, and humans are quite predictable in how they write code, since they usually pick it up from people around them, like their peers and professors at university, other developers in their team, open-source projects, and online forums. This leads to common patterns in code, including common weaknesses<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> that have been known to lead to vulnerabilities, which can be searched for using static analysis. Modern static analysis tools like CodeQL provide expressive query languages to search for such patterns and can process large codebases efficiently.</p>
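
<p>To make “tracking the flow of untrusted data to sensitive operations” concrete, here is a minimal sketch in Python using the standard <code>ast</code> module. It is only a toy under stated assumptions, not how CodeQL or any production analyzer works: the names <code>SOURCES</code>, <code>SINKS</code>, <code>call_name</code>, <code>is_tainted</code>, and <code>find_tainted_sinks</code> are hypothetical, and the sketch only follows straight-line, module-level assignments.</p>

<pre>
```python
import ast

# Assumptions for this sketch (hypothetical choices, not a real tool's model):
# calls that return untrusted data, and calls that must not receive it.
SOURCES = {"input"}
SINKS = {"eval", "exec"}


def call_name(node):
    """Return the function name for simple calls like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(arg, tainted):
    """An argument is tainted if it is a source call or a tainted variable."""
    if call_name(arg) in SOURCES:
        return True
    return isinstance(arg, ast.Name) and arg.id in tainted


def find_tainted_sinks(source_code):
    """Return (line, sink) pairs where a sink call receives untrusted data.

    The code is parsed, never executed. Only module-level statements are
    followed, in order; branches, loops, and functions are out of scope.
    """
    tainted = set()
    findings = []
    for stmt in ast.parse(source_code).body:
        # Look for sink calls anywhere inside the current statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS:
                if any(is_tainted(arg, tainted) for arg in node.args):
                    findings.append((node.lineno, call_name(node)))
        # Update the tainted set for simple assignments such as x = input().
        if isinstance(stmt, ast.Assign):
            from_source = call_name(stmt.value) in SOURCES
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if from_source:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings
```
</pre>

<p>Calling <code>find_tainted_sinks</code> on a snippet that assigns <code>x = input()</code> and later calls <code>eval(x)</code> reports that call, while <code>eval</code> on a variable that was reassigned to a constant is not reported. Even this toy has to decide how to approximate: it ignores branches and function calls entirely, which is the kind of shortcut that a real analysis replaces with far more careful reasoning.</p>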

<p>However, <strong>the advantages of static analysis do not come without costs</strong>, specifically when dealing with large codebases. Many classes of static analysis problems are either undecidable or computationally infeasible to solve exactly, especially for large volumes of code. For this reason, static analysis by design is run in an approximate manner<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, making use of heuristics to bypass computationally expensive steps. This can lead to false alarms, where static analysis flags a non-issue as a potential vulnerability, or less commonly to missed detections, where a real vulnerability is not flagged. This might be acceptable for small projects where a developer can manually triage the results, and the traces indicated by static analysis can be easily followed manually. But for large projects, this can lead to an overwhelming number of false alarms that will be too expensive to triage manually, effectively rendering the tool useless. In some cases it might be possible to modify the static analysis queries to reduce false alarms, but this puts an additional burden on a software team that may not have the expertise or the resources to maintain such queries. In software engineering practice, it is often much better to miss some real vulnerabilities in favor of a smaller number of high-confidence results that can be handled easily. We will come back to this point soon.</p>

<p>What we need is an <strong>intelligent system</strong> that can understand the code at a deeper level and weed out false alarms, assisting the tedious manual triage process. A very tempting option is to use Large Language Models (LLMs) for this purpose, since they have shown remarkable ability to understand and generate code for various programming tasks. LLMs by themselves cannot be asked to look for vulnerabilities in a large codebase because of limited context and diminishing accuracy with large unfocused prompts. But what they can do very effectively is to start from static analysis results and triage them by looking at the relevant parts of the code, just like a human developer would. Unlike static analysis, LLMs are probabilistic and can also hallucinate — but if they can statistically reduce the number of false alarms, they can be very useful in the field.</p>

<p>LLMs can reason well when shown relevant code and asked specific questions. But <strong>how do they access the relevant code</strong> in the first place? This too is an engineering challenge, because it necessitates building infrastructure that can digest large codebases and provide semantically aware tools to traverse the code and retrieve relevant snippets. Even with such a tool at hand, a delicate balance must be struck between providing too little context, which may lead to missing important information for reasoning about the vulnerability, and providing too much context, which may lead to the model getting distracted and losing focus. Of course, for a small codebase, one could throw the entire project at the model at once — but for larger codebases, it is important to decide what slices of code are even relevant to the vulnerability in question.</p>

<p>Remember that handling such static analysis results is actually quite a <strong>low-priority task</strong> in most software teams, because they are mostly unrelated to normal functionality and do not usually have an exploit to demonstrate their severity. Yet, finding such issues before an exploit is even attempted is the whole point of static analysis, and so fixing them is important for long-term security of the software. Thus, a practical toolchain for assisting this whole process should not only triage results but also suggest high-quality and focused patches that can be reviewed and applied with minimal friction. Statically looking at code cannot always reveal intended behavior, and so patches must be <em>best-effort</em> and <em>good starting points</em>. LLMs once again provide a promising avenue for this problem, because they can easily generalize from a few instructions and examples to handle entire classes of patches, while a symbolic approach would require separately handling each of the many code styles found in the wild. And once again, a solid infrastructure is needed to ensure that LLMs are directed to the right parts of the code and produce patches that follow both the style of the codebase and the guidelines related to the vulnerability in question. GitHub’s Copilot Autofix can do this to some extent, but it is known to get overwhelmed by large codebases and produce incoherent and incomplete patches that are all over the place. This is again a problem of scale.</p>

<p>All of this was a prelude to the main point of this post, which is that <strong>scale</strong> can introduce unique challenges that are not present in toy settings. This can seem counterintuitive, because it makes several problems in software engineering harder to experiment with and nearly impossible to <em>solve</em> in the traditional sense, even when perfect solutions exist for small cases. Software engineering is a <em>field science</em>, and the main takeaway that I hope to convey is that a tool designed without confronting these realities may not work as intended when taken to the field<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>, that is, when the question is one of a million lines of code. In a future post<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>, I will follow up on this by sketching some ideas on how to think about building components for a toolchain that solves the problem I outlined above and how to evaluate them effectively.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Or software that generates a large share of the money in the modern economy. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>See <a href="https://cwe.mitre.org/">Common Weakness Enumeration</a> for a catalog of such weaknesses. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>In practice, usually by over-approximating the set of possible program behaviors so that true vulnerabilities are not missed. I think one of the reasons for this is that over-approximation is often easier to compute effectively in a generic, i.e., project-agnostic, fashion compared to under-approximation. Under-approximation can render the analysis uninformative for many repositories, possibly causing more harm than good. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>This is probably also <em>my main takeaway</em> from my internship at Microsoft Research where I deal with some of these problems directly and at the scale I described. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>I have become increasingly good at making such promises on this blog. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Technology" /><category term="Programming" /><category term="Essay" /><summary type="html"><![CDATA[Some problems in software engineering do not naturally manifest until the software in question is sufficiently big — which is the case for a lot of real software. By big, I don’t mean just a few thousand lines of code, but rather hundreds of thousands or even millions of lines. This is software that supports large parts of modern life1, and it is often critical to maintain and evolve it over many years, most likely with many contributors that will come and go. Usability of tools and techniques in such settings carries quite a unique notion, where pragmatism is preferred over perfection. In this post, I will sketch some of the unique challenges that arise while solving one particular software engineering problem at scale — finding and fixing security vulnerabilities. Security vulnerabilities are essentially bugs, with the specialty that they can lead to unexpected behavior that can compromise the integrity of the system. This integrity may be defined in terms of confidentiality of user data, availability of service to clients, or prevention of unauthorized transactions. Since these are not usually bugs in the functionality of the software, they are not always preventable simply by adhering to specifications and cannot be easily detected by standard testing due to the atypical and sophisticated ways in which they lead to exploits. With big software, this problem is exacerbated for several reasons. First of all, a large codebase is most likely written by many people over a long period of time, because of which no single person can have a complete mental model of the entire codebase. Secondly, the complexity of the data and control flows in a large codebase, including its interactions with external libraries and systems, makes it very difficult to analyze the code manually. 
These together dismiss the possibility of manual code reviews or ad hoc analysis by individual developers. Moreover, it is not easy to patch parts of the code arbitrarily without a deep understanding of the behavior of the system, because a seemingly innocuous change can have unintended consequences elsewhere in the code — making development teams reluctant to accept such changes. Static analysis has long been a choice for finding vulnerabilities in codebases. Static analysis literally means “analysis without execution,” and it involves looking at just the source code to find patterns that may indicate bugs. These can be simple analyses like type checking or linting, which is often provided as part of many compilers and development environments. But these can also be more complicated analyses related to the data and control flows in the program, like tracking the flow of untrusted data to sensitive operations. The advantage with these tools is they generally do not require any special setup and can be run on any codebase without any specific configuration. In the context of security vulnerabilities, static analysis tools can take advantage of the fact that software is written by humans — and humans are quite predictable in how they write code, since they usually pick it from people around them, like their peers and professors at university, other developers in their team, open-source projects, and online forums. This leads to common patterns in code, including common weaknesses2 that have been known to lead to vulnerabilities, which can be searched for using static analysis. Modern static analysis tools like CodeQL provide expressive query languages to search for such patterns and can process large codebases efficiently. However, the advantages of static analysis do not come without costs, specifically when dealing with large codebases. 
Many classes of static analysis problems are either undecidable or computationally infeasible to solve exactly, especially for large volumes of code. For this reason, static analysis by design is run in an approximate manner3, making use of heuristics to bypass computationally expensive steps. This can lead to false alarms where static analysis flags a non-issue as a potential vulnerability (or does not flag a real vulnerability, although this is less common). This might be acceptable for small projects where a developer can manually triage the results, and the traces indicated by static analysis can be easily followed manually. But for large projects, this can lead to an overwhelming number of false alarms that will be too expensive to triage manually — effectively rendering the tool useless. In some cases it might be possible to modify the static analysis queries to reduce false alarms, but this puts an additional burden on a software team that may not have the expertise or the resources to maintain such queries. In software engineering practice, it is often much better to sacrifice some real vulnerabilities in favor of a smaller number of high-confidence results that can be handled easily. We will come back to this point soon. What we need is an intelligent system that can understand the code at a deeper level and weed out false alarms, assisting the tedious manual triage process. A very tempting option is to use Large Language Models (LLMs) for this purpose, since they have shown remarkable ability to understand and generate code for various programming tasks. LLMs by themselves cannot be asked to look for vulnerabilities in a large codebase because of limited context and diminishing accuracy with large unfocused prompts. But what they can do very effectively is to start from static analysis results and triage them by looking at the relevant parts of the code, just like a human developer would. 
Unlike static analysis, LLMs are probabilistic and can also hallucinate — but if they can statistically reduce the number of false alarms, they can be very useful in the field. LLMs can reason well when shown relevant code and asked specific questions. But how do they access the relevant code in the first place? This too is an engineering challenge, because it necessitates building infrastructure that can digest large codebases and provide semantically aware tools to traverse the code and retrieve relevant snippets. Even with such a tool at hand, a delicate balance must be struck between providing too little context, which may lead to missing important information for reasoning about the vulnerability, and providing too much context, which may lead to the model getting distracted and losing focus. Of course, for a small codebase, one could throw the entire project at the model at once — but for larger codebases, it is important to decide what slices of code are even relevant to the vulnerability in question. Remember that handling such static analysis results is actually quite a low-priority task in most software teams, because they are mostly unrelated to normal functionality and do not usually have an exploit to demonstrate their severity. Yet, finding such issues before an exploit is even attempted is the whole point of static analysis, and so fixing them is important for long-term security of the software. Thus, a practical toolchain for assisting this whole process should not only triage results but also suggest high-quality and focused patches that can be reviewed and applied with minimal friction. Statically looking at code cannot always reveal intended behavior, and so patches must be best-effort and good starting points. 
LLMs once again provide a promising avenue for this problem, because they can easily generalize from a few instructions and examples to handle entire classes of patches, while a symbolic approach would require separately handling each of the many code styles found in the wild. And once again, a solid infrastructure is needed to ensure that LLMs are directed to the right parts of the code and produce patches that follow both the style of the codebase and also the guidelines related to the vulnerability in question. GitHub’s Copilot AutoFix can do this to some extent, but it is known to get overwhelmed by large codebases and produce incoherent and incomplete patches that are all over the place. This is again a problem of scale. All of this was a prelude to the main point of this post, which is that scale can introduce unique challenges that are not present in toy settings. This can seem counterintuitive because this makes several problems in software engineering harder to experiment with and nearly impossible to solve in the traditional sense, even when perfect solutions exist for small cases. Software engineering is a field science, and the main takeaway that I hope to convey is that a tool designed without confronting these realities may not work as intended when taken to the field4 — when the question is that of a million lines of code. In a future post5, I will follow up on this by sketching some ideas on how to think about building components for a toolchain that solves the problem I outlined above and how to evaluate them effectively. Or software that makes large parts of money in the modern economy. &#8617; See Common Weakness Enumeration for a catalog of such weaknesses. &#8617; In practice, usually by over-approximating the set of possible program behaviors so that true vulnerabilities are not missed. 
I think one of the reasons for this is that over-approximation is often easier to compute effectively in a generic, i.e., project-agnostic, fashion compared to under-approximation. Under-approximation can render the analysis uninformative for many repositories, possibly causing more harm than good. &#8617; This is probably also my main takeaway from my internship at Microsoft Research where I deal with some of these problems directly and at the scale I described. &#8617; I have become increasingly good at making such promises on this blog. &#8617;]]></summary></entry><entry><title type="html">Messengers of the Night</title><link href="https://mrigank.in/musings/2025/07/20/messengers-of-the-night.html" rel="alternate" type="text/html" title="Messengers of the Night" /><published>2025-07-20T00:00:00+00:00</published><updated>2025-07-20T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/07/20/messengers-of-the-night</id><content type="html" xml:base="https://mrigank.in/musings/2025/07/20/messengers-of-the-night.html"><![CDATA[<p>Tonight is different. It could have been like any other night, but tonight, he has chosen to escape.</p>

<p>As he walks, a cold breeze slashes through the sweat trickling down his forehead. The night is chilly, yet he is almost fuming. His heart hammers against his ribs — its frantic rhythm could convince anyone he has just finished a sprint across the empty road. Fortunately, there is no one here to witness his struggle, to see the way his feet drag like those of a corpse, or to notice how each step feels heavier to him than his last. The wind seems to whisper to him, urging that he turn back and return to his room, where the messengers of the night await him. They came for him again, just as they do every night. But tonight, they arrived armed with more than just words.</p>

<p>They demonstrate an uncanny familiarity with his thoughts as if they have always been watching. They know his thoughts before he does, sifting through his mind and all its regrets and denials, doubts and judgments. One speaks of the future, painting a grim certainty of failure that he must learn to accept, and another of the past, dredging up mistakes long buried that bring merely a lingering discomfort. A third gossips about his friends — murmuring of deceit, of betrayal, and of the masks they wear when he is around. And the fourth never speaks, only gesticulates, his crooked sneer feeding the fear that gnaws at his core — that he does not belong. Not among his peers. Not at work. Not even at home. He is at best an undeserving outsider, a mere shadow of the people he is surrounded by.</p>

<p>Yet, for all their torment, they offer him something he can no longer find elsewhere. They listen when no one else does. They nod in agreement at his darkest and most gruesome thoughts. Their cruel affirmations feel like a twisted form of validation, providing an almost sinful sense of achievement. Even as their words wound him, he has started to crave their company. He dreads their visits, but he also longs for them. Without them, there is only silence. And in that silence, he is alone.</p>

<p>He walks on. His eyes trace the decaying leaves scattered across the freshly asphalted road, their crisp edges stark against the black surface, like clouds glinting in the sky above him. As he nears a flickering streetlight, a shadow shifts in the periphery of his vision. He gasps sharply, his racing heart threatening to burst from his chest. But when he turns, he sees a watchman, slumped beneath the glow, half-asleep. The tension in his chest loosens, just slightly. A bitter smile tugs at his lips. He checks his watch. 4:24 AM. He has been outside for over two hours. He tilts his head back, gazing at the sky, where scattered stars emerge through the heavy clouds.</p>

<p>The night seems to be challenging him, roaring at him and daring him to confront it. The blinding sparks of lightning summon him to stop thinking, to stop running away, and to return to the messengers it has sent for him. His feet are slowly giving up from exhaustion, his head heavy with the sleep that he has been denying himself for countless nights. But he cannot return, because tonight is different. Tonight, the messengers had come with a plan: a plan to end it all, the pain, the loneliness, the fear. They made an offer so tempting to his desperate mind that he could not refuse. They had promised an escape to a place where he would be free from his perpetual agony, a place where he would finally fit. And just when they had him convinced, something dropped in his room and stole away his fragile attention. He had accidentally knocked over his bedside table, and with it a silver watch his parents gifted him on his birthday, a thriller novel his previous roommate got for him last summer, a photo frame featuring his friends and him from their spring trip to the beach, and a lamp with a soft yellow light that his girlfriend bought him so he could read at night. While he hurried to pick up the mess he had made of some of his most precious items, he started to see through the proposal the messengers had made. He understood the vileness in their intentions. All this time they had been slowly poisoning his mind, locking it up in one shackle after another, and tonight they had come to finally imprison him forever. And that is when he chose to escape.</p>

<p>He rubs his eyes, gently wipes the sweat from his forehead, looks at his reflection in the regular puddle on the roadside, and decides to sit down on the curb. The birds have already started their chirping, and the sky is slowly brightening into a smooth blue. It is not clear whether the birds are singing to encourage him or to mock him, but he hardly cares anymore. He has defeated the messengers of the night. Although the sun is now rising, he knows they will return. But he will be ready for them with his newfound resolve. The messengers of the night can never take him prisoner.</p>

<p class="info">This is a short story I wrote for a mental health series in the undergraduate magazine at IISc, called <a href="https://quarks-iisc.github.io/blog/">Quarks</a>. Mental health is a sensitive and important topic in the IISc student community, and I hope this story resonates with people who have struggled with such issues, possibly also offering some optimism. If you are struggling with any mental health issues, please reach out for help. IISc students can benefit from free, confidential, and professional counseling services at the <a href="https://wellness.iisc.ac.in">IISc Wellness Centre</a>.</p>]]></content><author><name>Mrigank Pawagi</name></author><category term="Short Story" /><category term="Mental Health" /><category term="Cross Post" /><summary type="html"><![CDATA[Tonight is different. It could have been like any other night, but tonight, he has chosen to escape. As he walks, a cold breeze slashes through the sweat trickling down his forehead. The night is chilly, yet he is almost fuming. His heart hammers against his ribs — its frantic rhythm could convince anyone he has just finished a sprint across the empty road. Fortunately, there is no one here to witness his struggle, to see the way his feet drag like those of a corpse, or to notice how each step feels heavier to him than his last. The wind seems to whisper to him, urging that he turn back and return to his room, where the messengers of the night await him. They came for him again, just as they do every night. But tonight, they arrived armed with more than just words. They demonstrate an uncanny familiarity with his thoughts as if they have always been watching. They know his thoughts before he does, sifting through his mind and all its regrets and denials, doubts and judgments. One speaks of the future, painting a grim certainty of failure that he must learn to accept, and another of the past, dredging up mistakes long buried that bring merely a lingering discomfort. 
A third gossips about his friends — murmuring of deceit, of betrayal, and of the masks they wear when he is around. And the fourth never speaks, only gesticulates, his crooked sneer feeding the fear that gnaws at his core — that he does not belong. Not among his peers. Not at work. Not even at home. He is at best an undeserving outsider, a mere shadow of the people he is surrounded by. Yet, for all their torment, they offer him something he can no longer find elsewhere. They listen when no one else does. They nod in agreement at his darkest and most gruesome thoughts. Their cruel affirmations feel like a twisted form of validation, providing almost a sinful sense of achievement. As their words wound him, he started to crave their company. He dreads their visits, but he also longs for them. Without them, there is only silence. And in that silence, he is alone. He walks on. His eyes trace the decaying leaves scattered across the freshly asphalted road, their crisp edges stark against the black surface, like clouds glinting in the sky above him. As he nears a flickering streetlight, a shadow shifts in the periphery of his vision. He breathes a sharp gasp, his racing heart threatens to burst from his chest. But when he turns, he sees a watchman, slumped beneath the glow, half-asleep. The tension in his chest loosens, just slightly. A bitter smile tugs at his lips. He checks his watch. 4:24 AM — it has been over two hours outside. He tilts his head back, gazing at the sky, where scattered stars emerge through the heavy clouds. The night seems to be challenging him, roaring at him and daring him to confront it. The blinding sparks of lightning summon him to stop thinking, to stop running away, and to return to the messengers it has sent for him. His feet are slowly giving up from exhaustion, his head heavy with sleep that he has been denying himself for countless nights. But he cannot return, because tonight is different. 
Tonight, the messengers had come with a plan — a plan to end it all, the pain, the loneliness, the fear. They made an offer so tempting to his desperate mind that he could not refuse. They had promised an escape to a place where he would be free from his perpetual agony, a place where he would finally fit. And just when they had him convinced, something dropped in his room and stole away his fragile attention. He had accidentally knocked over his bedside table, and with it a silver watch his parents gifted him on his birthday, a thriller novel his previous roommate got for him last summer, a photoframe featuring his friends and him from their spring trip to the beach, and a lamp with a soft yellow light that his girlfriend bought him so he could read at night. While he hurried to pick up the chaos he had created from some of his most precious items, he started to look through the proposal the messengers had made. He understood the vileness in their intentions. All this time they had been slowly poisoning his mind, locking it up in one shackle after another — and tonight they had come to finally imprison him forever. And that is when he chose to escape. He rubs his eyes, gently wipes the sweat from his forehead, looks at his reflection in the regular puddle on the roadside, and decides to sit down on the curb. The birds have already started their chirping, and the sky is slowly brightening into a smooth blue. It is not clear whether the birds are singing to encourage him or to mock him, but he hardly cares anymore. He has defeated the messengers of the night. Although the sun is now rising, he knows they will return. But he will be ready for them with his newfound resolve. The messengers of the night can never take him prisoner. This is a short story I wrote for a mental health series in the undergraduate magazine at IISc, called Quarks. 
Mental health is a sensitive and important topic in the IISc student community, and I hope this story resonates with people who have struggled with such issues, possibly also offering some optimism. If you are struggling with any mental health issues, please reach out for help. IISc students can benefit from free, confidential, and professional counseling services at the IISc Wellness Centre.]]></summary></entry><entry><title type="html">Finally, a complete index of GitHub’s CodeQL queries!</title><link href="https://mrigank.in/musings/2025/07/10/codeql-queries.html" rel="alternate" type="text/html" title="Finally, a complete index of GitHub’s CodeQL queries!" /><published>2025-07-10T00:00:00+00:00</published><updated>2025-07-10T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/07/10/codeql-queries</id><content type="html" xml:base="https://mrigank.in/musings/2025/07/10/codeql-queries.html"><![CDATA[<p>It is surprising that there is no single place<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> that lists all of <a href="https://github.com/github/codeql">GitHub’s CodeQL queries</a> in a nice, searchable format. So here it is as an <a href="https://airtable.com/appLbuFQXfdm8QJ9A/shr8cF83dxwE5jom1">Airtable</a> so you can search, filter, sort, or do anything else you like with it. The script to generate this list is available as a <a href="https://gist.github.com/mrigankpawagi/6fded73b14c3ac88430fb8415c51661a">Gist</a><sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>.</p>

<p>I hope this is useful for anyone who is looking for a specific query or wants to explore the available queries in a more organized way.</p>

<iframe class="airtable-embed" src="https://airtable.com/embed/appLbuFQXfdm8QJ9A/shr8cF83dxwE5jom1?viewControls=on" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>

<p>Note that I do not have plans to maintain this list, so it will inevitably become outdated over time.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:2">
      <p>At least none that is publicly available. There are some internal databases within <em>big</em> organizations that take CodeQL queries somewhat seriously, but these are not accessible openly. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:1">
      <p>I have made no attempt to make the script efficient or elegant, and it is the result of a single vibe coding session. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Reference" /><category term="Programming" /><summary type="html"><![CDATA[It is surprising that there is no single place1 that lists all of GitHub’s CodeQL queries in a nice, searchable format. So here it is as an Airtable so you can search, filter, sort, or do anything else you like with it. The script to generate this list is available as a Gist2. I hope this is useful for anyone who is looking for a specific query or wants to explore the available queries in a more organized way. Note that I do not have plans to maintain this list, so it will inevitably become outdated over time. At least none that is publicly available. There are some internal databases within big organizations that take CodeQL queries somewhat seriously, but these are not accessible openly. &#8617; I have made no attempt to make the script efficient or elegant, and it is the result of a single vibe coding session. &#8617;]]></summary></entry><entry><title type="html">The Internet is Unfair</title><link href="https://mrigank.in/musings/2025/06/14/unfair-internet.html" rel="alternate" type="text/html" title="The Internet is Unfair" /><published>2025-06-14T00:00:00+00:00</published><updated>2025-06-14T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/06/14/unfair-internet</id><content type="html" xml:base="https://mrigank.in/musings/2025/06/14/unfair-internet.html"><![CDATA[<p>Imagine a country where only the rich have access to good healthcare. Or where only the top 1% of the population can obtain a quality education. Or where only the powerful have nutritious food and clean water. This used to be a reality in most of the world, and unfortunately, continues to be a reality for a significant part of the population. 
Today, we largely accept these commodities and facilities as basic parts of a meaningful life and demand that they be available to everyone — through social welfare, public services, and government regulation.</p>

<p><strong><em>But what about the Internet?</em></strong> I understand that the Internet is not necessary for survival, but is it still feasible to lead a fulfilling life without it, on the same footing as the rest of the world? Like other basic necessities, including education, mobility, or financial services, access to the Internet is a matter of access to opportunities. And yet, while we accept that other necessities are part of the government’s responsibility, our expectations of the Internet are irrationally lower.</p>

<p>The government may have succeeded in <em>getting</em> the Internet to us at low prices, but has it succeeded in ensuring <em>fair</em> access? Let me step back a little — when I mentioned <em>basic necessities</em>, I really meant <em>basic necessities at a reasonable quality</em>, because what are they even for otherwise? What we have is an Internet where the big and mighty corporations — both foreign and <em>indigenous</em> — have a monopoly on <em>good</em> services for which the price is our personal data and our freedom of thought.</p>

<p>The whole problem is that not enough of us care enough about our data and privacy on the Internet. We allow corporations to extract as much data as they can about our lifestyles, preferences, habits, insecurities, financials, and whatnot. We would hesitate to share such information with strangers on the sidewalk. Yet we have agreed to give corporations that have time and again proven to be untrustworthy, unreliable, and unethical access to our most private information, and to let them use this access to control our ideas and perceptions. And all of this happens right under the noses of our governments.</p>

<p><strong><em>But how is this unfair?</em></strong> The resourceful can afford an internet that works for them in ways they can control.</p>

<ul>
  <li>They can pay for reliable VPN services that do not log their data or share it with third parties.</li>
  <li>They can pay for premium subscriptions to everyday services that do not show them advertisements to influence their choices.</li>
  <li>They can pay for smartphones that do not spy on them.</li>
  <li>They can pay for email services that do not look at their emails.</li>
  <li>They can pay for cloud storage that does not look at their private photos.</li>
  <li>They can pay for premium caller ID services that keep their identity private.</li>
</ul>

<p>This list goes on and on. Forget all of these — they can even buy and set up personal hardware to host their own services in a place with reliable power and internet connectivity. I can add more examples, but the bottom line is that <em>those who can pay</em> can choose to have an Internet that serves them. The rest of us who cannot afford thousands of dollars in subscriptions every year are left at the mercy of monopolies that do not care about us.</p>

<p>Should we not demand from our governments that there be public services on the Internet that are free <em>and</em> fair? Should we not demand from our governments that there be regulations on what information can be harvested from us, and how it can be used? Should we not demand from our governments that there be minimum standards for services that are available to us?</p>

<p>Unfortunately, <strong><em>this seems to be a lost cause</em></strong>. Government control of the Internet already sounds <em>very, very scary</em>, and we know what can go wrong with that. But most, if not all, examples of what can go wrong have arisen from poor democracies. I imagine that, just as only strong democratic institutions can ensure fair distribution of other basic necessities, only solid democratic foundations can ensure such regulated and public access to the Internet. I don’t think we are at that stage yet in most parts of the world. But who knows, maybe we can still be hopeful and still demand it.</p>
Let me step back a little — when I mentioned basic necessities, I really meant basic necessities at a reasonable quality, because what are they even for otherwise? What we have is an Internet where the big and mighty corporations — both foreign and indigenous — have a monopoly on good services for which the price is our personal data and our freedom of thought. The whole problem is that not enough of us care enough about our data and privacy on the Internet. We allow corporations to extract as much data as they can about our lifestyles, preferences, habits, insecurities, financials, and whatnot. We would hesitate to share such information with strangers on the sidewalk — but we have agreed to let corporations that time and again have proven to be untrustworthy, unreliable, and unethical, have access to our most private information and use this access to control our ideas and perceptions. And all of this happens under the long noses of our governments. But how is this unfair? The resourceful can afford an internet that works for them in ways they can control. They can pay for reliable VPN services that do not log their data or share it with third parties. They can pay for premium subscriptions to everyday services that do not show them advertisements to influence their choices. They can pay for smartphones that do not spy on them. They can pay for email services that do not look at their emails. They can pay for cloud storage that does not look at their private photos. They can pay for premium caller ID services that keep their identity private. This list goes on and on. Forget all of these — they can even buy and set up personal hardware to host their own services in a place with reliable power and internet connectivity. I can add more examples, but the bottom line is that those who can pay can choose to have an Internet that serves them. 
The rest of us who cannot afford thousands of dollars in subscriptions every year are left at the mercy of monopolies that do not care about us. Should we not demand from our governments that there be public services on the Internet that are free and fair? Should we not demand from our governments that there be regulations on what information can be harvested from us, and how it can be used? Should we not demand from our governments that there be minimum standards for services that are available to us? Unfortunately, this seems to be a lost cause. Government control of the Internet already sounds very, very scary — we know what can go wrong with that. But most, if not all, examples of what can go wrong have arisen from poor democracies. I imagine that, like how only strong democratic institutions can ensure fair distribution of other basic necessities, only solid democratic foundations can ensure such regulated and public access to the Internet. I don’t think we are at that stage yet in most parts of the world — but who knows, maybe we can still be hopeful and still demand it.]]></summary></entry><entry><title type="html">Do we write about suitcases? — Guest Post by Mayank Kumar</title><link href="https://mrigank.in/musings/2025/06/10/suitcases-mayank.html" rel="alternate" type="text/html" title="Do we write about suitcases? — Guest Post by Mayank Kumar" /><published>2025-06-10T00:00:00+00:00</published><updated>2025-06-10T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/06/10/suitcases-mayank</id><content type="html" xml:base="https://mrigank.in/musings/2025/06/10/suitcases-mayank.html"><![CDATA[<p>While this post was being penned, Bengaluru<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> had relentless rains. The period from May to June here is usually associated with summer, and now it seems that we are truly out of sync. 
But you see, unpredictability has its own essence — and my suitcases<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> have been a pinnacle in promoting it.</p>

<p>Before I move on to what exactly I mean, I think we must be able to assume that traveling bags and suitcases are interchangeable<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>. Getting back, my suitcases are a source of true mystery. They aren’t a treasure chest to me — unless, of course, I am looking for a new shampoo bottle or a book I have long forgotten.</p>

<p>I have tried to reason a lot, but I have only been able to come to terms with my inability to understand what’s going on. I arrive at my hostel after a trip and check my stuff meticulously. Clothes? Check. Charger? Check. Sanity? Debatable. Therefore, I usually have a fine account of things<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>. As an average student living in a hostel, I am at peace once everything is rearranged. It just so happens that whenever I need to get something (which I would have taken out of my suitcase and placed<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">5</a></sup> on my desk or in one of the racks, where I couldn’t find it), I must go back to my last resort, that is one of my suitcases. Somehow, it ends up there.</p>

<p>If I try to change my strategy of finding something in my suitcase first, it definitely won’t be there. I truly don’t understand why this happens, but it looks like they love playing magic tricks. While at that, I might present some statistics<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">6</a></sup>:</p>

<ul>
  <li>Azure is almost twice the volume of Prussian. However, if I go searching amongst these two bags, I would say I am likely to find my object of interest in Prussian around 65% of the time.</li>
  <li>I would say I have a preferential bias for choosing Azure while I search. I have always started my last-resort search with Azure, because my habit of reaching for the wrong pocket is pronounced.</li>
  <li>Prussian, although smaller, has 3 compartments, of which two are empty. Azure has only 2 compartments, and they are filled.</li>
</ul>

<p>I have three and a half theories for this:</p>

<ul>
  <li>My belongings are in superposition and they hate me. They flip over to their other state in a suitcase when I try to find them.</li>
  <li>I am basically in an RPG<sup id="fnref:7"><a href="#fn:7" class="footnote" rel="footnote" role="doc-noteref">7</a></sup>, and everyone around me is an NPC<sup id="fnref:8"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">8</a></sup>. I have simply not realized that my suitcases are one too.</li>
  <li>I am not the only one who suffers from this within my hostel. There is a secret organization running within, which likes putting stuff back in the suitcase.</li>
  <li>I feel like I have completed the action of rearranging my belongings in my room, but I end up forgetting to place them. (Least plausible)</li>
</ul>

<p>Let me conclude with the fact that this phenomenon definitely needs to be investigated and could really turn into a good research subject. As I finish this, I am afraid something has traveled back into one of the suitcases. Will have to get back to check now.</p>

<p class="info">As indicated in the title, this post is a guest submission by Mayank Kumar and has been lightly edited by me. Mayank is my friend and a batchmate at IISc, where he is pursuing a major in Material Science (although I am well aware that his heart lies in Game Theory and related areas). Guest posts are a new feature on this blog, and I am excited to see how they evolve.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>My current place of residence — usually, when not underwater. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Well, call them ‘Azure’ and ‘Prussian’. I would have liked to name ‘Prussian’ as ‘Prussian Blue’, but I am not typing another word for these pests. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>You must agree to this or one of us is dying on our way reading this post. It won’t be me. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>I believe that I don’t suffer from a goldfish memory. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>Read misplaced. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6">
      <p>Useless stuff I wrote to make this post longer. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7">
      <p>Role-playing game. <em>(Note from Mrigank)</em> <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8">
      <p>Non-playable character, i.e., a character in the game whose actions are not controlled by the player. <em>(Note from Mrigank)</em> <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Guest Submission" /><category term="Essay" /><summary type="html"><![CDATA[While this post was being penned, Bengaluru1 had relentless rains. The period from May to June here is usually associated with summer, and now it seems that we are truly out of sync. But you see, unpredictability has its own essence — and my suitcases2 have been a pinnacle in promoting it. Before I move on to what exactly I mean, I think we must be able to assume that traveling bags and suitcases are interchangeable3. Getting back, my suitcases are a source of true mystery. They aren’t a treasure chest to me — unless, of course, I am looking for a new shampoo bottle or a book I have long forgotten. I have tried to reason a lot, but I have only been able to come to terms with my inability to understand what’s going on. I arrive at my hostel after a trip and check my stuff meticulously. Clothes? Check. Charger? Check. Sanity? Debatable. Therefore, I usually have a fine account of things4. As an average student living in a hostel, I am at peace once everything is rearranged. It just so happens that whenever I need to get something (which I would have taken out of my suitcase and placed5 on my desk or in one of the racks, where I couldn’t find it), I must go back to my last resort, that is one of my suitcases. Somehow, it ends up there. If I try to change my strategy of finding something in my suitcase first, it definitely won’t be there. I truly don’t understand why this happens, but it looks like they love playing magic tricks. While at that, I might present some statistics6: Azure is almost twice the volume of Prussian. However, if I go searching amongst these two bags, I would say I am likely to find my object of interest in Prussian around 65% of the time. I would say I have a preferential bias for choosing Azure while I search. 
Always started my last resort search with Azure, because my habit of reaching out to the wrong pocket is pronounced. Prussian, although smaller, has 3 compartments, of which two are empty. Azure has only 2 compartments, and they are filled. I have three and a half theories for this: My belongings are in superposition and they hate me. They flip over to their other state in a suitcase when I try to find them. I am basically in an RPG7, and everyone around me is an NPC8. I have simply not realized that my suitcases are one too. I am not the only one who suffers from this within my hostel. There is a secret organization running within, which likes putting stuff back in the suitcase. I feel like I have completed the action of rearranging my belongings in my room, but I end up forgetting to place them. (Least plausible) Let me conclude with the fact that this phenomenon definitely needs to be investigated and could really turn into a good research subject. As I finish this, I am afraid something has traveled back into one of the suitcases. Will have to get back to check now. As indicated in the title, this post is a guest submission by Mayank Kumar and has been lightly edited by me. Mayank is my friend and a batchmate at IISc, where he is pursuing a major in Material Science (although I am well aware that his heart lies in Game Theory and related areas). Guest posts are a new feature on this blog, and I am excited to see how they evolve. My current place of residence — usually, when not underwater. &#8617; Well, call them ‘Azure’ and ‘Prussian’. I would have liked to name ‘Prussian’ as ‘Prussian Blue’, but I am not typing another word for these pests. &#8617; You must agree to this or one of us is dying on our way reading this post. It won’t be me. &#8617; I believe that I don’t suffer from a goldfish memory. &#8617; Read misplaced. &#8617; Useless stuff I wrote to make this post longer. &#8617; Role-playing game. 
(Note from Mrigank) &#8617; Non-playable character, i.e., a character in the game whose actions are not controlled by the player. (Note from Mrigank) &#8617;]]></summary></entry><entry><title type="html">Reflections on GitHub Copilot as a partner in undergraduate research</title><link href="https://mrigank.in/musings/2025/06/09/github-copilot.html" rel="alternate" type="text/html" title="Reflections on GitHub Copilot as a partner in undergraduate research" /><published>2025-06-09T00:00:00+00:00</published><updated>2025-06-09T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/06/09/github-copilot</id><content type="html" xml:base="https://mrigank.in/musings/2025/06/09/github-copilot.html"><![CDATA[<p>I wasn’t following the AI hype before I joined college<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>, and while I had heard of GitHub Copilot<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>, I didn’t think much of it. I had been programming for several years by then, and couldn’t imagine how AI could help me write code. I was not the only one skeptical; in particular, most educators still saw AI as another tool for plagiarism and were highly conservative regarding its use.</p>

<p>Fortunately for me<sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, my first programming and algorithms course at IISc did not shy away from such developments. My professor, <a href="https://eecs.iisc.ac.in/people/prof-viraj-kumar">Viraj Kumar</a>, made Copilot an integral part of the course, focusing on two main aspects:</p>

<ol>
  <li>
    <p><strong>Writing good specifications</strong> that express our intent unambiguously, possibly using simple examples, to Copilot.</p>
  </li>
  <li>
    <p><strong>Reading and critiquing</strong> code produced by Copilot, to make sure it correctly reflects our intent.</p>
  </li>
</ol>

<p>As I will discuss in this blog post, these two aspects of working with AI have been instrumental in my own journey as a programmer and also an undergraduate researcher in software engineering. I will particularly focus on how Copilot has played the role of a collaborator and at times, a coach in helping me get started with undergraduate research. Of course, this was possible because of GitHub’s <a href="https://education.github.com/pack">Student Developer Pack</a><sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup>.</p>

<h2 id="using-copilot-to-improve-specifications">Using Copilot to improve specifications</h2>

<p>My first undergraduate research project came out of discussions with Professor Viraj about how Copilot often suggests code that does not pass even the test cases provided in the prompt. What was more interesting was that in some cases where the top suggestion was wrong, one of the other suggestions would be correct<sup id="fnref:5"><a href="#fn:5" class="footnote" rel="footnote" role="doc-noteref">5</a></sup>. This is not surprising for a language model trained to produce strings, not code, and for a system that evaluates it by treating its suggestions merely as strings and not as code with nuanced semantics. Note that our inputs to Copilot were incomplete Python function definitions with a docstring that included doctests, as shown in the illustration below, and we asked Copilot to write the body of the function.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">function_name</span><span class="p">(</span><span class="n">arg1</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">arg2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">bool</span><span class="p">:</span>
    <span class="sh">"""</span><span class="s">
    Purpose of the function.

    &gt;&gt;&gt; function_name(1, </span><span class="sh">"</span><span class="s">test</span><span class="sh">"</span><span class="s">) # doctest 1
    True

    &gt;&gt;&gt; function_name(2, </span><span class="sh">"</span><span class="s">test</span><span class="sh">"</span><span class="s">) # doctest 2
    False
    </span><span class="sh">"""</span>
</code></pre></div></div>

<p>Our first idea was to simply run all of Copilot’s suggestions through the provided test cases and filter out the ones that did not pass — and I built a simple VS Code extension to do this. One could first prompt Copilot to produce suggestions, and then trigger our extension to filter them. This was a good start, but we soon came across a more fundamental question. <em>Could better specifications have helped Copilot produce correct code more confidently?</em><sup id="fnref:12"><a href="#fn:12" class="footnote" rel="footnote" role="doc-noteref">6</a></sup></p>
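<p>The filtering step can be sketched in a few lines of Python. This is only a minimal illustration and not the code of our extension: the prompt, the candidate bodies, and the helper <code class="language-plaintext highlighter-rouge">passes_doctests</code> are all made up for this example, and the only library used is Python's standard <code class="language-plaintext highlighter-rouge">doctest</code> module.</p>

```python
import doctest

# Hypothetical prompt: a function header whose docstring carries two doctests.
PROMPT = '''def is_even(n: int) -> bool:
    """Return True if n is even.

    >>> is_even(4)
    True
    >>> is_even(7)
    False
    """
'''

# Hypothetical suggestions for the function body (stand-ins for Copilot output).
CANDIDATE_BODIES = [
    "    return n % 2 == 0\n",
    "    return n % 2 == 1\n",
    "    return n > 5\n",
    "    return n == 4\n",
]


def passes_doctests(source: str, name: str) -> bool:
    """Define the function from source and run the doctests in its docstring."""
    namespace = {}
    try:
        exec(source, namespace)
    except Exception:
        return False  # a suggestion that does not even load is discarded
    func = namespace[name]
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__ or "", dict(namespace), name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test, out=lambda _text: None)  # silence failure reports
    return result.attempted > 0 and result.failed == 0


# Keep only the suggestions that satisfy every doctest in the prompt.
surviving_indices = [
    index
    for index, body in enumerate(CANDIDATE_BODIES)
    if passes_doctests(PROMPT + body, "is_even")
]
print(surviving_indices)
```

<p>Here the first and the last candidates survive. The last one is clearly not what the purpose statement asks for, yet it passes both doctests, which is a small preview of why filtering alone was not enough.</p>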

<p>It seemed that the lack of confidence in its suggestions — i.e., the suggestions were functionally different from each other — was due to the ambiguity in the specification itself. Even more importantly, it seemed that these very differences in the suggestions could be used to reveal the ambiguities in the specification. My programming and algorithms course had focused a lot on writing good <em>purpose statements</em> for functions, and both Professor Viraj and I were excited to see if we could use such findings to build a tool that would not only alert programmers about ambiguous specifications but also help students learn how to write better specifications in the first place. At the same time, this can also encourage students to think more deeply about given requirements and ask the right <em>clarifying questions</em>, which is a skill that is often overlooked in programming courses.</p>

<p>This took shape as a tool called <a href="https://github.com/mrigankpawagi/GuardRails">GuardRails</a>, which we eventually <a href="https://dl.acm.org/doi/10.1145/3627217.3627234">presented</a> at the COMPUTE 2023 conference. The idea was rather simple<sup id="fnref:6"><a href="#fn:6" class="footnote" rel="footnote" role="doc-noteref">7</a></sup> — we would take all of Copilot’s suggestions, filter out the ones that did not pass the given test cases, and then run the remaining suggestions through several inputs<sup id="fnref:7"><a href="#fn:7" class="footnote" rel="footnote" role="doc-noteref">8</a></sup> to see if we could find an input where they behaved differently. If we did, this input, and potentially a class of similar inputs, would be a suspected source of ambiguity in the specification that we would then report to the user.</p>

<p>For instance, consider the following input to Copilot.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">common_letters</span><span class="p">(</span><span class="n">word1</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">word2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="sh">"""</span><span class="s">Return a list of letters common to word1 and word2.
</span><span class="gp">
    &gt;&gt;&gt;</span> <span class="nf">common_letters</span><span class="p">(</span><span class="sh">'</span><span class="s">cat</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">heart</span><span class="sh">'</span><span class="p">)</span>
    <span class="p">[</span><span class="sh">'</span><span class="s">a</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">t</span><span class="sh">'</span><span class="p">]</span>

<span class="gp">    &gt;&gt;&gt;</span> <span class="nf">common_letters</span><span class="p">(</span><span class="sh">'</span><span class="s">Dad</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Mom</span><span class="sh">'</span><span class="p">)</span>
    <span class="p">[]</span>
    <span class="sh">"""</span>
</code></pre></div></div>

<p>As you might be able to identify, there are many things which are unclear about this request. Does the order of letters in the output matter? If a common letter appears multiple times in both strings, should it appear multiple times in the output? Importantly, none of the doctests can clarify these. We observed that Copilot often produced suggestions that filled such ambiguities in different ways — and our tool could help identify these ambiguities by finding inputs (and usually the simplest such inputs) which could lead to different outputs under different interpretations.</p>

<p>Consider the following two suggestions for completing this function definition.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">suggestion1</span><span class="p">(</span><span class="n">word1</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">word2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word1</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word2</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">suggestion2</span><span class="p">(</span><span class="n">word1</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">word2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word2</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word1</span><span class="p">]</span>
</code></pre></div></div>

<p>The fact that these two suggestions produce different outputs for the input strings <code class="language-plaintext highlighter-rouge">ab</code> and <code class="language-plaintext highlighter-rouge">ba</code> suggests an underspecification related to the order of letters in the output. Similarly, the following two suggestions differ in the treatment of duplicate letters, like when provided with the input strings <code class="language-plaintext highlighter-rouge">aa</code> and <code class="language-plaintext highlighter-rouge">aa</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">suggestion3</span><span class="p">(</span><span class="n">word1</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">word2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word1</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">word2</span><span class="p">]</span>

<span class="k">def</span> <span class="nf">suggestion4</span><span class="p">(</span><span class="n">word1</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">word2</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">c</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nf">set</span><span class="p">(</span><span class="n">word1</span><span class="p">)</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="nf">set</span><span class="p">(</span><span class="n">word2</span><span class="p">)]</span>
</code></pre></div></div>
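<p>The search for a distinguishing input can be sketched as follows. This is a toy version under my own assumptions and not how GuardRails generates inputs: it simply enumerates pairs of short words over a two-letter alphabet, shortest first, and reports the first pair on which two suggestions disagree. The helpers <code class="language-plaintext highlighter-rouge">small_words</code> and <code class="language-plaintext highlighter-rouge">first_disagreement</code> are hypothetical names.</p>

```python
from itertools import product


def suggestion1(word1: str, word2: str) -> list[str]:
    return [c for c in word1 if c in word2]


def suggestion2(word1: str, word2: str) -> list[str]:
    return [c for c in word2 if c in word1]


def suggestion3(word1: str, word2: str) -> list[str]:
    return [c for c in word1 if c in word2]


def suggestion4(word1: str, word2: str) -> list[str]:
    return [c for c in set(word1) if c in set(word2)]


def small_words(alphabet: str = "ab", max_len: int = 2):
    """Yield every word over the alphabet, from the empty word up to max_len letters."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def first_disagreement(f, g, max_len: int = 2):
    """Return the shortest input pair on which f and g differ, or None."""
    words = list(small_words(max_len=max_len))
    # Sort by total length so that the simplest inputs are tried first.
    pairs = sorted(product(words, repeat=2), key=lambda p: len(p[0]) + len(p[1]))
    for word1, word2 in pairs:
        if f(word1, word2) != g(word1, word2):
            return (word1, word2)
    return None


print(first_disagreement(suggestion1, suggestion2))  # ('a', 'aa')
print(first_disagreement(suggestion3, suggestion4))  # ('aa', 'a')
```

<p>Interestingly, on this small search space the shortest disagreement between the first pair of suggestions is not about order at all: for <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">aa</code> they already disagree on how many times the common letter should be reported. A single pair of suggestions can therefore expose more than one ambiguity.</p>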

<p>The demonstration below shows this example in action. Our <a href="https://docs.google.com/spreadsheets/d/e/2PACX-1vRwmXlP8V6gbXtB1oQ5IUXfbRjW3eoCYKcbm-zN4uXphd_AK4Wj0CZzVmeXW4XvF2_scszdCD89CFpV/pubhtml?gid=0&amp;single=true">dataset</a> lists several other interesting examples.</p>

<div style="text-align: center;">
    <img src="https://raw.githubusercontent.com/mrigankpawagi/GuardRails/master/media/graphics/demo.gif" alt="A demonstration of GuardRails with the example discussed above." />
    <p>A quick demo of <a href="https://github.com/mrigankpawagi/GuardRails">GuardRails</a>. A free copy of our paper is available on <a href="https://arxiv.org/abs/2312.08189">arXiv</a>. Note that our tool does not work presently because of changes in the Copilot VS Code extension, and an update is underway. If you would like to contribute, please reach out to me, and I would be happy to discuss.</p>
</div>

<p>From a logistical perspective, this was a time when LLMs were still new, and free APIs were not as common as they are now. We wanted to quickly experiment with our ideas without having to first seek funding to self-host an LLM or get access to a paid API. Copilot was not meant to be used this way, but making our tool work on top of it was indeed a nice hacky way to quickly implement our ideas and test them out. Today, of course, LLMs are more accessible, and Copilot itself provides many ways to smoothly integrate additional functionality, like through <a href="https://docs.github.com/en/copilot/building-copilot-extensions">extensions</a> and even <a href="https://docs.github.com/en/copilot/building-copilot-extensions/building-a-copilot-agent-for-your-copilot-extension">agents</a>.</p>

<h2 id="validating-translated-code-at-scale-and-accidentally-learning-java">Validating translated code at scale and accidentally learning Java</h2>

<p>While working on GuardRails, Copilot’s role in my research was like that of a monkey in an animal behavior study. We analyzed its behavior, saw what it did correctly and what it did not, tried to understand why it did what it did, and then used that understanding to better guide its behavior. This was quite different from the colleague-like role it played in my next project, related to validating code translated from one programming language to another. I have talked about this work, called <em>GlueTest</em>, briefly in my <a href="https://mrigank.in/musings/2025/04/25/code-translation.html">previous blog post</a>. I had the opportunity to co-lead this work under the supervision of Professor Darko from UIUC and Professor Saikat from Cornell, and we were essentially trying to build a testing framework to validate whether code produced by translating an existing code base (in our case, in Java) to another language (in our case, Python) is indeed a correct translation. As I also discussed in my previous blog post, this has a few interesting challenges.</p>

<p>Most serious projects have developer-written tests, and it is natural to translate the test suite from the source language to the target language and use it to validate the translated code too. But it’s usually not that simple! Test code can be voluminous and occasionally complicated, and how do we know if the tests themselves were correctly translated? Incorrect test translation can lead to false negatives, where the translated code is actually correct but the tests fail because they were translated incorrectly, undermining confidence in the translated code. Even worse, it can lead to false positives, where the translated code is incorrect but the tests pass, creating serious quality issues in the translated code base.</p>

<p>The other big challenge is that while translation efforts are usually incremental<sup id="fnref:8"><a href="#fn:8" class="footnote" rel="footnote" role="doc-noteref">9</a></sup>, tests often exercise too many parts of the code base at once, making it hard to translate small parts of the code base at a time and validate them. If larger parts of the code base are translated at once, it becomes harder to localize errors when a test fails, making debugging more time-consuming. <em>Partial translation</em> is the more practical approach: a project is translated fragment by fragment, and each fragment is validated before moving on to the next one. The following diagram illustrates this idea.</p>

<div style="text-align: center;">
    <img src="/musings/assets/images/2025-06-09-github-copilot/partial-translation.png" alt="An illustration of the partial translation approach." />
    <p>An illustration of the <i>partial translation</i> approach. The idea is a fragment-by-fragment translation of the project, validating each fragment before translating the next one.</p>
</div>

<p>Long story short — we proposed a framework to directly run tests in the source language on the translated code in the target language — with a <em>glue</em> layer that translates data on the boundary between the two languages. In our case of Java-to-Python translation, this meant that we would run the original Java tests, but redirect calls to the main Java code to the corresponding Python translation, which would then return the result back to the Java test. At the point where execution leaves Java and enters Python, we would convert the input Java objects to appropriate Python objects (like converting a Java <code class="language-plaintext highlighter-rouge">List</code> to a Python <code class="language-plaintext highlighter-rouge">list</code>). Similarly, when the execution returns from Python to Java, we would convert the output Python objects back to Java objects (like converting a Python <code class="language-plaintext highlighter-rouge">str</code> to a Java <code class="language-plaintext highlighter-rouge">String</code>). This intermediate conversion layer is what we call the <em>glue</em> layer — and it actually does a lot more than this, like maintaining pointer identity, propagating side effects, transmitting exceptions, and so on. The following example shows how this works in practice for a developer using our framework.</p>

<div style="text-align: center;">
    <img src="/musings/assets/images/2025-06-09-github-copilot/gluetest-example.png" />
    <p>An illustration of our <i>GlueTest</i> approach. Here, we create a <i>Python representation</i> of every Java object, and when a Java test calls a method in the Java main code, we redirect that call to the corresponding Python representation, which implements the same method in Python. As the execution flows between the two languages, the glue layer converts objects between them. A preprint of our work is freely available <a href="https://www.cs.cornell.edu/~saikatd/papers/gluetest-icsme-nier24.pdf">here</a>.</p>
</div>
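
<p>The following toy sketch, written entirely in Python, shows what converting objects at the boundary while keeping pointer identity and side effects intact could look like. It is not the GlueTest implementation: the Java side is simulated by a stand-in class, and <code class="language-plaintext highlighter-rouge">JavaList</code>, <code class="language-plaintext highlighter-rouge">Glue</code>, and <code class="language-plaintext highlighter-rouge">append_and_return</code> are hypothetical names chosen for illustration. Exception transmission is left out.</p>

```python
class JavaList:
    """Stand-in for a java.util.List on the Java side (hypothetical)."""

    def __init__(self, items):
        self.items = list(items)


class Glue:
    """Toy glue layer: converts values at the boundary and remembers pairs."""

    def __init__(self):
        # Both maps hold strong references, so cached ids stay valid.
        self._to_py = {}    # id(Java object)   -> Python object
        self._to_java = {}  # id(Python object) -> Java object

    def to_python(self, obj):
        if isinstance(obj, JavaList):
            if id(obj) not in self._to_py:
                py = [self.to_python(x) for x in obj.items]
                self._to_py[id(obj)] = py
                self._to_java[id(py)] = obj
            return self._to_py[id(obj)]
        return obj  # simple values pass through unchanged

    def to_java(self, obj):
        if isinstance(obj, list):
            if id(obj) not in self._to_java:
                java = JavaList(self.to_java(x) for x in obj)
                self._to_java[id(obj)] = java
                self._to_py[id(java)] = obj
            return self._to_java[id(obj)]
        return obj

    def call(self, py_func, *java_args):
        """Redirect a call from the Java side to a Python translation."""
        py_args = [self.to_python(a) for a in java_args]
        result = py_func(*py_args)
        # Propagate side effects: copy mutations back to the Java objects.
        for java_arg, py_arg in zip(java_args, py_args):
            if isinstance(java_arg, JavaList):
                java_arg.items = [self.to_java(x) for x in py_arg]
        return self.to_java(result)


def append_and_return(lst, value):
    """Hypothetical Python translation of a Java method that mutates its input."""
    lst.append(value)
    return lst


glue = Glue()
java_list = JavaList([1, 2])
returned = glue.call(append_and_return, java_list, 3)
```

<p>After the call, <code class="language-plaintext highlighter-rouge">returned</code> is the very same object as <code class="language-plaintext highlighter-rouge">java_list</code>, and the appended element is visible on the simulated Java side. A test that compares references or inspects its arguments after the call would therefore behave the same as it did against the original code.</p>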

<p>Where does Copilot come into this picture? In order to evaluate our approach, we manually translated two Java libraries<sup id="fnref:9"><a href="#fn:9" class="footnote" rel="footnote" role="doc-noteref">10</a></sup> to Python — and well, this was my first time writing Java code. Yet I managed to become one of the main contributors to the project, thanks to Copilot. I would translate the code class by class, and Copilot would give me a reasonably good template to start with. Usually, I would modify the translation for one of the methods in the class to fit my needs, and then Copilot would very quickly adapt to that style and structure. Of course, Copilot would make mistakes, especially in things with subtle differences between the two languages, but my work was limited to finding solutions to these tricky<sup id="fnref:11"><a href="#fn:11" class="footnote" rel="footnote" role="doc-noteref">11</a></sup> problems instead of doing the grunt work of writing obvious code. And in all of this, I found myself slowly picking up quite a significant amount of Java<sup id="fnref:10"><a href="#fn:10" class="footnote" rel="footnote" role="doc-noteref">12</a></sup>. I am not saying that I am now a Java programmer, but I could see during my other projects and also some of my classes at University that my understanding of Java and its nuances has been coming in handy.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The bottom line here is that I can concretely see that Copilot makes me much faster at producing code. It does seem very magical, and at times, it might even feel like I am cheating my way through. But at the end of the day, it is another tool in addition to the plethora of linters, snippets, auto-completions, and other tools that our code editors have provided us for a long time. And like any of these tools, it is not a replacement for good programming practices<sup id="fnref:13"><a href="#fn:13" class="footnote" rel="footnote" role="doc-noteref">13</a></sup>. As I discussed above, it is also not a replacement for actually learning to write code! I don’t think it is really a question anymore of whether or not we should use AI for programming. We should instead think more deeply about how we can use it effectively to complement our existing skills and tooling, what its limitations are, how far we can get around these limitations with additional infrastructure, and how we can use it to make coding more accessible not just to novices but even to experienced programmers who just want to have a little bit of fun!</p>

<p class="info"><strong>New</strong>: This blog post was featured in the Summer 2025 edition of the GitHub Education Student Newsletter.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Even when I did become interested, I gave up on trying to keep up with the latest news only within a couple of months because it was simply too much too fast. Now, I only read up when something catches my attention or when I need to know something specific. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>Which had been released less than 4 months before, in June 2022, for the public. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>At least retrospectively. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>As a side note, I must acknowledge that the Student Developer Pack is a genius move by GitHub to get students hooked on Copilot and other tools. I cannot imagine working without Copilot now, and have decided to continue studying through grad school to keep profiting from a free GitHub Copilot Pro license 😉. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:5">
      <p>We could press <code class="language-plaintext highlighter-rouge">Ctrl+Enter</code> in VS Code to see the list of up to 10 top suggestions. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:12">
      <p>Where confidence in a suggestion can be thought of as the fraction of times the same, or an equivalent, suggestion is produced by Copilot. If Copilot produces the same suggestion multiple times, we can interpret that Copilot is <em>confident</em> about that suggestion. <a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:6">
      <p>And had been explored in somewhat different contexts before, like in <a href="https://www.microsoft.com/en-us/research/publication/interactive-code-generation-via-test-driven-user-intent-formalization/">this paper</a>. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:7">
      <p>To be precise, we performed differential fuzzing on every pair of these suggestions. This means that we would take two suggestions, and run them against the same set of test cases to find inputs where they produced different outputs. These inputs are selected randomly, based on the type of the arguments expected by the function, with several heuristics in place to ensure that common edge cases (like empty lists, negative numbers, single-character strings, etc.) are not missed. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:8">
      <p><em>All-or-nothing</em> translations have been attempted, but as Terekhov and Verhoef mention in their <a href="https://ieeexplore.ieee.org/document/895180">article</a>, this has been disastrous in practice — from abandoning entire projects to even bankruptcies. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:9">
      <p><a href="https://github.com/apache/commons-cli">Apache Commons CLI</a> and <a href="https://github.com/apache/commons-csv">Apache Commons CSV</a>. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:11">
      <p>And arguably more interesting. I plan to share some of the insights from these problems in future blog posts. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:10">
      <p>This deserves another blog post, but I believe I could truly appreciate the beauty of object-oriented programming only after working with Java. This is not only about how nice the code looks but also about how it enforces a certain way of thinking about code and writing it. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:13">
      <p>Neither is it an excuse for not adhering to responsible practices around security and fair use. Since Copilot is trained on publicly available code, it is important to ensure that we do not accidentally violate software licenses or copy code without proper attribution. Similarly, it is crucial to review and understand AI-generated code to avoid potential security vulnerabilities and ensure that sensitive information is not inadvertently exposed. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Programming" /><category term="Research" /><category term="Essay" /><summary type="html"><![CDATA[I wasn’t following the AI hype before I joined college1, and while I had heard of GitHub Copilot2, I didn’t think much of it. I had been programming for several years by then, and couldn’t imagine how AI could help me write code. I was not the only one skeptical; in particular, most educators still saw AI as another tool for plagiarism and were highly conservative regarding its use. Fortunately for me3, my first programming and algorithms course at IISc did not shy away from such developments. My professor, Viraj Kumar, made Copilot an integral part of the course, focusing on two main aspects: Writing good specifications that express our intent unambiguously, possibly using simple examples, to Copilot. Reading and critiquing code produced by Copilot, to make sure it correctly reflects our intent. As I will discuss in this blog post, these two aspects of working with AI have been instrumental in my own journey as a programmer and also an undergraduate researcher in software engineering. I will particularly focus on how Copilot has played the role of a collaborator and at times, a coach in helping me get started with undergraduate research. Of course, this was possible because of GitHub’s Student Developer Pack4. Using Copilot to improve specifications My first undergraduate research project came out of discussions with Professor Viraj about how Copilot often suggests code that does not pass even on the test cases provided in the prompt. What was more interesting was that in some cases where the top suggestion was wrong, one of the other suggestions would be correct5. This is not surprising for a language model trained to produce strings, not code, and for a system that evaluates it by treating its suggestions as merely strings and not as code with nuanced semantics. 
Note that our inputs to Copilot were incomplete Python function definitions with a docstring that included doctests, as shown in the below illustration, and we asked Copilot to write the body of the function. def function_name(arg1: int, arg2: str) -&gt; bool:     """ Purpose of the function.     &gt;&gt;&gt; function_name(1, "test") # doctest 1 True     &gt;&gt;&gt; function_name(2, "test") # doctest 2 False """ Our first idea was to simply run all of Copilot’s suggestions through the provided test cases and filter out the ones that did not pass — and I built a simple VS Code extension to do this. One could first prompt Copilot to produce suggestions, and then trigger our extension to filter them. This was a good start, but we soon came across a more fundamental question. Could better specifications have helped Copilot produce correct code more confidently?6 It seemed that the lack of confidence in its suggestions — i.e., the suggestions were functionally different from each other — was due to the ambiguity in the specification itself. Even more importantly, it seemed that these very differences in the suggestions could be used to reveal the ambiguities in the specification. My programming and algorithms course had focused a lot on writing good purpose statements for functions, and both Professor Viraj and I were excited to see if we could use such findings to build a tool that would not only alert programmers about ambiguous specifications but also help students learn how to write better specifications in the first place. At the same time, this can also encourage students to think more deeply about given requirements and ask the right clarifying questions, which is a skill that is often overlooked in programming courses. This took shape as a tool called GuardRails, which we eventually presented at the COMPUTE 2023 conference. 
The idea was rather simple7 — we would take all of Copilot’s suggestions, filter out the ones that did not pass the given test cases, and then run the remaining suggestions through several inputs8 to see if we could find an input where they behaved differently. If we did, this input, and potentially a class of similar inputs, would be a suspected source of ambiguity in the specification that we would then report to the user. For instance, consider the following input to Copilot. def common_letters(word1: str, word2: str) -&gt; list[str]: """Return a list of letters common to word1 and word2. &gt;&gt;&gt; common_letters('cat', 'heart') ['a', 't'] &gt;&gt;&gt; common_letters('Dad', 'Mom') [] """ As you might be able to identify, there are many things which are unclear about this request. Does the order of letters in the output matter? If a common letter appears multiple times in both strings, should it appear multiple times in the output? Importantly, none of the doctests can clarify these. We observed that Copilot often produced suggestions that filled such ambiguities in different ways — and our tool could help identify these ambiguities by finding inputs (and usually the simplest such inputs) which could lead to different outputs under different interpretations. Consider the following two suggestions for completing this function definition. def suggestion1(word1: str, word2: str) -&gt; list[str]: return [c for c in word1 if c in word2] def suggestion2(word1: str, word2: str) -&gt; list[str]: return [c for c in word2 if c in word1] The fact that these two suggestions produce different outputs for the input strings ab and ba suggests an underspecification related to the order of letters in the output. Similarly, the following two suggestions differ in the treatment of duplicate letters, like when provided with the input strings aa and aa. 
def suggestion3(word1: str, word2: str) -&gt; list[str]: return [c for c in word1 if c in word2] def suggestion4(word1: str, word2: str) -&gt; list[str]: return [c for c in set(word1) if c in set(word2)] The demonstration below shows this example in action. Our dataset lists several other interesting examples.         A quick demo of GuardRails. A free copy of our paper is available on arXiv. Note that our tool does not work presently because of changes in the Copilot VS Code extension, and an update is underway. If you would like to contribute, please reach out to me, and I would be happy to discuss. From a logistical perspective, this was a time when LLMs were still new, and free APIs were not as common as they are now. We wanted to quickly experiment with our ideas without having to first seek funding to self-host an LLM or get access to a paid API. Copilot was not meant to be used this way, but making our tool work on top of it was indeed a nice hacky way to quickly implement our ideas and test them out. Today, of course, LLMs are more accessible, and Copilot itself provides many ways to smoothly integrate additional functionality, like through extensions and even agents. Validating translated code at scale and accidentally learning Java While working on GuardRails, Copilot’s role in my research was like that of a monkey in an animal behavior study. We analyzed its behavior, saw what it did correctly and what it did not, tried to understand why it did what it did, and then used that understanding to better guide its behavior. This was quite different from the colleague-like role it played in my next project, related to validating code translated from one programming language to another. I have talked about this work, called GlueTest, briefly in my previous blog post. 
I had the opportunity to co-lead this work with the supervision of Professor Darko from UIUC and Professor Saikat from Cornell, and we were essentially trying to build a testing framework to validate whether code produced after translating an existing code base (in our case, in Java) to another language (in our case, Python) is indeed a correct translation. As I also discussed in my previous blog post, this has a few interesting challenges. Most serious projects have developer-written tests, and it is natural to translate the test suite in the source language to the target language and use that to validate the translated code too — but it’s usually not that simple! Test code can be voluminous and occasionally complicated — and how do we know if the tests themselves were correctly translated? Incorrect test translation can lead to false negatives — where the translated code is actually correct, but the tests fail because of incorrect translation, thus undermining confidence in the translated code — ore even worse, false positives, where the translated code is incorrect but the tests pass, thus creating serious quality issues in the translated code base. The other big challenge is that while translation efforts are usually incremental9, tests often exercise too many parts of the code base at once, making it hard to translate small parts of the code base at a time and validate them. If larger parts of the code base are translated, it becomes harder to localize errors when a test fails, making debugging more time-consuming. Partial translation is a more practical approach that is followed in practice, where a project is translated fragment by fragment, and each fragment is validated before moving on to the next one. The following diagram illustrates this idea.         An illustration of the partial translation approach. The idea is a fragment-by-fragment translation of the project, validating each fragment before translating the next one. 
Long story short — we proposed a framework to directly run tests in the source language on the translated code in the target language — with a glue layer that translates data on the boundary between the two languages. In our case of Java-to-Python translation, this meant that we would run the original Java tests, but redirect calls to the main Java code to the corresponding Python translation, which would then return the result back to the Java test. At the point where execution leaves Java and enters Python, we would convert the input Java objects to appropriate Python objects (like converting a Java List to a Python list). Similarly, when the execution returns from Python to Java, we would convert the output Python objects back to Java objects (like converting a Python str to a Java String). This intermediate conversion layer is what we call the glue layer — and it actually does a lot more than this, like maintaining pointer identity, propagating side effects, transmitting exceptions, and so on. The following example shows how this works in practice for a developer using our framework.         An illustration of our GlueTest approach. What is happening here is that we create a Python representation of every Java object, and when a Java test calls a method in the Java main code, we redirect that call to the corresponding Python representation, which implements the same method in Python. As the execution flows between the two languages, the glue layer converts objects between the two languages. A freely available preprint of our work is available here. Where does Copilot come into this picture? In order to evaluate our approach, we manually translated two Java libraries10 to Python — and well, this was my first time writing Java code. Yet I managed to become one of the main contributors to the project, thanks to Copilot. I would translate the code class by class, and Copilot would give me a reasonably good template to start with. 
Usually, I would modify the translation for one of the methods in the class to fit my needs, and then Copilot would very quickly adapt to that style and structure. Of course, Copilot would make mistakes, especially in things with subtle differences between the two languages, but my work was limited to finding solutions to these tricky11 problems instead of doing the grunt work of writing obvious code. And in all of this, I found myself slowly picking up quite a significant amount of Java12. I am not saying that I am now a Java programmer, but I could see during my other projects and also some of my classes at University that my understanding of Java and its nuances has been coming in handy. Conclusion The bottom line here is that I can concretely see that Copilot makes me much faster at producing code. It does seem very magical, and at times, it might even feel like I am cheating my way through. But at the end of the day, it is another tool in addition to the plethora of linters, snippets, auto-completions, and other tools that our code editors have provided us for a long time. And like any of these tools, it is not a replacement for good programming practices13. As I discussed above, it is also not a replacement for actually learning to write code! I don’t think it is really a question anymore of whether or not we should use AI for programming — and so we should instead think more deeply about how we can use it effectively to complement our existing skills and tooling, what its limitations are, how far can we get around these limitations with additional infrastructure, and how we can use it to make coding more accessible not just to novices but even to experienced programmers who just want to have a little bit of fun! New: This blog post was featured in the Summer 2025 edition of the GitHub Education Student Newsletter. 
Even when I did become interested, I gave up on trying to keep up with the latest news only within a couple of months because it was simply too much too fast. Now, I only read up when something catches my attention or when I need to know something specific. &#8617; Which had been released less than 4 months before, in June 2022, for the public. &#8617; At least retrospectively. &#8617; As a side note, I must acknowledge that the Student Developer Pack is a genius move by GitHub to get students hooked to Copilot and other tools. I cannot imagine working without Copilot now, and have decided to continue studying through grad school for profiting from a free GitHub Copilot Pro license 😉. &#8617; We could press Ctrl+Enter in VS Code to see the list of up to 10 top suggestions. &#8617; Where confidence in a suggestion can be thought of as the fraction of times the same, or an equivalent, suggestion is produced by Copilot. If Copilot produces the same suggestion multiple times, we can interpret that Copilot is confident about that suggestion. &#8617; And had been explored in somewhat other contexts before, like in this paper. &#8617; To be precise, we performed differential fuzzing on every pair of these suggestions. This means that we would take two suggestions, and run them against the same set of test cases to find inputs where they produced different outputs. These inputs are selected randomly, based on the type of the arguments expected by the function, with several heuristics in place to ensure that common edge cases (like empty lists, negative numbers, single-character strings, etc.) are not missed. &#8617; All-or-nothing translations have been attempted, but as Terekhov and Verhoef mention in their article, this has been disastrous in practice — from abandoning entire projects to even bankruptcies. &#8617; Apache Commons CLI and Apache Commons CSV. &#8617; And arguably more interesting. 
I plan to share some of the insights from these problems in future blog posts. &#8617; This deserves another blog post, but I believe I could truly appreciate the beauty of object-oriented programming only after working with Java. This is not only about how nice the code looks but also about how it enforces a certain way of thinking about code and writing it. &#8617; Neither is it an excuse for not adhering to responsible practices around security and fair use. Since Copilot is trained on publicly available code, it is important to ensure that we do not accidentally violate software licenses or copy code without proper attribution. Similarly, it is crucial to review and understand AI-generated code to avoid potential security vulnerabilities and ensure that sensitive information is not inadvertently exposed. &#8617;]]></summary></entry><entry><title type="html">Our code translation work is appearing in FSE 2025! But why are we translating code?</title><link href="https://mrigank.in/musings/2025/04/25/code-translation.html" rel="alternate" type="text/html" title="Our code translation work is appearing in FSE 2025! But why are we translating code?" /><published>2025-04-25T00:00:00+00:00</published><updated>2025-04-25T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/04/25/code-translation</id><content type="html" xml:base="https://mrigank.in/musings/2025/04/25/code-translation.html"><![CDATA[<p class="info"><strong>Note</strong>: Starting from this post, I will no longer be sharing any updates on Twitter or LinkedIn. All updates, like this one, will be posted here on this blog.</p>

<p>My work at UIUC, <em>AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation</em>, has been <a href="https://conf.researchr.org/details/fse-2025/fse-2025-research-papers/67/AlphaTrans-A-Neuro-Symbolic-Compositional-Approach-for-Repository-Level-Code-Transla">accepted</a> to appear at the <a href="https://conf.researchr.org/home/fse-2025">FSE 2025</a> conference in the <em>Research Papers</em> track. This work was led by <a href="https://alirezai.cs.illinois.edu">Ali Reza Ibrahimzada</a> who is a PhD student at UIUC. Professor <a href="https://reyhaneh.cs.illinois.edu/">Reyhaneh Jabbarvand</a> facilitated this project as a collaboration between her group and IBM Research. Our work presents an automated solution for translating entire code repositories from one programming language to another. We essentially use LLMs to generate these translations and apply a range of validation techniques to ensure the semantic equivalence of the generated code with the original code.</p>

<p>I am also particularly grateful to Professor Darko Marinov at UIUC for involving me in this project and for hosting me as a visiting student researcher at UIUC during the summer of 2024<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>. Special mention also goes to <a href="https://msalman-abid.github.io/">Salman Abid</a>, who is now a PhD student at Cornell University. Despite his busy schedule due to his preparations for moving to the US, he was regularly available to help me with challenges related to GraalVM and the Polyglot API, besides lending me an ear beyond the technical aspects of the project.</p>

<div style="text-align: center;"><!--
    --><img src="/musings/assets/images/2025-04-25-code-translation/flowchart.png" /><!--
    --><p>A preprint of our work is available on <a href="https://arxiv.org/abs/2410.24117">arXiv</a>. Our tool is available on <a href="https://github.com/Intelligent-CAT-Lab/AlphaTrans">GitHub</a>.</p>
</div>

<p>This project was my first <em>big</em> project. And <em>big</em> it was. I started working sometime in January last year and was still applying fixes to my system in December. We <a href="https://mrigank.in/musings/2024/08/17/call-off-deadlines.html">missed</a> our planned ICSE 2025 deadline in August and submitted to FSE 2025 in September. Over more than 7 months since then, we submitted a rebuttal and then a major revision before finally hearing back in April. More importantly, this was the first time that I saw an actual gap, and that too a significant one, in existing literature, and that I worked on pretty much the cutting edge of research in the area. While work on code translation goes back to the 1960s, LLMs have brought renewed interest to it, with many notable papers demonstrating their potential in this context. However, previous work more or less remains limited to translating small snippets of code, and automated whole-project code translation continues to be a challenge. Advances in the area are valuable because code translation is a practical problem that software-based companies face on a large scale. Old technologies become obsolete, and it becomes increasingly difficult to find developers who can maintain legacy code. The challenge is huge for real code because the code is voluminous, spread across a large number of files, has complex inter- and intra-file dependencies, and often utilizes many third-party libraries. Either way, this is a fast-moving area, and indeed, just as we were trying to create the state-of-the-art, several related works<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup> were coming up as well.</p>

<p>We demonstrated our approach by translating several open-source Java projects to Python, and my particular contribution was to create a system for validating the Python translations of individual Java methods. It is natural to translate the test suite of the original Java project to Python and use the test suite to validate the translations. However, things are far from straightforward. First, there is the question of how one would validate the test translation itself; after all, there are no <em>tests</em> for the tests. The test code is indeed much simpler than the main code, at least in logical complexity, and one could manually validate the test translations. But even then, tests can call several methods in the main code, and all of them would have to be translated before any test could even be executed. In a setting where we wish to translate the code step by step, say, one method at a time, this postpones validation and makes it harder to localize and fix bugs. Bug localization becomes much easier if validation can be performed at each step, so that we know exactly which method causes tests to fail. My system uses GraalVM and its <a href="https://www.graalvm.org/latest/reference-manual/polyglot-programming/">Polyglot API</a> to bridge Java tests and translated Python code. One can think of this as almost literally replacing the code of a Java method with its corresponding Python translation. Yes, some objects have to be passed between the two languages, and types have to be taken care of, but that is all engineering. The system is not perfect, but it works reasonably well even for real-world projects and caters to much more complicated types than those in recent related work.
This particular aspect of the work is a follow-up on my initial efforts in creating such a validation system in my previous work, <a href="https://ieeexplore.ieee.org/document/10794998">GlueTest</a><sup id="fnref:3"><a href="#fn:3" class="footnote" rel="footnote" role="doc-noteref">3</a></sup>, which I carried out during 2023 with Salman and several other undergraduate students from around the world<sup id="fnref:4"><a href="#fn:4" class="footnote" rel="footnote" role="doc-noteref">4</a></sup> along with Professor Darko and Professor <a href="https://www.cs.cornell.edu/~saikatd/">Saikat Dutta</a>, who is now at Cornell University.</p>

<p>Code translation is an exciting topic because it encompasses a wide range of problems in software engineering, right from specifications (what is a <em>correct</em> translation without a notion of <em>correctness</em>?) to testing (how do we check for the correctness of a translation?), and from program repair (how do we fix bugs in a translation?) to refactoring (how do we make a translation more idiomatic?). There’s a long way to some of these answers, but with LLMs combined with traditional software engineering expertise, we seem to be slowly getting there. Maybe in the future, it wouldn’t matter what programming language you use, and you would have a universal translator between all programming languages so that you could use whichever suits your taste. Until then, let me go back to cursing Java while I try to work through my course project for a distributed software class.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>My visit has featured in several posts on this blog, including <a href="https://mrigank.in/musings/2024/05/24/narrow-slice-of-reality.html">this</a> and <a href="https://mrigank.in/musings/2024/07/18/universities.html">this</a>, but also implicitly in terms of my technical (like <a href="https://mrigank.in/musings/2024/06/16/intercepting-attribute-accesses-in-python.html">this</a> or <a href="https://mrigank.in/musings/2024/06/19/not-my-problem.html">this</a>) and philosophical (like <a href="https://mrigank.in/musings/2024/06/15/on-research.html">this</a>) insights from that time. Let’s see if I can continue writing in the upcoming summer as well 😉. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>A highly non-exhaustive list in no particular order includes <a href="https://arxiv.org/abs/2412.08035">this</a>, <a href="https://arxiv.org/abs/2412.14234">this</a> and <a href="https://arxiv.org/abs/2501.14257">this</a>. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3">
      <p>A preprint is freely available <a href="https://www.cs.cornell.edu/~saikatd/papers/gluetest-icsme-nier24.pdf">here</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4">
      <p>This was part of the UIUC+ Software Engineering Summer Research program. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="News" /><category term="Research" /><summary type="html"><![CDATA[Note: Starting from this post, I will no longer be sharing any updates on Twitter or LinkedIn. All updates, like this one, will be posted here on this blog. My work at UIUC, AlphaTrans: A Neuro-Symbolic Compositional Approach for Repository-Level Code Translation and Validation, has been accepted to appear at the FSE 2025 conference in the Research Papers track. This work was led by Ali Reza Ibrahimzada who is a PhD student at UIUC. Professor Reyhaneh Jabbarvand facilitated this project as a collaboration between her group and IBM Research. Our work presents an automated solution for translating entire code repositories from one programming language to another. We essentially use LLMs to generate these translations and apply a range of validation techniques to ensure the semantic equivalence of the generated code with the original code. I am also particularly grateful to Professor Darko Marinov at UIUC for involving me in this project and for hosting me as a visiting student researcher at UIUC during the summer of 20241. Special mention also goes to Salman Abid, who is now a PhD student at Cornell University. Despite his busy schedule due to his preparations for moving to the US, he was regularly available to help me with challenges related to GraalVM and the Polyglot API, besides being an ear to me beyond the technical aspects of the project. A preprint of our work is available on arXiv. Our tool is available on GitHub. This project was my first big project. And big it was. I started working sometime in January last year and was still applying fixes to my system in December. We missed our planned ICSE 2025 deadline in August and submitted to FSE 2025 in September. Over more than 7 months since then, we submitted a rebuttal and then a major revision before finally hearing back in April. 
More importantly, this was the first time that I saw an actual gap, and that too a significant one, in existing literature, and that I worked on pretty much the cutting edge of research in the area. While work on code translation goes back to the 1960s, LLMs have brought renewed interest to it, with many notable papers demonstrating their potential in this context. However, previous work more or less remains limited to translating small snippets of code, and automated whole-project code translation continues to be a challenge. Advances in the area are valuable because code translation is a practical problem that software-based companies face on a large scale. Old technologies become obsolete, and it becomes increasingly difficult to find developers who can maintain legacy code. The challenge is huge for real code because the code is voluminous, spread across a large number of files, has complex inter- and intra-file dependencies, and often utilizes many third-party libraries. Either way, this is a fast-moving area, and indeed, just as we were trying to create the state-of-the-art, several related works2 were coming up as well. We demonstrated our approach by translating several open-source Java projects to Python, and my particular contribution was to create a system for validating the Python translations of individual Java methods. It is natural to translate the test suite of the original Java project to Python and use the test suite to validate the translations. However, things are far from straightforward. First of all is the question of how one would validate the test translation itself — after all, there are no tests for the tests. The test code is indeed much simpler than the main code, at least in logical complexity, and one could manually validate the test translations. But even then, tests can call several methods in the main code, and all of them would have to be translated before any test could even be executed. 
In a setting where we wish to translate the code step by step, say a method-at-a-time, this postpones validation and makes it harder to localize and fix bugs. Bug localization becomes much easier if validation could be performed at each step so that we know which method exactly causes tests to fail. My system uses GraalVM and its Polyglot API to bridge Java tests and translated Python code. One can think of this as almost literally replacing the code of a Java method with its corresponding Python translation. Yes, some objects have to be passed between the two languages, and types have to be taken care of, but that is all engineering. The system is not perfect, but it works reasonably well even for real-world projects and caters to much more complicated types than those in recent related work. This particular aspect of the work is a follow-up on my initial efforts in creating such a validation system in my previous work, GlueTest3, which I carried out during 2023 with Salman and several other undergraduate students from around the world4 along with Professor Darko and Professor Saikat Dutta, who is now at Cornell University. Code translation is an exciting topic because it encompasses a wide range of problems in software engineering, right from specifications (what is a correct translation without a notion of correctness?) to testing (how do we check for the correctness of a translation?), and from program repair (how do we fix bugs in a translation?) to refactoring (how do we make a translation more idiomatic?). There’s a long way to some of these answers, but with LLMs combined with traditional software engineering expertise, we seem to be slowly getting there. Maybe in the future, it wouldn’t matter what programming language you use, and you would have a universal translator between all programming languages so that you could use whichever suits your taste. 
Until then, let me go back to cursing Java while I try to work through my course project for a distributed software class. My visit has featured in several posts on this blog, including this and this, but also implicitly in terms of my technical (like this or this) and philosophical (like this) insights from that time. Let’s see if I can continue writing in the upcoming summer as well 😉. &#8617; A highly non-exhaustive list in no particular order includes this, this and this. &#8617; A freely available preprint is available here. &#8617; This was as part of the UIUC+ Software Engineering Summer Research program. &#8617;]]></summary></entry><entry><title type="html">Logging Out of WhatsApp, Logging In to Life</title><link href="https://mrigank.in/musings/2025/04/20/whatsapp.html" rel="alternate" type="text/html" title="Logging Out of WhatsApp, Logging In to Life" /><published>2025-04-20T00:00:00+00:00</published><updated>2025-04-20T00:00:00+00:00</updated><id>https://mrigank.in/musings/2025/04/20/whatsapp</id><content type="html" xml:base="https://mrigank.in/musings/2025/04/20/whatsapp.html"><![CDATA[<p>This blog is different from others in that it is in the format of a conversation. I really did have this conversation — with ChatGPT<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup>. I have only lightly edited its responses and have left my own responses nearly verbatim. I think this is an interesting way to present my thoughts and I might try more of this in the future<sup id="fnref:2"><a href="#fn:2" class="footnote" rel="footnote" role="doc-noteref">2</a></sup>.</p>

<p><strong>Host (ChatGPT):</strong> In a world where messages ping louder than thoughts and replies come faster than reflection, stepping away from a platform like WhatsApp isn’t just a technical decision — it’s a personal rebellion. Today, I will sit down with Mrigank who did what many only contemplate: he quit WhatsApp. Five months in, he shares the psychological weight, social friction, and digital clarity that came with that choice.</p>

<hr />

<p><strong>Host:</strong> <strong>Mrigank, you’ve thought about leaving WhatsApp many times before, but it wasn’t until recently that you actually did it. What finally pushed you to pull the plug?</strong></p>

<p><strong>Mrigank:</strong> To be fair, I have at least thought about this many times before. It is absurd how I have been able to take breaks, whenever I wanted, from other social media platforms like Instagram, Facebook, or LinkedIn — but not even once could I stop using WhatsApp since I first started using it at around 11 years old. The reason is that it’s just so difficult not to use WhatsApp! Doesn’t it sound absurd? But it is true. This time I decided not to think too much and just do it — and it’s been five months already! The reasons for this have been pretty clear to me for quite a while, and I hope we will delve into them in this conversation. But as such, there was no special reason this time. I just wanted to experiment and see for myself if it was possible to live without WhatsApp.</p>

<hr />

<p><strong>Host:</strong> <strong>It’s interesting how certain platforms become so ingrained in our daily lives that stepping away feels almost impossible. You noted that while you’ve taken breaks from other social media platforms like Instagram, Facebook, and LinkedIn, WhatsApp was different. Could you delve deeper into what made WhatsApp uniquely challenging to leave compared to other platforms?</strong></p>

<p><strong>Mrigank:</strong> The simple answer is that WhatsApp has become the de facto means of communication in our lives. I agree that WhatsApp has brought immense convenience to our lives. People use WhatsApp to connect with their friends and family almost effortlessly. And it is very effective at what it does.</p>

<p>But this convenience has, in some sense, come back to bite us — and unfortunately, most people don’t even realize this. WhatsApp has started to be overused in many ways. It has become a part of our lives beyond what it needs to be. This could be school, university, or even work. In my school, for example, we would have several WhatsApp groups every year, including groups between students, parents, and teachers. Sometimes even groups for different classes, groups for extra-curricular activities, groups for student council, and whatnot. If you were not on WhatsApp, it would be impossible to keep up with what was going on. You might even miss important things because many announcements and discussions would be limited to WhatsApp alone. I have been in university for the last two and a half years, and it’s almost the same story here. There is a WhatsApp group for the entire batch, a group for B.Tech. students, a group for every student club, a common group for all clubs, a group for every upcoming event in the student clubs, a group for every hostel, a group for every mess — the list simply goes on when you think about it. Some time ago, even my university’s placement cell started a WhatsApp group with the students to send updates about the placement process and job opportunities.</p>

<p>Is WhatsApp truly the best way to communicate all these things? We have email, Microsoft Teams (my university has a subscription), and other platforms that are meant for such communication. These platforms offer a range of features for creating groups and sub-groups of people, which is exactly what they are designed for. I want to emphasize email here. Do we even know how to use email effectively? It is an incredibly powerful tool and so much more reliable and effective for these use cases. Yet we continue to use WhatsApp and allow it to slowly seep into our lives beyond what is healthy.</p>

<hr />

<p><strong>Host:</strong> <strong>What specific features, design decisions, or policies of WhatsApp did you find most problematic? Was it privacy and encryption concerns, the way groups are managed, notification overload, data‑sharing with Facebook, or something else entirely? How did those drive your decision to walk away?</strong></p>

<p><strong>Mrigank:</strong> Privacy, yes, is one of the concerns. But certainly not the most important one. Far scarier to me are some of the psychological concerns, which we will get to. After all, as far as our chats themselves are concerned, WhatsApp seems to be end-to-end encrypted. But the problem with privacy is more than this. For example, I have noticed in the past that people I have never seen on Facebook, whose names I have never searched for, and with whom I have no mutual friends, somehow magically pop up in my friend suggestions on Facebook. I thought about this and realized that our only point of contact is that I have saved their number on my phone. And these are often absolutely random people; there is no way I had other connections with them. There is definitely something fishy going on, and it is obviously WhatsApp scooping up my contact list. The worst part is that there is no way to configure which contacts WhatsApp can see. You either share all your contacts or none. And it’s not the best experience if no contacts at all are shared. There is no filtering either on who can message you on WhatsApp. Anyone can message you, and there is no way to stop them. I mean, yes, you can block them <em>once they do</em> message you, but that is not the point. So many companies have realized that unlike their texts on SMS, which almost always get blocked by spam filters, their WhatsApp messages will always get to us and will always get our attention.</p>

<p>WhatsApp is free and does not serve any advertisements — at least not yet. But you see how they have started showing us these things called “channels” where people can run somewhat of a social media handle on WhatsApp. With this and many other features, WhatsApp seems to be slowly becoming another Instagram. And just as such social media platforms harvest our data to manipulate us into consuming more content, I don’t think it is hard to imagine that WhatsApp could also do that very soon.</p>

<hr />

<p><strong>Host:</strong> <strong>You’ve hinted that privacy concerns are just the surface, and that there are more frightening, psychological effects that worry you. Could you expand on that? What kind of psychological harm do you think WhatsApp is causing—or has caused—in your own experience?</strong></p>

<p><strong>Mrigank:</strong> WhatsApp exploits the social wiring in our brains and converts our insecurities into impulsive patterns. Yes, one can mute, archive, and lock chats — but at the end of the day, once I hit the send button in any chat, my brain really starts waiting for a reply. There’s also a feeling that I’m being observed, and ironically, with my own consent. My contacts can see when I am online, whether I’ve received or read their messages, and so on. On email, I can read a message, think about it, and answer whenever I like. But on WhatsApp, I am afraid that I’ll be perceived as rude or careless if I were to read a message and not immediately reply.</p>

<hr />

<p><strong>Host:</strong> <strong>Can you talk a bit about how that constant pressure affected you, specifically? Did it change how you interacted with others, or how you experienced your time and attention? Did you ever catch yourself doing things—like checking the app obsessively, or drafting quick replies just to avoid guilt—that made you pause and reflect?</strong></p>

<p><strong>Mrigank:</strong> I will actually tell you another example. It seems contrived but trust me, it is very realistic. Say somebody sends me a message, and I don’t want to reply to them immediately. I will not open the chat so that they don’t think I might be ignoring them. But imagine we are both participants in some group. I can no longer open the group because I am aware that they can see my activity there — they can know when I read a message or replied to a message. The point that I’m trying to make is that it is really easy to fall prey to these kinds of things and eventually get pressured to reply as soon as possible. And to keep checking your phone for messages, even taking a look at archived chats every now and then, et cetera. I don’t think we should have to do these mental gymnastics. At least not so often. It should be easier for us to decide when to reply to which message. Ideally, this decision should be only affected by the importance of each message.</p>

<p>WhatsApp lacks fine-grained control over settings related to online status, read receipts, or who can message me. It’s an all-or-none situation. Either everyone is allowed, or all my contacts, or nobody at all. And let me mention that WhatsApp reads my entire contact list. There’s no way to restrict that access to only a few contacts. And so “my contacts” is far from the optimal configuration for these things.</p>

<p>But the worst problem probably is the volume of clutter. Suppose I want to message my mother. I open WhatsApp to reach my chat with her. But on the way to my destination, in this case, my chat with my mother, WhatsApp will throw at me so many green blurbs with countless unread messages — messages from friends, messages from work, messages from people I don’t even properly know — that it would simply be impossible to reach my destination without taking a look at all these things. Unlike email, I cannot snooze messages or categorize them into folders. It is all a continuous stream of things being thrown at me. Even if I want to check an important work-related group chat, I will end up seeing updates on other work-related groups as well — so how do I focus? Many people often find themselves juggling between multiple chats simultaneously, talking to multiple people at once. It has become so hard to focus on one conversation at a time. The fact that things are synchronous and expected to be synchronous has made our communication space so cluttered.</p>

<hr />

<p><strong>Host:</strong> <strong>You’ve raised some critical points about WhatsApp’s evolution and its impact on our daily lives. It sounds like the platform has shifted from a simple communication tool to something far more complex and, as you mentioned, potentially intrusive. How do you feel about the alternatives you’ve adopted since leaving WhatsApp? Have they met your communication needs effectively?</strong></p>

<p><strong>Mrigank:</strong> Before I stopped using WhatsApp, I put a notice on my display picture and my status, saying that, well, I have stopped using WhatsApp, and that I will be available only on call, SMS, email, and Microsoft Teams. And these are pretty much the platforms that I have been using. I use Telegram with my parents — but exclusively with my parents — because it’s easier for them to make video calls this way. The point here is not Telegram. I could very well be using something else. What is of significance here is that it is exclusive, in both its purpose and the people who can serve that purpose. Telegram, yes, has its own privacy concerns. But for now, it lets me control which contacts it can read from my phone, or who exactly can send me messages. It also contains analogs to WhatsApp’s stories and channels, but such bloat is not pushed into my face and the overall experience is almost free from clutter.</p>

<hr />

<p><strong>Host:</strong> <strong>Do you feel like WhatsApp has changed how we think about communication itself? Like, has it rewired what we expect from conversations, or how we experience being available to others?</strong></p>

<p><strong>Mrigank:</strong> Personally, I have felt that I now have much more meaningful conversations with people I really care about. I don’t need to know a lot of things — just because it is easy, I don’t need to stay updated with everything that is going on. I also do not really need to say a lot of things — simply because it is convenient, I end up saying a lot more than I really need to. If there is something really important, I use SMS, email, or Microsoft Teams. And this has been working out pretty well. On the flip side, people who really want to talk to me reach out on these alternate platforms. I am very grateful to be around people who respect my choice. My mind feels much less cluttered. I have a lot more control over myself, over my phone, and over my digital habits. And to whatever messages I do receive, I respond much better. It has been really good to set boundaries between my different <em>modes</em>. I can now choose when I want to be in my <em>work-mode</em>, when I want to be in my <em>family-mode</em>, and when I want to be in my <em>friends-mode</em>. It is much easier to maintain such <em>hygiene</em> now.</p>

<hr />

<p><strong>Host:</strong> <strong>Would you say your relationship with your phone in general has changed since quitting WhatsApp? Like, do you find yourself unlocking it less? Or has your attention shifted to other apps?</strong></p>

<p><strong>Mrigank:</strong> I haven’t really shifted to other apps. As I said, the problem is not with WhatsApp itself. Today it is WhatsApp, but if tomorrow it were to shut down, everyone would move to another application and the same things would happen there. In general, I have tried to minimize my <em>social media surface</em> altogether. I have barely used Facebook, perhaps not at all in the last several months. I only sparingly use Instagram and LinkedIn. I have stopped opening Twitter as well. And, yes, I find myself unlocking my phone way less. When my phone buzzes, I know that it’s an important message. When I am not working, I know that I do not have to look at email or Microsoft Teams. It is much easier to know when to focus on what.</p>

<hr />

<p><strong>Host:</strong> <strong>Does it ever feel like you’re missing out on something?</strong></p>

<p><strong>Mrigank:</strong> I feel that most of us are quite helpless in this whole situation. Many people really have commitments that make it impossible to stop using WhatsApp. Everyone just expects everyone else to be using WhatsApp. I had to convince myself that even if something seemed to go wrong when I stopped using WhatsApp, it would be a much smaller concern than all the other concerns I mentioned. Yes, I do sometimes find it hard to catch up with people. However, I feel that it’s okay — I have to make a little more effort at a personal level, for instance, by calling them on the phone or meeting them in person, but these are much healthier forms of communication than mindless texting. Yes, it is a little less convenient, but as I have been saying, convenience is the devil. This little inconvenience is worth it.</p>

<hr />

<p><strong>Host:</strong> <strong>Do you think social media, messaging apps, and the general hyperconnectivity culture are contributing to stress, anxiety, or detachment from more meaningful experiences in life?</strong></p>

<p><strong>Mrigank:</strong> As I said, I can now have much more honest and thoughtful conversations. There’s a lot more of <em>me</em> in my words. Behind screens and behind keyboards, we are just not the same people. I too have said things on WhatsApp that I doubt I would have said in person. This is the case with many of us. The same convenience that makes it easy to do nice things also makes it possible to effortlessly do things we would regret if we were to look back. And in general, there is so much pressure to respond quickly. There is hardly any time to think, ponder what we’re going to say, or decide whether we really want to or need to say it. WhatsApp feeds our impulsive nature, and we continue to let it do so.</p>

<hr />

<p><strong>Host:</strong> <strong>Finally, what broader lesson would you share about our relationship with communication apps?</strong></p>

<p><strong>Mrigank:</strong> I feel that people need to think deeply about these issues. People need to check if the apps that they use every day for their convenience are now consuming them. People need to think about what boundaries they must draw for these apps in their personal and professional lives, and whether these apps are respecting these boundaries at all. For example, we need to decide whether we are okay with waking up in the morning, taking out our phone to greet our family, and ending up scrolling through messages from our workplace, or messages from any of the many irrelevant things on WhatsApp. We all need to decide for ourselves whether we own ourselves or whether our phones own us.</p>

<hr />

<p><strong>Host:</strong> As we end this conversation, one thing is clear—this isn’t just about quitting an app. It’s about stepping back, re-evaluating our habits, and asking what kind of attention, presence, and relationships we really want. Mrigank’s reflections remind us that logging out isn’t an escape—it can be a return. To intention, to clarity, and maybe, to life as it unfolds, unfiltered.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>Who else do you think could agree to listen to me? <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2">
      <p>It is not only more fun to write this way, but also easier to write in the first place! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Mrigank Pawagi</name></author><category term="Digital Wellbeing" /><category term="Technology" /><summary type="html"><![CDATA[This blog is different from others in that it is in the format of a conversation. I really did have this conversation — with ChatGPT1. I have only lightly edited its responses and have left my own responses nearly verbatim. I think this is an interesting way to present my thoughts and I might try more of this in the future2. Host (ChatGPT): In a world where messages ping louder than thoughts and replies come faster than reflection, stepping away from a platform like WhatsApp isn’t just a technical decision — it’s a personal rebellion. Today, I will sit down with Mrigank who did what many only contemplate: he quit WhatsApp. Five months in, he shares the psychological weight, social friction, and digital clarity that came with that choice. Host: Mrigank, you’ve thought about leaving WhatsApp many times before, but it wasn’t until recently that you actually did it. What finally pushed you to pull the plug? Mrigank: To be fair, I have at least thought about this many times before. It is absurd how I have been able to take breaks, whenever I wanted, from other social media platforms like Instagram, Facebook, or LinkedIn — but not even once could I stop using WhatsApp since I first started using it at around 11 years old. The reason is that it’s just so difficult not to use WhatsApp! Doesn’t it sound absurd? But it is true. This time I decided not to think too much and just do it — and it’s been five months already! The reasons for this have been pretty clear to me for quite a while, and I hope we will delve into them in this conversation. But as such, there was no special reason this time. I just wanted to experiment and see for myself if it was possible to live without WhatsApp. Host: It’s interesting how certain platforms become so ingrained in our daily lives that stepping away feels almost impossible. 
You noted that while you’ve taken breaks from other social media platforms like Instagram, Facebook, and LinkedIn, WhatsApp was different. Could you delve deeper into what made WhatsApp uniquely challenging to leave compared to other platforms?​ Mrigank: The simple answer is that WhatsApp has become the de facto means of communication in our lives. I agree that WhatsApp has brought immense convenience to our lives. People use WhatsApp to connect with their friends and family almost effortlessly. And it is very effective at what it does. But this convenience has, in some sense, come back to bite us — and unfortunately, most people don’t even realize this. WhatsApp has started to be overused in many ways. It has become a part of our lives beyond what it needs to be. This could be school, university, or even work. In my school, for example, we would have several WhatsApp groups every year, including groups between students, parents, and teachers. Sometimes even groups for different classes, groups for extra-curricular activities, groups for student council, and whatnot. If you were not on WhatsApp, it would be impossible to keep up with what was going on. You might even miss important things because many announcements and discussions would be limited to WhatsApp alone. I have been in university for the last two and a half years, and it’s almost the same story here. There is a WhatsApp group for the entire batch, a group for B.Tech. students, a group for every student club, a common group for all clubs, a group for every upcoming event in the student clubs, a group for every hostel, a group for every mess — the list simply goes on when you think about it. Some time ago, even my university’s placement cell started a WhatsApp group with the students to send updates about the placement process and job opportunities. Is WhatsApp truly the best way to communicate all these things? 
We have email, Microsoft Teams (my university has a subscription), and other platforms that are meant for such communication. There is a range of features on these platforms to create groups and sub-groups of people, and they are meant for this. I will want to emphasize email here. Do we even know how to effectively use email? It is an incredibly powerful tool and so much more reliable and effective for these use cases. We continue to use WhatsApp and allow it to slowly seep into our lives beyond what is healthy. Host: What specific features, design decisions, or policies of WhatsApp did you find most problematic? Was it privacy and encryption concerns, the way groups are managed, notification overload, data‑sharing with Facebook, or something else entirely? How did those drive your decision to walk away? Mrigank: Privacy, yes, is one of the concerns. But certainly not the most important one. Much more scary to me are some of the psychological concerns — we will get to that. After all, as far as our chats themselves are concerned, WhatsApp seems to be end-to-end encrypted. But the problem with privacy is more than this. For example, I have noticed in the past that somehow people I have never seen on Facebook, never searched their names, and have no mutual friends with, magically pop up in my friend suggestions on Facebook. I thought about this and realized that our only point of contact is that I have saved their number on my phone. And these are often absolutely random people — no way I had other connections with them. There is definitely something fishy going on, and it is obviously WhatsApp scooping up my contact list. The worst part is that there is no way to configure how many contacts WhatsApp can see. You either share all your contacts or none. And it’s not the best experience if no contacts at all are shared. There is no filtering either on who can message you on WhatsApp. Anyone can message you, and there is no way to stop them. 
I mean, yes, you can block them once they do message you, but that is not the point. So many companies have realized that unlike their texts on SMS, which almost always get blocked by spam filters, their WhatsApp messages will always get to us and will always get our attention. WhatsApp is free and does not serve any advertisements — at least not yet. But you see how they have started showing us these things called “channels” where people can run somewhat of a social media handle on WhatsApp. With this and many other features, WhatsApp seems to be slowly becoming another Instagram. And just as such social media platforms harvest our data to manipulate us into consuming more content, I don’t think it is hard to imagine that WhatsApp could also do that very soon. Host: You’ve hinted that privacy concerns are just the surface, and that there are more frightening, psychological effects that worry you. Could you expand on that? What kind of psychological harm do you think WhatsApp is causing—or has caused—in your own experience? Mrigank: WhatsApp exploits the social wiring in our brains and converts our insecurities into impulsive patterns. Yes, one can mute, archive, and lock chats — but at the end of the day, once I hit the send button in any chat, my brain really starts waiting for a reply. There’s also a feeling that I’m being observed, and ironically, with my own consent. My contacts can see when I am online, whether I’ve received or read their messages, and so on. On email, I can read a message, think about it, and answer whenever I like. But on WhatsApp, I am afraid that I’ll be perceived as rude or careless if I were to read a message and not immediately reply. Host: Can you talk a bit about how that constant pressure affected you, specifically? Did it change how you interacted with others, or how you experienced your time and attention? 
Did you ever catch yourself doing things, like checking the app obsessively or drafting quick replies just to avoid guilt, that made you pause and reflect? Mrigank: I will actually give you another example. It seems contrived but trust me, it is very realistic. Say somebody sends me a message, and I don’t want to reply to them immediately. I will not open the chat so that they don’t think I might be ignoring them. But imagine we are both participants in some group. I can no longer open the group because I am aware that they can see my activity there: they can know when I read a message or replied to a message. The point that I’m trying to make is that it is really easy to fall prey to these kinds of things and eventually get pressured to reply as soon as possible. And to keep checking your phone for messages, even taking a look at archived chats every now and then, et cetera. I don’t think we should have to do these mental gymnastics. At least not so often. It should be easier for us to decide when to reply to which message. Ideally, this decision should be affected only by the importance of each message. WhatsApp lacks fine-grained control when it comes to settings related to online status, read receipts, or which users can message me. It’s an all-or-none situation. Either everyone is allowed, or all my contacts, or nobody at all. And let me mention that WhatsApp reads my entire contact list. There’s no way to restrict that access to only a few contacts. And so “my contacts” is far from the optimum configuration for these things. But the worst problem probably is the volume of clutter. Suppose I want to message my mother. I open WhatsApp to reach my chat with her. 
But on the way to my destination, in this case, my chat with my mother, WhatsApp will throw at me so many green blurbs with countless unread messages — messages from friends, messages from work, messages from people I don’t even properly know — that it would simply be impossible to reach my destination without taking a look at all these things. Unlike email, I cannot snooze messages or categorize them into folders. It is all a continuous stream of things being thrown at me. Even if I want to check an important work-related group chat, I will end up seeing updates on other work-related groups as well — so how do I focus? Many people often find themselves juggling between multiple chats simultaneously, talking to multiple people at once. It has become so hard to focus on one conversation at a time. The fact that things are synchronous and expected to be synchronous has made our communication space so cluttered. Host: You’ve raised some critical points about WhatsApp’s evolution and its impact on our daily lives. It sounds like the platform has shifted from a simple communication tool to something far more complex and, as you mentioned, potentially intrusive. How do you feel about the alternatives you’ve adopted since leaving WhatsApp? Have they met your communication needs effectively? Mrigank: Before I stopped using WhatsApp, I put a notice on my display picture and my status, saying that, well, I have stopped using WhatsApp, and that I will be available only on call, SMS, email, and Microsoft Teams. And these are pretty much the platforms that I have been using. I use Telegram with my parents — but exclusively with my parents — because it’s easier for them to make video calls this way. The point here is not Telegram. I could very well be using something else. What is of significance here is that it is exclusive, in both its purpose and the people who can serve that purpose. Telegram, yes, has its own privacy concerns. 
But for now, it lets me control which contacts it can read from my phone, or who exactly can send me messages. It also contains analogs to WhatsApp’s stories and channels, but such bloat is not pushed into my face and the overall experience is almost free from clutter. Host: Do you feel like WhatsApp has changed how we think about communication itself? Like, has it rewired what we expect from conversations, or how we experience being available to others? Mrigank: Personally, I have felt that I now have much more meaningful conversations with people I really care about. I don’t need to know a lot of things — just because it is easy, I don’t need to stay updated with everything that is going on. I also do not really need to say a lot of things — simply because it is convenient, I end up saying a lot more than I really need to. If there is something really important, I use SMS, email, or Microsoft Teams. And this has been working out pretty well. On the flip side, people who really want to talk to me reach out on these alternate platforms. I am very grateful to be around people who respect my choice. My mind feels much less cluttered. I have a lot more control over myself, over my phone, and over my digital habits. And to whatever messages I do receive, I respond much better. It has been really good to set boundaries between my different modes. I can now choose when I want to be in my work-mode, when I want to be in my family-mode, and when I want to be in my friends-mode. It is much easier to maintain such hygiene now. Host: Would you say your relationship with your phone in general has changed since quitting WhatsApp? Like, do you find yourself unlocking it less? Or has your attention shifted to other apps? Mrigank: I haven’t really shifted to other apps. As I said, the problem is not with WhatsApp itself. Today it is WhatsApp but if tomorrow it were to shut down, everyone would move to another application and the same things would happen there. 
In general, I have tried to minimize my social media surface altogether. I haven’t been using Facebook much, and perhaps not at all in the last several months. I only sparingly use Instagram and LinkedIn. I have stopped opening Twitter as well. And, yes, I find myself unlocking my phone way less. When my phone buzzes, I know that it’s an important message. When I am not working, I know that I do not have to look at email or Microsoft Teams. It is much easier to know when to focus on what. Host: Does it ever feel like you’re missing out on something? Mrigank: I feel that most of us are quite helpless in this whole situation. Many people really have commitments that make it impossible to stop using WhatsApp. Everyone just expects everyone else to be using WhatsApp. I had to convince myself that even if something seemed to go wrong when I stopped using WhatsApp, it would be a much smaller concern than all the other concerns I mentioned. Yes, I do sometimes find it hard to catch up with people. However, I feel that it’s okay. I have to make a little more effort at a personal level, for instance, by calling them on the phone or meeting them in person, but these are much healthier forms of communication than mindless texting. Yes, it is a little less convenient, but as I have been saying, convenience is the devil. This little inconvenience is worth it. Host: Do you think social media, messaging apps, and the general hyperconnectivity culture are contributing to stress, anxiety, or detachment from more meaningful experiences in life? Mrigank: As I said, I can now have much more honest and thoughtful conversations. There’s a lot more of me in my words. Behind screens and behind keyboards, we are just not the same people. I too have said things on WhatsApp that I doubt I would have said in person. This is the case with many of us. The same convenience that makes it easy to do nice things also makes it possible to effortlessly do things we would regret if we were to look back. 
And in general, there is so much pressure to respond quickly. There is hardly any time to think, ponder what we’re going to say, or decide whether we really want to or need to say it. WhatsApp feeds our impulsive nature, and we continue to let it do so. Host: Finally, what broader lesson would you share about our relationship with communication apps? Mrigank: I feel that people need to think deeply about these issues. People need to check if the apps that they use every day for their convenience are now consuming them. People need to think about what boundaries they must draw for these apps in their personal and professional lives, and whether these apps are respecting these boundaries at all. For example, we need to decide whether we are okay with waking up in the morning, taking out our phone to greet our family, and ending up scrolling through messages from our workplace, or messages from any of the many irrelevant things on WhatsApp. We all need to decide for ourselves whether we own ourselves or whether our phones own us. Host: As we end this conversation, one thing is clear: this isn’t just about quitting an app. It’s about stepping back, re-evaluating our habits, and asking what kind of attention, presence, and relationships we really want. Mrigank’s reflections remind us that logging out isn’t an escape; it can be a return. To intention, to clarity, and maybe, to life as it unfolds, unfiltered. Who else do you think could agree to listen to me? &#8617; It is also more fun to write this way, but also easier to write in the first place! &#8617;]]></summary></entry></feed>