Nithish Kannen

Currently, I am a researcher (pre-doc) at Google Research, where I work on Multimodal AI (text + vision models). Previously, I was with Amazon Alexa AI in London, working on text-based content recommendation using LLMs for Alexa. I recently graduated with an Integrated Master's (BS + MS) in EECS (Electrical and Computer Science Engineering) from IIT Kharagpur. I did my Master's Thesis with Prof. Pawan Goyal at the CNeRG Lab, which is now published at EMNLP 2023 (Findings). My Master's Thesis was awarded the Best Project Award amongst the 2023 graduating students. At IIT KGP, I explored Question Answering, Contrastive Learning, Meta Learning and Aspect-Based Sentiment Analysis.

In summer 2022, I was fortunate to spend three wonderful months in Berlin with Amazon Research. I was part of the Golden Eagle team within Alexa AI - Natural Language Understanding, supervised by Markus Boese and Caglar Tirkaz (now at Apple AI), and proposed a controllable data augmenter using prompt learning for NLU. Here's a talk I gave on Recent Counter-Intuitive Findings on Prompt Learning in the GE-Science Reading Group at Amazon. Before that, I interned at IBM Research AI with Shajith Ikbal, Hima Karanam and LV Subramaniam, where I worked on Temporal Reasoning for Complex Question Answering. My work merited the Outstanding Intern Award from the Director of IBM Research India and is published at EMNLP 2023 (Main). Unrelated: here is my take on research.

I am currently working on Multimodal AI (text + vision models). For more details about my interests, please refer to the Research section. I am always on the lookout for research opportunities and collaborations, and I welcome comments/critiques on any of my past work.

Email  /  CV  /  LinkedIn  /  GitHub  /  Google Scholar

News

  • [Jan '24]  Gave a talk at the NLU reading group on Vision-Language Models: Evaluation and Cultural Lens at Google Research India. Slides here.
  • [Dec '23]  Serving as a reviewer for NAACL 2024
  • [Dec '23]  My Master's Thesis was awarded the Best Project Award of the 2023 graduating batch!
  • [Dec '23]  I will be in Singapore for EMNLP 2023 from 4th to 12th Dec. Come say hi!
  • [Nov '23]  I moved to Google Research, where I will be focusing on Multimodal AI!
  • [Oct '23]  Super stoked to have 2 papers accepted at EMNLP 2023!! Details soon. See you in Singapore.
  • [Aug ' 23]  Gave a talk on NLP Conspiracy Theories in IXT-Science Reading group at Alexa AI.
  • [Aug ' 23]  Served as a reviewer for EMNLP 2023
  • [Jun ' 23]  Joined Amazon Alexa AI in London/Cambridge, UK
  • [Apr '23]  Stoked to receive admission into the Language Technology Institute (LTI) at Carnegie Mellon University (CMU) for graduate studies!!
  • [Apr '23]  Check out my Master's Thesis, which was nominated for the Best Thesis award. Find the presentation here
  • [Oct ' 22]  TOEFL evaluators believe I am decent (115/120) at the English language.
  • [Aug '22]  Gave a talk at the NLU-GE-Science (Golden Eagle) reading group at Amazon on recent counter-intuitive findings in Prompting Literature. Check out the slides
  • [Jun '22]  Excited to be spending a summer at Amazon Research with the Alexa AI-NLU IFS team in Berlin!
  • [May '22]  Received the Student Volunteer Award at ACL 2022.
  • [Apr '22]  Super stoked to have received an offer as an Applied Scientist Intern from Amazon Science Berlin with the Alexa NLU team
  • [Mar '22]  Check out our new preprint on Targeted Fact Extraction for Temporal KBQA
  • [Jan '22]  Selected to attend the Research Summer School at Google Research India - one among 30 picked for the NLP track
  • [Dec '21]  One paper accepted at AAAI 2022 Conference (SDU Workshop)
  • [Dec '21]  Serving as a sub-reviewer for SDU Workshop at AAAI Conference.
  • [Nov '21]  Manuscript submitted to AAAI 2022 (SDU Workshop)
  • [Nov '21]  Team finished top 3 globally in the Scientific Document Understanding Shared Task at AAAI 2022
  • [Nov '21]  Manuscript submitted to ACL 2022 Main Conference
  • [Sept '21]  Work done in summer 2021 received the Outstanding Intern Award from the Director of IBM Research, India
  • [Sept '21]  Joined as an RA at CNeRG Lab, IIT KGP
  • [May '21]  Started an internship at IBM Research AI, Bangalore.

Research

I currently work at Google Research in the Languages Group, where I look into problems in the multimodal AI space (Text-to-Image). I am fascinated by several sub-problems in this space and here are a few representative questions I like to think about:
1) How do you quantify diversity in LLM/T2I generations? Is diversity a double-edged sword?
2) How do "concepts" in the linguistic embedding space relate to the vision space?
3) Can we use semiotics and visual aesthetics as additional signals for training multi-cultural VLMs? Here is a reference.
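On question 1), one common starting point is to embed each generation with some encoder and score the set by its mean pairwise cosine distance. The sketch below is purely illustrative (the function name and toy vectors are mine; in practice the embeddings would come from a text or image encoder):

```python
import math

def pairwise_diversity(embeddings):
    """Mean pairwise cosine distance over a set of generation embeddings.

    `embeddings` is a list of vectors, one per generation. Higher values
    indicate a more diverse set: 0 for identical generations, up to 1 for
    mutually orthogonal (non-negative) embeddings.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_sim = sum(cosine(embeddings[i], embeddings[j])
                   for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim  # cosine distance = 1 - cosine similarity

print(pairwise_diversity([[1.0, 0.0]] * 4))          # identical -> 0.0
print(pairwise_diversity([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal -> 1.0
```

The "double-edged sword" question then becomes measurable: plot such a diversity score against a quality metric and look for the trade-off.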
I am also interested in the fairness and safety aspects of GenAI models and would love to chat with folks working in this space (Email: nitkan@google.com). I contribute to the Bindi Project at Google. Previously, I have worked with Amazon Alexa AI in Germany, Amazon Info AI in the UK, and IBM Research AI in India. I have experience in Temporal Reasoning, QA, Conversation and Dialogue, Aspect-Based Sentiment Analysis, Multilingual NLP and Recommender Systems. Do reach out if you are interested in any of my work and would like to chat about it. I have been fortunate to receive great guidance and mentorship from my collaborators thus far, and I am happy to guide younger folks new to the space.

Publications
TempQA Best of Both Worlds: Towards Improving Temporal Knowledge Base Question Answering via Targeted Fact Extraction
Nithish Kannen , Udit Sharma, Sumit Neelam, Dinesh Khandelwal, Shajith Ikbal, Hima Karanam, L Venkata Subramaniam
EMNLP 2023
paper

Temporal question answering (QA) is a special category of complex question answering task that requires reasoning over facts asserting time intervals of events. Previous works have predominantly relied on Knowledge Base Question Answering (KBQA) for temporal QA. One of the major challenges faced by these systems is their inability to retrieve all relevant facts due to factors such as incomplete KBs and entity/relation linking errors (Patidar et al., 2022). A failure to fetch even a single fact will block KBQA from computing the answer. Such cases of KB incompleteness are even more profound in the temporal context. To address this issue, we explore an interesting direction where a targeted temporal fact extraction technique is used to assist KBQA whenever it fails to retrieve temporal facts from the KB. We model the extraction problem as an open-domain question answering task using off-the-shelf language models. This way, we aim to extract from textual resources those facts that failed to get retrieved from the KB. Experimental results on two temporal QA benchmarks show promising 30% & 10% relative improvements in answer accuracy without any additional training cost.

BookLSTM CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction
Rajdeep Mukherjee, Nithish Kannen , Saurabh Kumar Pandey, Pawan Goyal
Findings of EMNLP 2023.
paper | code

Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning approaches for the task. Different from these, we propose CONTRASTE, a novel pre-training strategy using CONTRastive learning to improve the performance of the downstream ASTE task. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder architecture by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the pre-trained model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder framework is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art results for the ASTE task. We further demonstrate that our proposed pre-training scheme can improve the performance of other ABSA tasks such as Aspect Category Opinion Sentiment (ACOS) quad prediction, Target Aspect Sentiment Detection (TASD), and Aspect Extraction and Sentiment Classification (AESC).
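The supervised contrastive objective at the core of this pre-training strategy can be illustrated with a generic SupCon-style loss over labelled representations. This is a sketch of the general technique, not the paper's exact implementation; the function and variable names are mine:

```python
import math

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over L2-normalised embeddings.

    Each anchor is pulled towards embeddings that share its label (e.g.
    the same masked sentiment polarity) and pushed away from the rest.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n, total = len(embeddings), 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        for p in positives:
            total -= math.log(
                math.exp(dot(embeddings[i], embeddings[p]) / tau) / denom
            ) / len(positives)
    return total / n

# The loss is lower when same-label embeddings cluster together.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(supcon_loss(emb, [0, 0, 1, 1]) < supcon_loss(emb, [0, 1, 0, 1]))  # -> True
```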

CABACE Architecture CABACE: Injecting Character Sequence Information and Domain Knowledge for Enhanced Acronym Extraction
Nithish Kannen , Divyanshu Sheth, Abhranil Chandra, Shubraneel Pal
Scientific Document Understanding (SDU) at Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI) 2022
paper | code

Acronyms and long-forms are a common sight in research documents in general, more so in documents from the scientific and legal domains. Many of the acronyms used in these documents are domain-specific and very rarely found in normal text corpora. Owing to this, transformer-based NLP models struggle with OOV (Out of Vocabulary) acronym tokens, and their performance suffers while linking acronyms to their long forms during extraction. Moreover, transformer-based models like BERT are not specialized to handle scientific and legal documents. With these points as the overarching motivation behind this work, we propose a novel framework, CABACE: Character-Aware BERT for ACronym Extraction, which takes into account character sequences in text and is adapted to the scientific and legal domains by masked language modelling. We further use an augmented objective function, adding a mask + max loss, for training CABACE. Experimental results prove the superiority of the proposed framework in comparison to various baselines. Additionally, we show that the proposed framework is better suited than baseline models for zero-shot generalization to non-English languages, thus reinforcing the effectiveness of our approach. Our team BacKGProp secured the highest scores on the French dataset, second highest on Danish and Vietnamese, and third highest in the Legal English dataset on the global Codalab leaderboard for the acronym extraction shared task at SDU AAAI-22.

BookLSTM Smart Factories of Industry 4.0: Determination of the Effective Smartphone Position for Human Activity Recognition using Deep Learning
Nithish Kannen , Abdulhamit Subasi
Chapter in Advanced Signal Processing for Industry 4.0, Volume 2

In this chapter, we present a user-dependent and independent deep learning-based approach for transportation mode recognition using smartphone sensor data. Moreover, comparative experiments over six deep learning models including the Convolutional and Recurrent Neural Networks are conducted to determine the best position of the smartphone for transportation mode recognition.

TempQA Targeted Extraction of Temporal Facts from Textual Resources for Improved Temporal Question Answering over Knowledge Bases
Nithish Kannen , Udit Sharma, Sumit Neelam, Dinesh Khandelwal, Shajith Ikbal, Hima Karanam, L Venkata Subramaniam
Preprint
paper

Knowledge Base Question Answering (KBQA) systems have the goal of answering complex natural language questions by reasoning over relevant facts retrieved from Knowledge Bases (KB). One of the major challenges faced by these systems is their inability to retrieve all relevant facts due to factors such as incomplete KBs and entity/relation linking errors. In this paper, we address this particular challenge for systems handling a specific category of questions called temporal questions, where answer derivation involves reasoning over facts asserting points/intervals of time for various events. We propose a novel approach where a targeted temporal fact extraction technique is used to assist KBQA whenever it fails to retrieve temporal facts from the KB. We use λ-expressions of the questions to logically represent the component facts and the reasoning steps needed to derive the answer. This allows us to spot those facts that failed to get retrieved from the KB and generate textual queries to extract them from the textual resources in an open-domain question answering fashion. We evaluated our approach on a benchmark temporal question answering dataset considering Wikidata and Wikipedia respectively as the KB and textual resource. Experimental results show a significant ∼30% relative improvement in answer accuracy, demonstrating the effectiveness of our approach.

BookLSTM Multi-Task Learning of a Controllable Generator for Intent Classification and Slot Filling

Under review at ACL Rolling Review (ARR)

Selected Experiences
AmzSci Applied Scientist Intern - Amazon Alexa AI, Berlin
Supervisor(s): Markus Boese and Caglar Tirkaz (now at Apple AI)

Was part of the Golden Eagle Science team, with a focus on Alexa AI - Natural Language Understanding. Developed a multi-task generative framework using Prompt Learning as a controllable utterance generator for intent classification and slot labelling. A manuscript detailing our work is under preparation for ACL 2023.

irl Research Intern - IBM Research AI, Bangalore [ Outstanding Intern Award ]
Supervisor(s): Shajith Ikbal, Hima Karanam and LV Subramaniam

Worked with the Neuro-Symbolic AI team on targeted temporal fact extraction from textual resources to aid Complex Question Answering over Knowledge Bases via Temporal Reasoning. Work under review in ACL Rolling Review.

cnerg Research Assistant - CNERG Lab, IIT KGP
Supervisor(s): Prof. Pawan Goyal

1) Working on a Meta Learning framework for Cross-Lingual Question Generation and Question Answering. 2) Working on a tagging-free approach for Aspect Sentiment Triplet Extraction using syntactic features and Graph Neural Networks (GATs).

UoTurku Research Assistant - University of Turku, Finland (code)
Supervisor(s): Prof. Abdulhamit Subasi

Worked on the classification of heart diseases using signal processing and deep learning techniques on stethoscope audio data.

Selected Projects
Acronym and Long-Form Extraction from Scientific and Legal Documents of 6 Languages (code)
Acronym Extraction Shared Task at AAAI-22 (SDU)

Proposed a novel unified character-aware framework for Acronym and Long-Form Extraction using domain-specific language modelling and an optimized objective function. Our team secured the highest score on French, and the 2nd highest score on Vietnamese, Danish and Legal English on the global Codalab leaderboard.

MetaQG: Improved Cross-Lingual Question Generation in Low-Resource Languages using Enhanced Meta-Learning Framework (report)
Bachelor Thesis (Mid) | Advisor: Prof. Pawan Goyal

Proposed a novel framework for Cross-Lingual Question Generation inspired by Meta Learning and Adversarial Training. Experimented with data augmentation techniques and the robustness of QA and QG systems to context shuffling. Work planned for submission to the TALIP Journal.

Multi-Agent Path Planning using Graph Theory (code) | (report)
Artificial Intelligence Foundations and Applications Term Project | Advisors: Prof. Partha Pratim Chakrabarti and Prof. Arijit Mondal

Proposed a novel Multi-Agent Path Finding algorithm to perform a set of pickup-delivery tasks in a pre-defined warehouse map using Multi-Label A*. Performed agent-task pair scheduling using the IDA* algorithm and implemented Floyd-Warshall for computing heuristics on the implicit graph.
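The heuristic precomputation mentioned above can be sketched as a standard Floyd-Warshall pass: all-pairs shortest distances on the warehouse graph give an exact, and therefore admissible, heuristic for the subsequent A*/IDA* searches. The toy map below is illustrative only:

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest distances over an undirected weighted graph.

    `edges` is a list of (u, v, w) tuples over nodes 0..n-1. The result
    dist[u][v] can serve directly as an admissible heuristic for A*/IDA*
    searches between any two nodes.
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):                   # allow node k as an intermediate
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Tiny 4-node example: chain 0-1-2-3 (unit weights) plus a costly 0-3 edge.
d = floyd_warshall(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5)])
print(d[0][3])  # -> 3 (the chain beats the direct edge)
```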

Positions Of Responsibility & Extracurricular Activities
NSS Senior Editor (Apr' 20 - Jun' 21)
IIT Tech Ambit (My Profile)

The official tech magazine of the IITs, developed at IIT Kharagpur, covering research carried out by the stakeholders of the IITs and its impact. Authored numerous articles for the monthly magazines. Interviewed stakeholders and achievers within KGP and beyond.
kdag Core Member (Sept '20 - Jun '21)
Kharagpur Data Analytics Group (KDAG), IIT Kharagpur

Organized research paper-reading sessions for students of IIT Kharagpur. Conducted a Data Science and ML workshop for more than 600 registered students. KDAG is a group of students enthusiastic about Data Science and Machine Learning and their applications.
Thanks to Jon Barron for sharing this awesome template!