Abhinav Ramesh Kashyap | publications

2022

ACL’22

So Different Yet So Alike! Constrained Unsupervised Text Style Transfer

Ramesh Kashyap, Abhinav, Hazarika, Devamanyu, Kan, Min-Yen, Zimmermann, Roger, and Poria, Soujanya

2022

2021

NAACL’21

Domain Divergences: A Survey and Empirical Analysis

Ramesh Kashyap, Abhinav, Hazarika, Devamanyu, Kan, Min-Yen, and Zimmermann, Roger

2021

AdaptNLP@EACL

Analyzing the Domain Robustness of Pretrained Language Models, Layer by Layer

Ramesh Kashyap, Abhinav, Mehnaz, Laiba, Malik, Bhavitvya, Waheed, Abdul, Hazarika, Devamanyu, Kan, Min-Yen, and Shah, Rajiv Ratn

In Proceedings of the Second Workshop on Domain Adaptation for NLP 2021


The robustness of pretrained language models (PLMs) is generally measured using performance drops on two or more domains. However, we do not yet understand the inherent robustness achieved by contributions from different layers of a PLM. We systematically analyze the robustness of these representations layer by layer from two perspectives. First, we measure the robustness of representations by using domain divergence between two domains. We find that (i) domain variance increases from the lower to the upper layers for vanilla PLMs; (ii) models continuously pretrained on domain-specific data (DAPT) (Gururangan et al., 2020) exhibit more variance than their pretrained PLM counterparts; and (iii) distilled models (e.g., DistilBERT) also show greater domain variance. Second, we investigate the robustness of representations by analyzing the encoded syntactic and semantic information using diagnostic probes. We find that similar layers have similar amounts of linguistic information for data from an unseen domain.
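The layer-by-layer divergence measurement described above can be illustrated with a small sketch. The abstract does not say which divergence measure is used, so this example assumes a linear-kernel maximum mean discrepancy (MMD), which reduces to the squared distance between the two domains' mean representations. The function names and toy vectors are hypothetical; real inputs would be per-layer hidden states extracted from a PLM for sentences from each domain.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def linear_mmd2(domain_a: Sequence[Vector], domain_b: Sequence[Vector]) -> float:
    """Biased MMD^2 with a linear kernel: squared distance between the domain means."""
    mu_a = mean_vector(domain_a)
    mu_b = mean_vector(domain_b)
    return sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))


def layerwise_divergence(
    layers_a: Sequence[Sequence[Vector]], layers_b: Sequence[Sequence[Vector]]
) -> List[float]:
    """layers_a[l] holds the layer-l representations for domain A (likewise for B)."""
    return [linear_mmd2(a, b) for a, b in zip(layers_a, layers_b)]


# Hypothetical toy data: two layers, two 2-d representations per domain.
layers_a = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
layers_b = [[[0.0, 1.0], [1.0, 2.0]], [[3.0, 0.0], [5.0, 2.0]]]
print(layerwise_divergence(layers_a, layers_b))  # → [1.0, 9.0]
```

A linear kernel keeps the example dependency-free; a nonlinear kernel such as an RBF kernel would also capture differences between the domains beyond their means.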

2020

SDP@EMNLP

SciWING – A Software Toolkit for Scientific Document Processing

Ramesh Kashyap, Abhinav, and Kan, Min-Yen

In Proceedings of the First Workshop on Scholarly Document Processing 2020


We introduce SciWING, an open-source software toolkit that provides access to state-of-the-art pre-trained models for scientific document processing (SDP) tasks, such as citation string parsing, logical structure recovery and citation intent classification. Compared to other toolkits, SciWING follows a full neural pipeline and provides a Python interface for SDP. When needed, SciWING provides fine-grained control for rapid experimentation with different models by swapping and stacking different modules. Transfer learning from general and scientific-document-specific pre-trained transformers (e.g., BERT, SciBERT) can be performed. SciWING incorporates ready-to-use web and terminal-based applications and demonstrations to aid adoption and development. The toolkit is available from http://sciwing.io and the demos are available at http://rebrand.ly/sciwing-demo.
DocEng

ServiceMarq: Extracting Service Contributions from Call for Papers

Tian, Shi, Ramesh Kashyap, Abhinav, and Kan, Min-Yen

In Proceedings of the ACM Symposium on Document Engineering 2020


In an era where large numbers of academic research papers are submitted to conferences and journals, the voluntary service of the academics who manage them is indispensable. A call for papers, whether sent by e-mail or posted as a webpage, not only solicits research work from scientists but also lists the names of the researchers and their roles in managing the conference. Tracking such information, which showcases researchers' leadership qualities, is becoming increasingly important. Here we present ServiceMarq, a system that proactively tracks service contributions to conferences. It performs focused crawling for website-based calls for papers, and integrates archival and natural language processing libraries to achieve both high precision and high recall in extracting information. Our results indicate that aggregated service contributions give an alternative but correlated picture of institutional quality compared with standard bibliometrics. In addition, we have developed a proof-of-concept website to track service contributions, available at https://cfp-mining-fe.herokuapp.com, and our GitHub repository is available at https://github.com/shitian007/cfp-mining

2019

TOIT

CloseUp - A Community-Driven Live Online Search Engine

Weth, Christian Von Der, Abdul, Ashraf, Kashyap, Abhinav R, and Kankanhalli, Mohan

ACM Transactions on Internet Technology (TOIT) 2019


2018

WebSci

EPICURE - Aspect-Based Multimodal Review Summarization

Ramesh Kashyap, Abhinav, Weth, Christian, Cheng, Zhiyong, and Kankanhalli, Mohan

In Proceedings of the 10th ACM Conference on Web Science 2018


Restaurant reviews are popular and a valuable source of information. Often, a large number of reviews are written for a single restaurant, which warrants automated summarization systems. In this paper we present EPICURE, a novel text and image summarization platform. In summarizing opinionated content such as reviews, the different aspects being discussed have largely been ignored; we address this by creating balanced reviews for aspects such as food and service. We argue that traditional criteria for extractive review summarization, such as coverage and diversity, have limited applicability. We draw on submodular functions for extractive summarization and introduce novel submodular functions such as importance, freshness, purity, trustworthiness and balanced opinion. We are also one of the first to provide an image summary for different aspects of a restaurant by mapping text to images using a multimodal neural network, for which we provide initial experiments. We show the effectiveness of our platform by evaluating it against strong baselines, and we also use crowdsourcing experiments for a subjective comparison of our approach with existing works.
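The abstract describes extractive summarization by maximizing submodular functions. As a rough illustration only, the sketch below uses a plain aspect-coverage function (the number of distinct aspects covered by the selected reviews), which is monotone submodular, together with the standard greedy algorithm under a cardinality budget. It does not implement EPICURE's importance, freshness, purity, trustworthiness or balanced-opinion functions; the review IDs, aspect sets and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def coverage(selected: List[str], aspects: Dict[str, Set[str]]) -> int:
    """Monotone submodular objective: number of distinct aspects covered."""
    covered: Set[str] = set()
    for review_id in selected:
        covered |= aspects[review_id]
    return len(covered)


def greedy_summary(aspects: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily add the review with the largest marginal gain, up to `budget` reviews."""
    selected: List[str] = []
    candidates = set(aspects)
    while len(selected) < budget and candidates:
        base = coverage(selected, aspects)
        best: Optional[str] = None
        best_gain = 0
        # Sorted iteration gives deterministic tie-breaking (first ID wins).
        for review_id in sorted(candidates):
            gain = coverage(selected + [review_id], aspects) - base
            if gain > best_gain:
                best, best_gain = review_id, gain
        if best is None:  # no remaining review covers a new aspect
            break
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical reviews, each tagged with the aspects it discusses.
reviews = {
    "r1": {"food", "service"},
    "r2": {"food"},
    "r3": {"ambience", "price"},
    "r4": {"service", "price"},
}
print(greedy_summary(reviews, budget=2))  # → ['r1', 'r3']
```

Greedy selection is a common choice for this class of objective because, for monotone submodular functions under a cardinality constraint, it is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value.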