Bio
I am currently a fifth-year Ph.D. student in the College of Information and Computer Sciences at UMass Amherst, working with Prof. Andrew McCallum at IESL.
My research focuses on developing set-based representation learning techniques, with applications in natural language processing and information retrieval.
I earned my Master’s degree from the Indian Institute of Science (IISc) in 2018, where I was part of the Machine and Language Learning (MALL) Lab, working with Prof. Partha Talukdar.
Alongside my academic research, I have gained industry research experience through full-time work and internships with leading researchers, including Dr. Tijmen Tieleman at Minds.ai, Dr. Steffen Rendle at Google Research, Dr. Achille Fokoue at IBM Research, and Dr. Tobias Schnabel at Microsoft Research.
Research Interests
• Representation Learning • Information Retrieval • Recommendation Systems • Natural Language Processing
I care about developing efficient, interpretable models that improve how we aggregate and understand information.
Education
Ph.D. in Computer Science
University of Massachusetts Amherst.
Started Ph.D. in Fall 2019.
Thesis on Set-Theoretic Representation Learning, supervised by Prof. Andrew McCallum. Published and presented work at multiple top-tier ML/NLP conferences.
Master of Technology in System Science and Automation
Indian Institute of Science, Bangalore. M.Tech. 2018.
Thesis on Temporal Information Processing. Published research on temporal representations of knowledge bases.
Bachelor of Engineering in Electrical Engineering
Jadavpur University. B.E. 2015.
Experience
Microsoft Research Summer 2024
- Research Intern, Manager: Tobias Schnabel
- Integrated LLMs into recommendation systems in an efficient and editable way, using a retrieval-augmented generation (RAG) approach with GPT-4.
- Achieved a 50% improvement in human preference alignment over state-of-the-art algorithms.
IBM Research Summer 2023
- Research Intern, Manager: Achille Fokoue
- Developed a dataset to evaluate and fine-tune LLM agents for business processes, emphasizing tool usage and multistep planning for accurate process execution.
Google Research Summer 2022, Fall 2022
- Research Intern, Manager: Steffen Rendle
- Introduced a benchmark for compositional queries (e.g., "Jazz but not Smooth Jazz") in recommendation systems.
- Designed a set-based embedding method, outperforming traditional vector-based baselines by 25% on the proposed benchmark.
IBM Research Spring 2022
- Research Externship, Mentors: Ken Clarkson (IBM), Cameron Musco (UMass Amherst)
- Engineered a fast, scalable hashing-based technique for learning word embeddings in a single pass over the data.
Adobe Research Summer 2021
- Document Intelligence Research Intern, Manager: Dr. Tong Sun
- Proposed a dual embedding method that uses geometric embeddings to impose hierarchical structure on vector-based representations.
Minds.ai 2018 - 2019
- Neural Network Engineer, Manager: Dr. Tijmen Tieleman
- Developed a Graph Convolutional Network (GCN)-based molecular property predictor to aid automated drug discovery.
- Built a deep reinforcement learning-based controller to increase battery life and fuel efficiency for hybrid vehicles.
Publications
Please visit my Google Scholar page for an updated list of publications.
- Shib Sankar Dasgupta, Michael Boratko, S. Atmakuri, Xiang Lorraine Li, D. Patel, Andrew McCallum. Word2Box: Learning Word Representation Using Box Embeddings. arXiv.
- Shib Sankar Dasgupta, Xiang Lorraine Li, Michael Boratko, Dongxu Zhang, Andrew McCallum. Box-To-Box Transformations for Modeling Joint Hierarchies. 6th Workshop on Representation Learning for NLP (RepL4NLP). 2021.
- Tejas Chheda, Purujit Goyal, Trang Tran, Dhruvesh Patel, Michael Boratko, Shib Sankar Dasgupta, Andrew McCallum. Box Embeddings: An Open-Source Library for Representation Learning Using Geometric Structures.
- Shib Sankar Dasgupta*, Michael Boratko*, Dongxu Zhang, Luke Vilnis, Xiang Li, Andrew McCallum. Improving Local Identifiability in Probabilistic Box Embeddings. NeurIPS. 2020. (* Equal contribution)
- Dhruvesh Patel*, Shib Sankar Dasgupta*, Michael Boratko, Xiang Li, Luke Vilnis, Andrew McCallum. Representing Joint Hierarchies with Box Embeddings. AKBC. 2020. (* Equal contribution)
- Shib Sankar Dasgupta, Swayambhu Nath Ray, Partha Talukdar. HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding. EMNLP. 2018.
- Swayambhu Nath Ray, Shib Sankar Dasgupta, Partha Talukdar. AD3: Attentive Deep Document Dater. EMNLP. 2018.
- Shikhar Vashishth, Shib Sankar Dasgupta, Swayambhu Nath Ray, Partha Talukdar. Dating Documents using Graph Convolution Networks. EMNLP. 2018.
Skills
Programming: Python, C++, MATLAB. Libraries: PyTorch, TensorFlow, Transformers.
Coursework: Machine Learning, Deep Learning, Natural Language Processing, Information Retrieval.
Awards
- Scholarship. Awarded the W. Bruce Croft Graduate Scholarship in Computer Science, UMass Amherst.
- Gold Medal. Awarded the N R Khambhati Memorial Medal for the best M.Tech. student.
News
2020 September - Our paper on improving identifiability in box embeddings was accepted at NeurIPS 2020.
2020 August - Awarded the W. Bruce Croft Graduate Scholarship in Computer Science, UMass Amherst.
2020 March - Our paper on representing joint hierarchies with box embeddings was accepted at AKBC 2020.
2019 September - Started my Ph.D. at UMass Amherst with Prof. Andrew McCallum.
2018 November - Presented two papers on temporal information processing at EMNLP 2018 in Brussels, Belgium.