Shivanshu Shekhar

PhD Student in Computer Science

About Me

I am a PhD student in Computer Science at the University of Illinois Urbana-Champaign (UIUC), working with Prof. Tong Zhang. My research interests include deep learning, natural language processing, and optimization techniques for LLMs.

Education

Doctor of Philosophy in Computer Science

University of Illinois Urbana-Champaign (UIUC)

Aug 2024 - Present

Advisor: Prof. Tong Zhang

Bachelor of Technology (Honours) in Electrical Engineering

Indian Institute of Technology Madras (IITM)

Nov 2020 - May 2024

CGPA: 9.22/10

Minor: Physics, Computing

Publications & Patents

SEE-DPO: Self Entropy Enhanced Direct Preference Optimization

Shivanshu Shekhar, Shreyas Singh, Tong Zhang

Submitted to TMLR, 2025

Towards Optimizing the Costs of LLM Usage

Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Atharv Tyagi, Apoorv Saxena, Nishanth Kotla

Under Review at NAACL 2024

SmaRt: Smart LLM Router for Optimizing Cost and Performance for Document Summarization Tasks

Shivanshu Shekhar, Tanishq Dubey, Nishanth Kotla, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi

Submitted to US Patent Office

Quality Aware Token Optimization For Reducing Costs of LLM Usage

Tanishq Dubey, Shivanshu Shekhar, Nishanth Kotla, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi

Submitted to US Patent Office

Prediction of Hydrogen Storage in Metal-Organic Frameworks: A Neural Network Based Approach

Shivanshu Shekhar and Chandra Chowdhury

Accepted in Results in Surfaces and Interfaces

Topological Data Analysis Enhanced Prediction of Hydrogen Storage in Metal-Organic Frameworks (MOFs)

Shivanshu Shekhar and Chandra Chowdhury

Submitted to Materials Advances; under peer review

PeerFL: A Simulator for Peer-to-Peer Federated Learning at Scale

Alka Luqman, Shivanshu Shekhar, Anupam Chattopadhyay

Under Review at IEEE Internet of Things Journal

Research & Professional Experience

Adobe Research

Towards LLM Usage Cost Optimization

Summer 2023 | Bangalore, India

Guides: Dr. Koyel Mukherjee, Dr. Apoorv Saxena, Atharv Tyagi

  • Proposed a novel token-aware text paraphraser that reduced the token consumption of GPT-based LLMs by 30%.
  • Developed a novel pipeline that selects, in parallel, which summarization models in a cascade to use, conditioned on the context and the budget constraints, resulting in an 84.50% reduction in usage cost and a 3.2% increase in performance.
  • Compared evaluation metrics against human predictions by conducting a division-wide survey.

Nanyang Technological University

Virtual Simulator for Peer-to-Peer Federated Learning

Summer 2021-Summer 2022 | Singapore

Guide: Prof. Anupam Chattopadhyay, Alka Luqman

  • Built a fully virtual federated learning simulator capable of handling both central and peer-to-peer settings.
  • Adapted the NS-3 network simulator to work with Python and interfaced TAP devices with Docker containers and NS-3 to establish an independent virtual network between the containers, controlled solely by NS-3 for simulation.
  • Conducted real-world experiments to validate and confirm the accuracy and dependability of the simulator's results.

IIT Madras

Prediction of Hydrogen Storage

Summer 2023 | Chennai, India

Guide: Prof. Santanu Sarkar, Prof. Chandra Chowdhury

  • Implemented a novel neural architecture for hydrogen storage prediction that achieved state-of-the-art performance.
  • Generated topologically persistent images to serve as cues for learning the optimal hydrogen storage.
  • Reduced the model's prediction error rate by 21% compared to the state-of-the-art model.

IIT Madras

Color Restoration for Underwater Images

Fall 2022 | Chennai, India

Guide: Prof. Kaushik Mitra

  • Built an unsupervised depth estimation technique using degraded stereo image pairs for subsequent color correction.
  • Using these depth estimates, implemented another neural network that learns a disentangled representation to estimate the underwater image formation parameters, then corrected the degraded images using these parameters.

IIT Madras

Distortion-Invariant Reconstruction Model for Document Restoration

Fall 2023, Summer 2024 | Chennai, India

Guide: Prof. A.N. Rajagopalan

  • Created a dataset covering various types of geometric and photometric distortions for subsequent image correction.
  • Developed a novel pipeline that disentangles the distortion from the clean image in a degraded input, so that the recovered distortion maps can be applied to new clean images to generate additional synthetic data.

Key Technical Projects

Tool Creation for Large Language Models

Research Project | Spring 2025

  • Currently developing a methodology for LLMs to outline logical steps and execute complex arithmetic computations using integrated tools such as NumPy, SciPy, and Lean 4.
  • Building and scaling a dedicated "tools dataset" with advanced models to showcase best practices in computational tool integration for enhanced LLM performance.

RF and Optical Communication Course Project

Spring 2023

  • Attempted to solve dynamic spectrum access for network utility maximization in multichannel wireless networks.
  • Implemented a dynamic spectrum access algorithm that uses multi-user reinforcement learning to solve the spectrum access problem under an arbitrarily large state space.

Real-time Document Localization

Computer Vision and Intelligence Club, Center for Innovation | Fall 2023

  • Reproduced the paper "LDRNet: Enabling Real-time Document Localization on Mobile Devices" by Han Wu et al.
  • Employed model pruning and quantization to lower the model footprint by 51% while reducing accuracy by only 1.4%.

Effects of Attention Mechanism on Sequence Learning Models

Fundamentals of Deep Learning Project | Spring 2023

  • Implemented a seq-to-seq model for transliteration using RNN, LSTM, and GRU backbones.
  • Implemented Bahdanau attention in the decoder network and compared it to vanilla seq-to-seq models by assessing loss and accuracy measures and visualizing attention maps for qualitative validation.

Implementation of a Neural Network from Scratch

Fundamentals of Deep Learning Project | Spring 2023

  • Used NumPy to implement a fully vectorized neural network package and trained classifier models with it on the MNIST and Fashion MNIST datasets, with wandb support including hyperparameter search using Bayesian optimization.
  • Implemented cross-entropy and mean-squared-error losses along with all the major activation functions and linear layers.

Feedback Prize - Evaluating Student Writing

Kaggle Competition | Fall 2022

  • Implemented a BERT-based model for segmenting texts and classifying argumentative and rhetorical elements in essays.
  • Created a public notebook to help others understand the problem and my approach; awarded a Bronze Medal.

Skills

Languages

Python, C++, SQL, MATLAB, LaTeX, C, JavaScript

Technologies

Git, Docker, MySQL, Wandb, AWS, Flask

Libraries

PyTorch, TensorFlow, OpenCV, NumPy, Pandas, Matplotlib/seaborn, NS-3, Qiskit, JAX, SciPy, CVXOPT

Scholastic Achievements

2023

Represented IIT Madras at the Inter IIT Tech Meet and won a bronze medal in the computer vision event.

2020

Achieved an All India Rank of 657 in JEE (Advanced) 2020 out of 150,000 shortlisted candidates.

2020

Achieved an All India Rank of 1779 in JEE (Mains) 2020 out of 1.5 million candidates.

2019

Placed among the top 10% of finalists in the Goa region in the National Graduate Physics Examination.

Leadership Experience

Strategist, Computer Vision and Intelligence Club

Center for Innovation, IIT Madras

April 2022-March 2023

  • Spearheaded a team of 40 undergraduate students to make a meaningful impact on real-world issues.
  • Engaged with leading businesses, startups, non-governmental organizations (NGOs), and professors during the tenure.
  • Organized a range of technical workshops covering fundamental concepts of Computer Vision and Artificial Intelligence.

Event Lead, Generative Model Workshop

Shaastra, Technical festival of IIT Madras

March 2023-Feb 2023

  • Mentored the team responsible for conducting workshops on computer-vision-related topics, which drew over 100 registrations.
  • Ideated and set the content for the session on AutoEncoders, GANs, Cycle GANs, DC GANs, and game theory.