About Me

I am advised by Prof. Dimitris Samaras. Prior to Fall '19, I was advised by Prof. Roy Shilkrot.
My current research is on 3D estimation and texture reconstruction from single images, specifically document images. I am trying to dewarp folded or warped document images into their flatbed-scanned equivalents using an end-to-end neural network.

My broad interest is creating novel assistive technologies using deep learning. Although my current research is in Computer Vision, I have significant experience working with text and speech data. Check out my older projects on speech disfluency correction and Dark Patterns detection in webpages.

During my Master's, I worked on Evolutionary and Nature-Inspired Algorithms, advised by Prof. Haider Banka.

Education

Ph.D. Computer Science, Stony Brook University, New York, USA (2016-Present)

M.Tech. Computer Science & Engineering, Indian Institute of Technology Dhanbad, India (2014-2016)

B.Tech. Computer Science & Engineering, West Bengal University of Technology, India (2009-2013)

Updates

July '19

Paper accepted at ICCV '19, Seoul, KR.

June '19

Joined Tulip Interfaces as a Computer Vision intern.

February '19

This Spring I'm a part-time intern at Kasisto Inc.

February '19

Paper accepted at ICASSP '19, Brighton, UK.

August '18

(Co-authored) Research grant accepted under Samsung GRO.

June '17

Paper accepted at ACM Document Engineering '17, Malta.

Projects

Dewarping Document Images [Aug 2018-Currently Active]

Reconstruction of folded/warped document images in 3D using an end-to-end CNN.

Python, PyTorch

Patternminator: Dark Pattern Detection in Web

Classify cunning and deceptive UIs (Dark Patterns) in web pages and warn users about them.

Python, Keras/Tensorflow, JS

Increase Apparent Public Speaking Fluency by Speech Augmentation

Classify and remove disfluencies from a given speech for better speaker fluency.

Python, Tensorflow

The Common Fold: Dewarping Four-Folded Printed Documents

Dewarp four-folded (twice half-folded) papers from a single image taken with a regular (non-range) camera.

Python, Caffe

Dewarping Document Images

Overview

Single warped document image as input.
Output: unwarped texture.
Uses an intermediate representation: depth/3D coordinates.
Rendered 3D dataset of ~100k document images (synthetic), ~1k meshes (real), and the corresponding ground truths.
This is ongoing research; more details will be added soon!
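As a rough illustration of the last step, turning a warped input into an unwarped texture, the sketch below resamples an image through a per-pixel backward map using bilinear interpolation. It is a minimal pure-Python sketch under assumptions: the backward-map formulation, the function names `bilinear_sample` and `unwarp`, and the toy 2x2 image are illustrative choices and are not taken from the project itself.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a fractional pixel (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp the query point to the image bounds.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def unwarp(image, backward_map):
    """Build the output image; backward_map[i][j] is the (x, y) source
    location in the warped input for output pixel (row i, column j)."""
    return [[bilinear_sample(image, x, y) for (x, y) in row]
            for row in backward_map]


# Toy example (hypothetical data): a map that mirrors the image horizontally.
toy_image = [[0, 10], [20, 30]]
flip_map = [[(1, 0), (0, 0)], [(1, 1), (0, 1)]]
flipped = unwarp(toy_image, flip_map)
```

In a learned pipeline the backward map would be predicted by a network rather than written by hand; the resampling step itself stays the same.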

Dataset

Coming Soon!

Publications

DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks

Sagnik Das

ICCV, 2019

Patternminator: Dark Pattern Detection in Web

Overview

Take a webpage's HTML and screenshot as input.
Segment the HTML and obtain the corresponding segment images.
Extract features (HTML, text, and image) from the segments.
Classify each segment as Dark Pattern or Non-Dark Pattern.
Our system currently combines visual, textual, and HTML features and achieves an F1 score of 0.84 in detecting Dark Patterns contained in web elements.
Used SVM, Logistic Regression, and XGBoost for classification.
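For readers unfamiliar with the metric quoted above, here is a minimal sketch of how an F1 score is computed for segment-level binary labels (1 = Dark Pattern, 0 = Non-Dark Pattern). The function name `precision_recall_f1` and the example labels are hypothetical and serve only to show the arithmetic; they are not the project's data or evaluation code.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, positive class = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up labels for eight web-page segments.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_scores = precision_recall_f1(example_true, example_pred)
```

With three true positives, one false positive, and one false negative, precision, recall, and F1 all come out to 0.75 for these made-up labels.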

Dataset

Coming Soon!

Publications

Coming Soon!

Increase Public Speaking Fluency by Speech Augmentation

Overview

Take an impromptu/unrehearsed speech as input.
Segment the disfluencies using a sound segmentation approach.
Delete the disfluencies.
Classify silences as fluent (micro-pause) or disfluent (unnatural long pause).
The current system classifies and segments filler words (uh, umm) with a frame-level precision of 0.95.
We can classify unnatural and natural pauses with an F1 score of 0.70, given a word utterance pair.
Finally, silences are synthesized for fluent speech.
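The deletion step in the list above can be sketched as cutting detected time spans out of a sample buffer. This is a minimal pure-Python sketch under assumptions: the function name `remove_segments`, the representation of detections as `(start_sec, end_sec)` pairs, and the toy signal are hypothetical, and the segmentation and pause classification models themselves are not shown.

```python
def remove_segments(samples, sample_rate, segments):
    """Return a copy of `samples` with every (start_sec, end_sec) span removed.

    Overlapping or unsorted spans are handled by sorting them and tracking
    how far into the signal we have already cut.
    """
    kept = []
    cursor = 0  # index of the first sample not yet kept or discarded
    for start_sec, end_sec in sorted(segments):
        start = max(cursor, int(round(start_sec * sample_rate)))
        end = min(len(samples), int(round(end_sec * sample_rate)))
        if start > cursor:
            kept.extend(samples[cursor:start])  # keep audio before the span
        cursor = max(cursor, end)               # skip over the span
    kept.extend(samples[cursor:])               # keep the tail
    return kept


# Toy signal: 10 samples at 10 Hz, with two hypothetical filler spans.
toy_signal = list(range(10))
cleaned = remove_segments(toy_signal, 10, [(0.2, 0.4), (0.7, 0.8)])
```

A real system would likely cross-fade at the cut points to avoid audible clicks; that is omitted here for brevity.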

Publications

Increase Apparent Public Speaking Fluency by Speech Augmentation

Sagnik Das

IEEE ICASSP, 2019

The Common Fold

Overview

Single half-folded document image as input.
Dewarped image as output.
We propose a segmentation-reconstruction approach.
Semantic segmentation to find the creases on the paper using a fully convolutional network (FCN).
Use the creases to separate the parts of the paper.
Reconstruct each part using a Coons patch.
On our dewarped image, OCR word accuracy was ~3 times higher than on the folded version.
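To make the reconstruction step more concrete, the sketch below evaluates a bilinearly blended Coons patch, which interpolates a surface from four boundary curves. It is a generic textbook formulation, not the paper's implementation: the names `coons_patch` and `segment`, the 2D setting, and the example boundary curves are all assumptions made for illustration.

```python
def segment(a, b):
    """Return a straight-line curve t -> point going from a to b, t in [0, 1]."""
    return lambda t: (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def coons_patch(c0, c1, d0, d1, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0, 1]^2.

    c0(u), c1(u): bottom (v=0) and top (v=1) boundary curves.
    d0(v), d1(v): left (u=0) and right (u=1) boundary curves.
    The curves are assumed to meet at the four corners.
    """
    p00, p10 = c0(0.0), c0(1.0)
    p01, p11 = c1(0.0), c1(1.0)
    point = []
    for k in range(len(p00)):
        ruled_u = (1 - v) * c0(u)[k] + v * c1(u)[k]   # blend bottom/top
        ruled_v = (1 - u) * d0(v)[k] + u * d1(v)[k]   # blend left/right
        bilinear = ((1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
                    + (1 - u) * v * p01[k] + u * v * p11[k])
        point.append(ruled_u + ruled_v - bilinear)    # remove double-counted corners
    return tuple(point)


# Hypothetical paper part: three straight edges and one bulging bottom edge.
bottom = lambda u: (u, 0.4 * u * (1 - u))
top = segment((0.0, 1.0), (1.0, 1.0))
left = segment((0.0, 0.0), (0.0, 1.0))
right = segment((1.0, 0.0), (1.0, 1.0))
center = coons_patch(bottom, top, left, right, 0.5, 0.5)
```

Evaluating the patch on a regular (u, v) grid gives, for each pixel of the flat output, the location to sample in the warped part of the photograph.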

Publications

The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image

Sagnik Das

ACM DocEng, 2017

Publications

Journal Publications

CRHS: Clustering and Routing in Wireless Sensor Networks using Harmony Search Algorithm
Praveen Lalwani, Sagnik Das, Haider Banka, and Chiranjeev Kumar, Neural Computing and Applications 30, no. 2 (2018): 639-659.

Conference Proceedings

DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks
Sagnik Das*, Ke Ma*, Zhixin Shu, Dimitris Samaras, and Roy Shilkrot, IEEE International Conference on Computer Vision (ICCV), 2019.

Increase Apparent Public Speaking Fluency By Speech Augmentation
Sagnik Das, Nisha Gandhi, Tejas Naik, and Roy Shilkrot, In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6890-6894. IEEE, 2019.

The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image
Sagnik Das, Gaurav Mishra, Akshay Sudharshana, and Roy Shilkrot, In Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 125-128. ACM, 2017.

Bacterial Foraging Optimization Algorithm for CH Selection and Routing in Wireless Sensor Networks
Praveen Lalwani and Sagnik Das, In 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), pp. 95-100. IEEE, 2016.

Sagnik Das

Research Interests: Deep Learning, Computer Vision, Human Computer Interaction, Machine Learning
