About Me

I am advised by Prof. Dimitris Samaras and Prof. Roy Shilkrot.

My current research is on 3D shape estimation and texture reconstruction from single images, specifically document images. I am working on dewarping folded or warped document images into their flatbed-scanned equivalents using an end-to-end neural network.

Before this, I worked on two projects: speech disfluency detection and correction, and detecting Dark Patterns in web pages. Both projects were advised by Prof. Roy Shilkrot.
During my Master's, I worked on evolutionary and nature-inspired algorithms, advised by Prof. Haider Banka.

Education

Ph.D. Computer Science, Stony Brook University, New York, USA (2016-Present)

M.Tech. Computer Science & Engineering, Indian Institute of Technology Dhanbad, India (2014-2016)

B.Tech. Computer Science & Engineering, West Bengal University of Technology, India (2009-2013)

Updates

February '19
This spring I am a part-time intern at Kasisto Inc.
February '19
Paper accepted at IEEE ICASSP '19, Brighton, UK.
August '18
(Co-authored) Research grant accepted by Samsung GRO.
June '17
Paper accepted at ACM Document Engineering '17, Malta.

Projects

Dewarping Document Images [Aug 2018-Present]
Reconstruction of folded/warped document images in 3D using an end-to-end CNN.
Python, PyTorch
Patternminator: Dark Pattern Detection in Web
Classify and warn users about cunning, deceptive UIs (Dark Patterns) in web pages.
Python, Keras/Tensorflow, JS
Increase Apparent Public Speaking Fluency by Speech Augmentation
Classify and remove disfluencies from recorded speech to improve apparent speaker fluency.
Python, Tensorflow
The Common Fold: Dewarping Four-Folded Printed Documents
Dewarp four-folded (double-half-folded) paper documents from a single image taken with a regular (non-range) camera.
Python, Caffe

Dewarping Document Images

Overview

  • Input: a single warped document image.
  • Output: the unwarped texture.
  • Uses an intermediate representation: depth / 3D coordinates.
  • Dataset: ~100k rendered synthetic document images, ~1k real meshes, and the corresponding ground truth.
  • This is ongoing research; more details coming soon!
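The overview above predicts depth / 3D coordinates as an intermediate step. At some point the warped image still has to be resampled onto a flat grid. The sketch below shows one common way to do that: bilinear sampling through a backward map. It assumes a hypothetical per-pixel map `bmap` that gives, for each output pixel, the (x, y) location to read in the warped image. How such a map would be derived from the predicted 3D coordinates is not described here, and this is not necessarily the method used in this project.

```python
import numpy as np


def unwarp_with_backward_map(warped: np.ndarray, bmap: np.ndarray) -> np.ndarray:
    """Sample a flat texture from a warped image using a backward map.

    warped: (H, W) or (H, W, C) input image.
    bmap:   (h, w, 2) array; bmap[i, j] = (x, y) source location in `warped`
            (pixel units) for output pixel (i, j). Hypothetical representation.
    """
    H, W = warped.shape[:2]
    # Keep source coordinates inside the image
    x = np.clip(bmap[..., 0], 0, W - 1)
    y = np.clip(bmap[..., 1], 0, H - 1)
    # Integer neighbours surrounding each source location
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    # Fractional offsets act as bilinear weights
    wx = x - x0
    wy = y - y0
    if warped.ndim == 3:
        # Broadcast the weights over the colour channels
        wx = wx[..., None]
        wy = wy[..., None]
    top = (1 - wx) * warped[y0, x0] + wx * warped[y0, x1]
    bottom = (1 - wx) * warped[y1, x0] + wx * warped[y1, x1]
    return (1 - wy) * top + wy * bottom
```

With an identity map, where each output pixel reads its own location, the function returns the input unchanged. This is a quick sanity check before plugging in a predicted map.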

Dataset

Coming Soon!

Publications

Coming Soon!

Patternminator: Dark Pattern Detection in Web

Overview

  • Input: a webpage's HTML and its screenshot.
  • Segment the HTML and obtain the corresponding segment images.
  • Extract HTML, text, and image features from each segment.
  • Classify each segment as Dark Pattern or non-Dark Pattern.
  • Our current system combines visual, textual, and HTML features and achieves an F1 score of 0.84 in detecting Dark Patterns in web elements.
  • Classifiers used: SVM, logistic regression, and XGBoost.
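The steps above combine three kinds of features per segment and report a segment-level F1 score. This minimal sketch assumes simple early fusion by concatenation and binary labels where 1 means Dark Pattern. The actual fusion scheme and feature extractors are not described here. `fuse_features` and `f1_score` are hypothetical helpers, not part of the project's code.

```python
from typing import List, Sequence


def fuse_features(html_feats: Sequence[float], text_feats: Sequence[float],
                  image_feats: Sequence[float]) -> List[float]:
    """Early fusion (assumed): concatenate one segment's feature vectors."""
    return list(html_feats) + list(text_feats) + list(image_feats)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Segment-level F1 for the positive class (1 = Dark Pattern)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed Dark Patterns
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score([1, 0, 1, 1], [1, 0, 0, 1])` has precision 1 and recall 2/3, which gives an F1 of about 0.8. The fused vectors would then be passed to any of the listed classifiers (SVM, logistic regression, XGBoost).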

Dataset

Coming Soon!

Publications

Coming Soon!

Increase Apparent Public Speaking Fluency by Speech Augmentation

Overview

  • Input: an impromptu/unrehearsed speech.
  • Segment the disfluencies using a sound-segmentation approach.
  • Delete the segmented disfluencies.
  • Classify silences as fluent (micro-pauses) or disfluent (unnatural, long pauses).
  • The current system classifies and segments filler words (uh, umm) with a frame-level precision of 0.95.
  • Given a word-utterance pair, we classify natural vs. unnatural pauses with an F1 score of 0.70.
  • Finally, silences are synthesized to produce fluent speech.
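The editing step of this pipeline can be sketched as follows: filler segments are dropped, and overly long pauses are replaced by a short synthesized silence. This sketch rests on several assumptions:

- The segment labels (`"speech"`, `"filler"`, `"pause"`) are hypothetical.
- A fixed duration threshold `max_pause_s` stands in for the learned pause classifier described above, which works on word-utterance pairs.
- Digital zeros serve as the synthesized silence.

```python
from typing import List, Tuple

# (start_sample, end_sample, label); the labels are hypothetical
Segment = Tuple[int, int, str]


def edit_disfluencies(samples: List[float], segments: List[Segment],
                      sample_rate: int, max_pause_s: float = 0.3) -> List[float]:
    """Remove fillers and shorten long pauses in a segmented recording."""
    out: List[float] = []
    # Longest pause (in samples) treated as a natural micro-pause.
    # A duration threshold is an assumed stand-in for a learned classifier.
    max_pause = int(round(max_pause_s * sample_rate))
    for start, end, label in segments:
        chunk = samples[start:end]
        if label == "filler":
            continue  # delete filler words such as "uh", "umm"
        if label == "pause" and len(chunk) > max_pause:
            # Replace the unnatural pause with a short synthesized silence
            out.extend([0.0] * max_pause)
            continue
        out.extend(chunk)  # keep speech and natural pauses as recorded
    return out
```

A real system would likely also smooth the joins between kept segments to avoid audible clicks. That step is omitted here for brevity.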

Demo

Go here for processed examples and a demo!

Publications

  • Increase Apparent Public Speaking Fluency by Speech Augmentation
  • Sagnik Das, Nisha Gandhi, Tejas Naik, Roy Shilkrot [IEEE ICASSP, 2019]

The Common Fold

Overview

  • Input: a single image of a four-folded (double-half-folded) document.
  • Output: the dewarped image.
  • We propose a segmentation-reconstruction approach.
  • Semantic segmentation with a fully convolutional network (FCN) finds the creases on the paper.
  • The creases are used to separate the paper into parts.
  • Each part is reconstructed with a Coons patch.
  • OCR word accuracy on our dewarped images was ~3 times higher than on the folded originals.
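The reconstruction step fits a Coons patch to each part separated by the creases. Below is a sketch of the standard bilinearly blended Coons patch, which fills in a surface from four boundary curves. It assumes the four boundary curves of one part (paper edges and creases) have already been extracted and resampled at uniform parameter values. The function name and sampling layout are illustrative, and the exact formulation in the paper may differ.

```python
import numpy as np


def coons_patch(c0, c1, d0, d1):
    """Bilinearly blended Coons patch from four sampled boundary curves.

    c0, c1: (n, D) curves S(u, 0) and S(u, 1), sampled at u = linspace(0, 1, n)
    d0, d1: (m, D) curves S(0, v) and S(1, v), sampled at v = linspace(0, 1, m)
    Assumes the corners agree: c0[0] == d0[0], c0[-1] == d1[0],
    c1[0] == d0[-1], c1[-1] == d1[-1].
    Returns an (m, n, D) grid with S[j, i] = S(u_i, v_j).
    """
    c0, c1, d0, d1 = (np.asarray(a, dtype=float) for a in (c0, c1, d0, d1))
    n, m = c0.shape[0], d0.shape[0]
    u = np.linspace(0.0, 1.0, n)[None, :, None]  # shape (1, n, 1)
    v = np.linspace(0.0, 1.0, m)[:, None, None]  # shape (m, 1, 1)
    # Ruled surface blending the two u-direction boundaries along v
    ruled_v = (1 - v) * c0[None, :, :] + v * c1[None, :, :]
    # Ruled surface blending the two v-direction boundaries along u
    ruled_u = (1 - u) * d0[:, None, :] + u * d1[:, None, :]
    # Bilinear interpolation of the four corners (counted twice above)
    p00, p10, p01, p11 = c0[0], c0[-1], c1[0], c1[-1]
    bilinear = ((1 - u) * (1 - v) * p00 + u * (1 - v) * p10
                + (1 - u) * v * p01 + u * v * p11)
    return ruled_v + ruled_u - bilinear
```

With (x, y) image-space boundaries, the returned grid gives a location in the folded image for each point of a rectangular (u, v) lattice. Sampling the image at those locations yields a rectified view of that part. With straight boundaries, the patch reduces to a plain bilinear grid.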

Publications

  • The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image
  • Sagnik Das, Gaurav Mishra, Akshay Sudharshana, Roy Shilkrot [ACM DocEng, 2017]

Publications

Journal Publications

CRHS: Clustering and Routing in Wireless Sensor Networks using Harmony Search Algorithm
Praveen Lalwani, Sagnik Das, Haider Banka, and Chiranjeev Kumar, Neural Computing and Applications 30, no. 2 (2018): 639-659.

Conference Proceedings

Increase Apparent Public Speaking Fluency By Speech Augmentation
Sagnik Das, Nisha Gandhi, Tejas Naik, and Roy Shilkrot, 44th IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), 2019. [Presentation due].

The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image
Sagnik Das, Gaurav Mishra, Akshay Sudharshana, and Roy Shilkrot, In Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 125-128. ACM, 2017.

Bacterial Foraging Optimization Algorithm for CH Selection and Routing in Wireless Sensor Networks
Praveen Lalwani and Sagnik Das, In Recent Advances in Information Technology (RAIT), 2016 3rd International Conference on, pp. 95-100. IEEE, 2016.

Technical Reports