Papers-Stack

Keeping track of research papers I've read.
… Guilty of not pushing the most recently read papers to the top!

Articles-Collection
Sheet



Keys -> || ✅ : Done reading || 📖 : In progress || 🚫 : Dropped ||

| # | Paper Name | Summary | Notes | Link | Year |
|---|------------|---------|-------|------|------|
| 1 | WaveNet: A Generative Model for Raw Audio | Causal conv. layers with dilation. Autoregressive model; sequential inference. (sketch below) | notes | arxiv | 2016 |
| 2 | Fast Wavenet Generation Algorithm | WaveNet improvement: O(2^L) -> O(L), though generation is still sequential. Uses queues to push & pop already-computed states at each layer. (sketch below) | notes | arxiv | 2016 |
| 3 | Parallel WaveNet: Fast High-Fidelity Speech Synthesis | Probability Density Distillation: teacher + student architecture. Marries the efficient training of WaveNet with the efficient sampling of an IAF; sampling is parallel, enabling real-time synthesis. <br> ✔️ medium - An Explanation of Discretized Logistic Mixture Likelihood <br> ✔️ vimeo - Parallel WaveNet | | arxiv | 2017 |
| 4 | 📕 Improved Variational Inference with Inverse Autoregressive Flow | ⭐ ✔️ Introduction to Normalizing Flows (ECCV 2020 Tutorial) - video | | arxiv | 2016 |
| 5 | Deep Unsupervised Learning (UC Berkeley lectures) | ✔️ L1 - Introduction: types of models: 1) generative, 2) self-supervised (01:10:00) <br> ✔️ L2 - Autoregressive Models: histograms, parameterized distributions; 1) RNN-based, 2) masking-based (2.1 MADE, 2.2 masked ConvNets) (02:27:23) <br> ✔️ L3 - Flow Models: the model does not output p_theta(x) directly; instead z = f_theta(x), where z follows a simple prior, and sampling applies the inverse x = f_theta^-1(z). Autoregressive flows: fast training, slow sampling. Inverse autoregressive flows: slow training, fast sampling. (01:56:53) (sketch below) | | course | |
| 6 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | | | arxiv | 2018 |
| 7 | Deep Photo Enhancer: Unpaired Learning for Image Enhancement using GANs | CycleGAN extension; individual BN for x->y' & x'->y''; adaptive weighting for WGAN. | CVPR | arxiv | 2018 |
| 8 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Vision Transformer (ViT): a sequence of image patches fed to a Transformer. Less computation than ResNets; training on large data trumps the inductive bias of CNNs, and ViT outperforms them. (sketch below) | Google Brain | arxiv | ICLR, 2021 |
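
Below are a few rough, self-contained sketches of ideas from the entries above. First, entry 1: a minimal NumPy sketch of a dilated causal convolution, the building block of WaveNet. The filter taps and the three-layer stack are made-up toy values, not the paper's architecture.

```python
# Minimal sketch (not the DeepMind implementation): a 1-D dilated *causal*
# convolution as used in WaveNet. The output at time t depends only on inputs
# at times <= t; stacking layers with dilations 1, 2, 4, ... grows the
# receptive field exponentially with depth.
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """x: (T,) signal, w: (K,) filter taps. Returns y: (T,) where
    y[t] = sum_k w[k] * x[t - k*dilation], treating x[<0] as 0 (causal padding)."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

# With kernel size 2 and dilations 1, 2, 4, ..., 2^(L-1), the receptive field
# is 2^L samples -- which is why naive sample-by-sample generation costs
# O(2^L) work per output before the Fast WaveNet caching trick (entry 2).
x = np.random.randn(16)
h = dilated_causal_conv1d(x, w=np.array([0.5, 0.5]), dilation=1)
h = dilated_causal_conv1d(h, w=np.array([0.5, 0.5]), dilation=2)
h = dilated_causal_conv1d(h, w=np.array([0.5, 0.5]), dilation=4)
print(h.shape)  # (16,)
```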
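
Entry 2's queue trick, sketched under the assumption of kernel size 2 and scalar activations: each layer keeps a FIFO of its past inputs with length equal to its dilation, so producing one new sample touches each layer once (O(L)) instead of recomputing the full receptive field (O(2^L)). The tanh unit and random weights are stand-ins, not the paper's gated residual blocks.

```python
# Minimal sketch of queue-based Fast WaveNet generation (kernel size 2).
from collections import deque
import numpy as np

dilations = [1, 2, 4, 8]
rng = np.random.default_rng(0)
# One (w_old, w_new) tap pair per layer -- random stand-ins for trained weights.
weights = [(rng.normal(), rng.normal()) for _ in dilations]
# Each layer's queue holds its last `dilation` inputs (initially silence).
queues = [deque([0.0] * d, maxlen=d) for d in dilations]

def generate_next(x_t):
    """Advance the whole stack by one time step using only cached activations."""
    h = x_t
    for (w_old, w_new), q in zip(weights, queues):
        h_old = q.popleft()   # this layer's input from `dilation` steps ago
        q.append(h)           # cache the current input for a later step
        h = np.tanh(w_old * h_old + w_new * h)
    return h

# Autoregressive loop: each new sample costs O(num_layers) work.
samples = [0.0]
for _ in range(32):
    samples.append(float(generate_next(samples[-1])))
print(len(samples))
```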
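
Entries 4-5: a toy sketch of why autoregressive flows evaluate densities (and hence train) in parallel but sample sequentially, and why inverse autoregressive flows flip that trade-off. The `conditioner` below is a hypothetical stand-in for a learned masked network such as MADE, not a real MAF/IAF implementation.

```python
import numpy as np

def conditioner(prefix):
    """Stand-in for a learned network mapping x_{<t} to a shift and log-scale."""
    return 0.1 * np.sum(prefix), 0.01 * np.sum(prefix)

def af_forward(x):
    """Autoregressive flow, x -> z. Every z[t] depends only on the given x,
    so all dimensions can be computed in parallel: density evaluation is fast."""
    z = np.empty_like(x)
    for t in range(len(x)):                 # parallelizable over t
        mu, log_s = conditioner(x[:t])
        z[t] = (x[t] - mu) * np.exp(-log_s)
    return z

def af_sample(z):
    """Inverse direction, z -> x. x[t] needs the already-generated x[:t], so
    sampling is inherently sequential. An IAF conditions on z[:t] instead,
    making sampling parallel and density evaluation sequential."""
    x = np.empty_like(z)
    for t in range(len(z)):                 # inherently sequential
        mu, log_s = conditioner(x[:t])
        x[t] = z[t] * np.exp(log_s) + mu
    return x

z = af_forward(np.array([0.3, -1.2, 0.7]))
x = af_sample(z)   # round-trips back to the original input
```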
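
Entry 8: a minimal sketch of the ViT front end, splitting a 224x224 image into 16x16 patches and linearly projecting each flattened patch into a token. The projection matrix and embedding width are illustrative only, and the class token and position embeddings are omitted.

```python
import numpy as np

def patchify(img, patch=16):
    """img: (H, W, C) -> (num_patches, patch*patch*C) of flattened patches."""
    H, W, C = img.shape
    rows, cols = H // patch, W // patch
    p = img[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, C)
    return p.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * C)

img = np.random.rand(224, 224, 3)
tokens = patchify(img)                      # (196, 768): 14x14 patches of 16x16x3
W_embed = np.random.randn(768, 512) * 0.02  # learned projection in the real model
embeddings = tokens @ W_embed               # (196, 512) sequence for the Transformer
print(embeddings.shape)
```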