My long-term research goal is to address a computational question: how can we build general problem-solving machines with human-like efficiency and adaptability? My research focuses on developing efficient learning algorithms for deep neural networks, and it overlaps with the following research communities: NeurIPS, ICLR, and ICML. I am also broadly interested in reinforcement learning, natural language processing, and artificial intelligence.

For future students interested in learning algorithms and theory: please apply through the department admissions process.

Short bio: I completed my PhD under the supervision of Geoffrey Hinton. I received both my master's degree (2014) and my undergraduate degree (2011) from the University of Toronto, working with Brendan Frey and Ruslan Salakhutdinov. I am a CIFAR AI Chair and a recipient of the 2016 Facebook Graduate Fellowship in machine learning.

Google Scholar page. Contact me: jba at cs.toronto.edu

Research


Generalization of two-layer neural networks: an asymptotic viewpoint. Ba, J., Erdogdu, M., Suzuki, T., Wu, D., & Zhang, T. (2020). International conference on learning representations.

On solving minimax optimization locally: a follow-the-ridge approach. Wang, Y., Zhang, G., & Ba, J. (2020). International conference on learning representations. arXiv preprint arXiv:1910.07512.

Interplay between optimization and generalization of stochastic gradient descent with covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2020). International conference on artificial intelligence and statistics.

Batchensemble: an alternative approach to efficient ensemble and lifelong learning. Wen, Y., Tran, D., & Ba, J. (2020). International conference on learning representations.

An inductive bias for distances: neural nets that respect the triangle inequality. Pitis, S., Chan, H., Jamali, K., & Ba, J. (2020). International conference on learning representations.

Towards characterizing the high-dimensional bias of kernel-based particle inference algorithms. Ba, J., Erdogdu, M. A., Ghassemi, M., Suzuki, T., Sun, S., Wu, D., & Zhang, T. (2019). preprint.

Actrce: augmenting experience via teacher's advice for multi-goal reinforcement learning. Chan, H., Wu, Y., Kiros, J., Fidler, S., & Ba, J. (2019). arXiv preprint arXiv:1902.04546.

Dream to control: learning behaviors by latent imagination. Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). International conference on learning representations. arXiv preprint arXiv:1912.01603.

Dom-q-net: grounded rl on structured language. Jia, S., Kiros, J., & Ba, J. (2019). International conference on learning representations. arXiv preprint arXiv:1902.07257.

Graph normalizing flows. Liu, J. S., Kumar, A., Ba, J., Kiros, J. R., & Swersky, K. (2019). Advances in neural information processing systems.

Protoge: prototype goal encodings for multi-goal reinforcement learning. Pitis, S., Chan, H., & Ba, J. (2019). preprint.

Exploring model-based planning with policy networks. Wang, T., & Ba, J. (2019). International conference on learning representations. arXiv preprint arXiv:1906.08649.

Benchmarking model-based reinforcement learning. Wang, T., Bao, X., Clavera, I., Hoang, J., Wen, Y., Langlois, E., … Ba, J. (2019). arXiv preprint arXiv:1907.02057.

Neural graph evolution: towards efficient automatic robot design. Wang, T., Zhou, Y., Fidler, S., & Ba, J. (2019). International conference on learning representations.

Lookahead optimizer: k steps forward, 1 step back. Zhang, M., Lucas, J., Hinton, G. E., & Ba, J. (2019). Advances in neural information processing systems (pp. 9593–9604).
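
For readers unfamiliar with the method, here is a minimal sketch of the Lookahead outer loop in plain NumPy. The function name, the inner_step callback, and the loop structure are illustrative assumptions, not code from the paper; k and alpha follow the values discussed there.

    import numpy as np

    def lookahead(init_params, inner_step, n_outer=100, k=5, alpha=0.5):
        # Slow and fast weights start at the same point.
        slow = np.array(init_params, dtype=float)
        fast = slow.copy()
        for _ in range(n_outer):
            for _ in range(k):
                fast = inner_step(fast)          # k steps of any inner optimizer (e.g. SGD, Adam)
            slow = slow + alpha * (fast - slow)  # slow weights interpolate toward the fast weights
            fast = slow.copy()                   # fast weights are reset to the new slow weights
        return slow

The key idea is that the slow weights only move a fraction alpha of the distance covered by k fast-weight steps, which smooths out the variance of the inner optimizer.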

Towards permutation-invariant graph generation. Liu, J., Kumar, A., Ba, J., & Swersky, K. (2018). preprint.

Reversible recurrent neural networks. MacKay, M., Vicol, P., Ba, J., & Grosse, R. B. (2018). Advances in neural information processing systems (pp. 9029–9040).

Kronecker-factored curvature approximations for recurrent neural networks. Martens, J., Ba, J., & Johnson, M. (2018). International conference on learning representations.

On the convergence and robustness of training gans with regularized optimal transport. Sanjabi, M., Ba, J., Razaviyayn, M., & Lee, J. D. (2018). Advances in neural information processing systems (pp. 7091–7101).

Nervenet: learning structured policy with graph neural networks. Wang, T., Liao, R., Ba, J., & Fidler, S. (2018). International conference on learning representations.

Flipout: efficient pseudo-independent weight perturbations on mini-batches. Wen, Y., Vicol, P., Ba, J., Tran, D., & Grosse, R. (2018). International conference on learning representations.

Distributed second-order optimization using kronecker-factored approximations. Ba, J., Grosse, R., & Martens, J. (2017). International conference on learning representations.

Automated analysis of high-content microscopy data with deep learning. Kraus, O. Z., Grys, B. T., Ba, J., Chong, Y., Frey, B. J., Boone, C., & Andrews, B. J. (2017). Molecular systems biology, 13(4).

Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Advances in neural information processing systems (pp. 5279–5288).

Using fast weights to attend to the recent past. Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z., & Ionescu, C. (2016). Advances in neural information processing systems (pp. 4331–4339).

Layer normalization. Ba, J., Kiros, J. R., & Hinton, G. E. (2016). NIPS 2016 Deep Learning Symposium. arXiv preprint arXiv:1607.06450.
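
A minimal NumPy sketch of the normalization this paper describes: each example is standardized over its feature dimension and then rescaled and shifted by learned parameters. The function name and the eps value are illustrative assumptions rather than details fixed by the paper.

    import numpy as np

    def layer_norm(x, gain, bias, eps=1e-5):
        # Per-example statistics over the last (feature) axis, unlike batch norm's per-batch statistics.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gain * (x - mean) / np.sqrt(var + eps) + bias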

Classifying and segmenting microscopy images with deep multiple instance learning. Kraus, O. Z., Ba, J. L., & Frey, B. J. (2016). Bioinformatics, 32(12), i52-i59.

Generating images from captions with attention. Mansimov, E., Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International conference on learning representations. arXiv preprint arXiv:1511.02793.

Actor-mimic: deep multitask and transfer reinforcement learning. Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International conference on learning representations. arXiv preprint arXiv:1511.06342.

Multiple object recognition with visual attention. Ba, J., Mnih, V., & Kavukcuoglu, K. (2015). International conference on learning representations.

Learning wake-sleep recurrent attention models. Ba, J., Salakhutdinov, R. R., Grosse, R. B., & Frey, B. J. (2015). Advances in neural information processing systems (pp. 2593–2601).

Predicting deep zero-shot convolutional neural networks using textual descriptions. Ba, J., Swersky, K., & Fidler, S. (2015). Proceedings of the IEEE international conference on computer vision (pp. 4247–4255).

Adam: a method for stochastic optimization. Kingma, D., & Ba, J. (2015). International conference on learning representations.
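
For readers unfamiliar with the method, a minimal NumPy sketch of a single Adam update follows; the default hyperparameters match those suggested in the paper, while the function name and array-based interface are illustrative assumptions.

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Update biased first- and second-moment estimates of the gradient.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Correct the bias from initializing the moments at zero (t starts at 1).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v

The bias correction matters mostly in the first few steps, when m and v are still close to their zero initialization.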

Show, attend and tell: neural image caption generation with visual attention. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015). International conference on machine learning (pp. 2048–2057).

Do deep nets really need to be deep? Ba, J., & Caruana, R. (2014). Advances in neural information processing systems (pp. 2654–2662).

Making dropout invariant to transformations of activation functions and inputs. Ba, J., Xiong, H. Y., & Frey, B. (2014). NIPS 2014 Workshop on Deep Learning.

Adaptive dropout for training deep neural networks. Ba, J., & Frey, B. (2013). Advances in neural information processing systems (pp. 3084–3092).