Research
Selected publications:
High-dimensional asymptotics of feature learning: how one gradient step improves the representation. Ba, J., Erdogdu, M. A., Suzuki, T., Wang, Z., Wu, D., & Yang, G. (2022). arXiv preprint arXiv:2205.01445.
Dataset distillation using neural feature regression. Zhou, Y., Nezhadarya, E., & Ba, J. (2022). arXiv preprint arXiv:2206.00719.
Understanding the variance collapse of SVGD in high dimensions. Ba, J., Erdogdu, M. A., Ghassemi, M., Sun, S., Suzuki, T., Wu, D., & Zhang, T. (2021). International Conference on Learning Representations.
Learning domain invariant representations in goal-conditioned block MDPs. Han, B., Zheng, C., Chan, H., Paster, K., Zhang, M., & Ba, J. (2021). Advances in Neural Information Processing Systems, 34, 764–776.
Efficient statistical tests: a neural tangent kernel approach. Jia, S., Nezhadarya, E., Wu, Y., & Ba, J. (2021). International Conference on Machine Learning (pp. 4893–4903). PMLR.
How does a neural network's architecture impact its robustness to noisy labels? Li, J., Zhang, M., Xu, K., Dickerson, J., & Ba, J. (2021). Advances in Neural Information Processing Systems, 34, 9788–9803.
On monotonic linear interpolation of neural network parameters. Lucas, J. R., Bae, J., Zhang, M. R., Fort, S., Zemel, R., & Grosse, R. B. (2021). International Conference on Machine Learning (pp. 7168–7179). PMLR.
Clockwork variational autoencoders. Saxena, V., Ba, J., & Hafner, D. (2021). Advances in Neural Information Processing Systems, 34, 29246–29257.
LIME: learning inductive bias for primitives of mathematical reasoning. Wu, Y., Rabe, M. N., Li, W., Ba, J., Grosse, R. B., & Szegedy, C. (2021). International Conference on Machine Learning (pp. 11251–11262). PMLR.
When does preconditioning help or hurt generalization? Amari, S.-i., Ba, J., Grosse, R., Li, X., Nitanda, A., Suzuki, T., … Xu, J. (2020). International Conference on Learning Representations.
Generalization of two-layer neural networks: an asymptotic viewpoint. Ba, J., Erdogdu, M., Suzuki, T., Wu, D., & Zhang, T. (2020). International Conference on Learning Representations. https://openreview.net/forum?id=H1gBsgBY
A study of gradient variance in deep learning. Faghri, F., Duvenaud, D., Fleet, D. J., & Ba, J. (2020). arXiv preprint arXiv:2007.04532.
Mastering Atari with discrete world models. Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2020). International Conference on Learning Representations.
Action and perception as divergence minimization. Hafner, D., Ortega, P. A., Ba, J., Parr, T., Friston, K., & Heess, N. (2020). arXiv preprint arXiv:2009.01791.
Improving transformer optimization through better initialization. Huang, X. S., Perez, F., Ba, J., & Volkovs, M. (2020). International Conference on Machine Learning (pp. 4475–4483). PMLR.
Noisy labels can induce good representations. Li, J., Zhang, M., Xu, K., Dickerson, J. P., & Ba, J. (2020). arXiv preprint arXiv:2012.12896.
Graph generation with energy-based models. Liu, J., Grathwohl, W., Ba, J., & Swersky, K. (2020). ICML Workshop on Graph Representation Learning and Beyond (GRL+).
Evaluating agents without rewards. Matusch, B., Ba, J., & Hafner, D. (2020). arXiv preprint arXiv:2012.11538.
Planning from pixels using inverse dynamics models. Paster, K., McIlraith, S. A., & Ba, J. (2020). International Conference on Learning Representations.
An inductive bias for distances: neural nets that respect the triangle inequality. Pitis, S., Chan, H., Jamali, K., & Ba, J. (2020). International Conference on Learning Representations. https://openreview.net/forum?id=HJeiDpVF
Maximum entropy gain exploration for long horizon multi-goal reinforcement learning. Pitis, S., Chan, H., Zhao, S., Stadie, B., & Ba, J. (2020). International Conference on Machine Learning (pp. 7750–7761). PMLR.
Learning intrinsic rewards as a bi-level optimization problem. Stadie, B., Zhang, L., & Ba, J. (2020). Conference on Uncertainty in Artificial Intelligence (pp. 111–120). PMLR.
On solving minimax optimization locally: a follow-the-ridge approach. Wang, Y., Zhang, G., & Ba, J. (2020). International Conference on Learning Representations. arXiv preprint arXiv:1910.07512.
An empirical study of stochastic gradient descent with structured covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2020). International Conference on Artificial Intelligence and Statistics (pp. 3621–3631). PMLR.
BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning. Wen, Y., Tran, D., & Ba, J. (2020). International Conference on Learning Representations. https://openreview.net/forum?id=Sklf1yrY
INT: an inequality benchmark for evaluating generalization in theorem proving. Wu, Y., Jiang, A., Ba, J., & Grosse, R. (2020). International Conference on Learning Representations.
Neural theorem proving on inequality problems. Wu, Y., Jiang, A., Grosse, R., & Ba, J. (2020). Artificial Intelligence and Theorem Proving (AITP 2020).
ACTRCE: augmenting experience via teacher's advice for multi-goal reinforcement learning. Chan, H., Wu, Y., Kiros, J., Fidler, S., & Ba, J. (2019). arXiv preprint arXiv:1902.04546.
Dream to control: learning behaviors by latent imagination. Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). International Conference on Learning Representations. arXiv preprint arXiv:1912.01603.
DOM-Q-NET: grounded RL on structured language. Jia, S., Kiros, J., & Ba, J. (2019). International Conference on Learning Representations. arXiv preprint arXiv:1902.07257.
Graph normalizing flows. Liu, J. S., Kumar, A., Ba, J., Kiros, J. R., & Swersky, K. (2019). Advances in Neural Information Processing Systems.
Exploring model-based planning with policy networks. Wang, T., & Ba, J. (2019). International Conference on Learning Representations. arXiv preprint arXiv:1906.08649.
Benchmarking model-based reinforcement learning. Wang, T., Bao, X., Clavera, I., Hoang, J., Wen, Y., Langlois, E., … Ba, J. (2019). arXiv preprint arXiv:1907.02057.
Neural graph evolution: towards efficient automatic robot design. Wang, T., Zhou, Y., Fidler, S., & Ba, J. (2019). International Conference on Learning Representations.
An empirical study of large-batch stochastic gradient descent with structured covariance noise. Wen, Y., Luk, K., Gazeau, M., Zhang, G., Chan, H., & Ba, J. (2019). arXiv preprint arXiv:1902.08234.
Lookahead optimizer: k steps forward, 1 step back. Zhang, M., Lucas, J., Hinton, G. E., & Ba, J. (2019). Advances in Neural Information Processing Systems (pp. 9593–9604).
Towards permutation-invariant graph generation. Liu, J., Kumar, A., Ba, J., & Swersky, K. (2018). Preprint.
Reversible recurrent neural networks. MacKay, M., Vicol, P., Ba, J., & Grosse, R. B. (2018). Advances in Neural Information Processing Systems (pp. 9029–9040).
Kronecker-factored curvature approximations for recurrent neural networks. Martens, J., Ba, J., & Johnson, M. (2018). International Conference on Learning Representations.
On the convergence and robustness of training GANs with regularized optimal transport. Sanjabi, M., Ba, J., Razaviyayn, M., & Lee, J. D. (2018). Advances in Neural Information Processing Systems (pp. 7091–7101).
NerveNet: learning structured policy with graph neural networks. Wang, T., Liao, R., Ba, J., & Fidler, S. (2018). International Conference on Learning Representations.
Flipout: efficient pseudo-independent weight perturbations on mini-batches. Wen, Y., Vicol, P., Ba, J., Tran, D., & Grosse, R. (2018). International Conference on Learning Representations.
Distributed second-order optimization using Kronecker-factored approximations. Ba, J., Grosse, R., & Martens, J. (2017). International Conference on Learning Representations.
Automated analysis of high-content microscopy data with deep learning. Kraus, O. Z., Grys, B. T., Ba, J., Chong, Y., Frey, B. J., Boone, C., & Andrews, B. J. (2017). Molecular Systems Biology, 13(4), 924.
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Advances in Neural Information Processing Systems (pp. 5279–5288).
Using fast weights to attend to the recent past. Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z., & Ionescu, C. (2016). Advances in Neural Information Processing Systems (pp. 4331–4339).
Layer normalization. Ba, J., Kiros, J. R., & Hinton, G. E. (2016). NIPS 2016 Deep Learning Symposium. arXiv preprint arXiv:1607.06450.
Classifying and segmenting microscopy images with deep multiple instance learning. Kraus, O. Z., Ba, J. L., & Frey, B. J. (2016). Bioinformatics, 32(12), i52–i59.
Generating images from captions with attention. Mansimov, E., Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International Conference on Learning Representations. arXiv preprint arXiv:1511.02793.
Actor-mimic: deep multitask and transfer reinforcement learning. Parisotto, E., Ba, J., & Salakhutdinov, R. (2016). International Conference on Learning Representations. arXiv preprint arXiv:1511.06342.
Multiple object recognition with visual attention. Ba, J., Mnih, V., & Kavukcuoglu, K. (2015). International Conference on Learning Representations.
Learning wake-sleep recurrent attention models. Ba, J., Salakhutdinov, R. R., Grosse, R. B., & Frey, B. J. (2015). Advances in Neural Information Processing Systems (pp. 2593–2601).
Predicting deep zero-shot convolutional neural networks using textual descriptions. Ba, J., Swersky, K., & Fidler, S. (2015). Proceedings of the IEEE International Conference on Computer Vision (pp. 4247–4255).
Adam: a method for stochastic optimization. Kingma, D., & Ba, J. (2015). International Conference on Learning Representations.
Show, attend and tell: neural image caption generation with visual attention. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015). International Conference on Machine Learning (pp. 2048–2057).
Do deep nets really need to be deep? Ba, J., & Caruana, R. (2014). Advances in Neural Information Processing Systems (pp. 2654–2662).
Adaptive dropout for training deep neural networks. Ba, J., & Frey, B. (2013). Advances in Neural Information Processing Systems (pp. 3084–3092).