Deep Neural Networks in Computational Neuroscience
- Tim C. Kietzmann, MRC Cognition and Brain Science Unit, University of Cambridge
- Patrick McClure, MRC Cognition and Brain Science Unit, University of Cambridge
- and Nikolaus Kriegeskorte, MRC Cognition and Brain Science Unit, University of Cambridge; Department of Psychology, Columbia University
The goal of computational neuroscience is to find mechanistic explanations of how the nervous system processes information to give rise to cognitive function and behavior. At the heart of the field are its models, that is, mathematical and computational descriptions of the system being studied, which map sensory stimuli to neural responses and/or neural to behavioral responses. These models range from simple to complex. Recently, deep neural networks (DNNs) have come to dominate several domains of artificial intelligence (AI). As the term “neural network” suggests, these models are inspired by biological brains. However, current DNNs neglect many details of biological neural networks. These simplifications contribute to their computational efficiency, enabling them to perform complex feats of intelligence, ranging from perceptual (e.g., visual object and auditory speech recognition) to cognitive tasks (e.g., machine translation), and on to motor control (e.g., playing computer games or controlling a robot arm). In addition to their ability to model complex intelligent behaviors, DNNs excel at predicting neural responses to novel sensory stimuli with accuracies well beyond any other currently available model type. DNNs can have millions of parameters, which are required to capture the domain knowledge needed for successful task performance. Contrary to the intuition that this renders them impenetrable black boxes, the computational properties of the network units are the result of four directly manipulable elements: input statistics, network structure, functional objective, and learning algorithm. With full access to the activity and connectivity of all units, advanced visualization techniques, and analytic tools to map network representations to neural data, DNNs represent a powerful framework for building task-performing models and will drive substantial insights in computational neuroscience.
Explaining Brain Information Processing Requires Complex, Task-Performing Models
The goal of computational neuroscience is to find mechanistic explanations for how the nervous system processes information to support cognitive function as well as adaptive behavior. Computational models, that is, mathematical and computational descriptions of component systems, aim to capture the mapping of sensory input to neural responses and furthermore to explain representational transformations, neuronal dynamics, and the way the brain controls behavior. The overarching challenge is therefore to define models that explain neural measurements as well as complex adaptive behavior. Historically, computational neuroscientists have had successes with shallow, linear–nonlinear “tuning” models used to predict lower-level sensory processing. Yet the brain is a deep recurrent neural network that exploits multistage nonlinear transformations and complex dynamics. It therefore seems inevitable that computational neuroscience will come to rely increasingly on complex models, likely from the family of deep recurrent neural networks. The need for multiple stages of nonlinear computation has long been appreciated in the domain of vision, by both experimentalists (Hubel & Wiesel, 1959) and theorists (Fukushima, 1980; LeCun & Bengio, 1995; Riesenhuber & Poggio, 1999; Wallis & Rolls, 1997).
The traditional focus on shallow models was motivated both by the desire for simple explanations and by the difficulty of fitting complex models. Hand-crafted features, which laid the basis of modern computational neuroscience (Jones & Palmer, 1987), do not carry us beyond restricted lower-level tuning functions. As an alternative approach, researchers started directly using neural data to fit model parameters (Dumoulin & Wandell, 2008; Wu, David, & Gallant, 2006). This approach was shown to be particularly successful for early visual processes (Cadena et al., 2017; Gao & Ganguli, 2015; Maheswaranathan et al., 2018). Despite its elegance, importance, and success, this approach is ultimately limited by the number of neural observations that can be collected from a given system. Even with neural measurement technology advancing rapidly (multi-site array recordings, two-photon imaging, and Neuropixels probes, to name just a few), the amount of recordable data may not provide enough constraints to fit realistically complex, that is, parameter-rich, models. While researchers can now record separately from hundreds of individual neurons, and the number of stimuli used may approach 10,000, the numbers of parameters in typically used deep neural networks (DNNs) are many orders of magnitude larger. For instance, the influential object recognition network “AlexNet” has 60 million parameters (Krizhevsky, Sutskever, & Hinton, 2012), and a more recent object recognition network, VGG-16, has 138 million parameters (Simonyan & Zisserman, 2015). This high number is needed to encode the substantial domain knowledge that intelligent behavior requires. Transferring this information into the model through the bottleneck of neural measurements alone is likely too inefficient for understanding and performing real-world tasks.
In search of a solution to this conundrum, the key insight was the idea that rather than fitting parameters based on neural observations, models could instead be trained to perform relevant behavior in the real world. This approach brings machine learning to bear on models for computational neuroscience, enabling researchers to constrain the model parameters via task training. In the domain of vision, for instance, category-labeled sets of training images can easily be assembled using web-based technologies, and the amount of available data can therefore be expanded more easily than for measurements of neural activity. Of course, different models trained to perform a relevant task (such as object recognition, if one tried to understand computations in the primate ventral stream) might differ in their ability to explain neural data. Testing which model architectures, input statistics, and learning objectives yield the best predictions of neural activity in novel experimental conditions (e.g., a set of images that has not been used in fitting the parameters) is thus a powerful technique to learn about the computational mechanisms that might underlie the neural responses. The combined use of task training and neural data thereby enables us to build complex models with extensive knowledge about the world in order to explain how biological brains implement cognitive function.
Brain-Inspired Neural Network Models Are Revolutionizing Artificial Intelligence and Exhibit Rich Potential for Computational Neuroscience
Neural network models have become a central class of models in machine learning (Figure 1). Driven to optimize task performance, researchers developed and improved model architectures, hardware, and training schemes that eventually led to today’s high-performance DNNs. These models have revolutionized several domains of AI (LeCun, Bengio, & Hinton, 2015). Starting with the seminal work by Krizhevsky et al. (2012), who won the ImageNet competition in visual object recognition by a large margin, deep neural networks now dominate computer vision (He, Zhang, Ren, & Sun, 2016; Simonyan & Zisserman, 2015; Szegedy et al., 2015) and have driven reinforcement learning (Lange & Riedmiller, 2010; Mnih et al., 2015), speech recognition (Sak, Senior, & Beaufays, 2014), machine translation (Sutskever, Vinyals, & Le, 2014; Wu et al., 2016), and many other domains to unprecedented performance levels. In terms of visual processing, deep convolutional, feed-forward networks (CNNs) now achieve human-level classification performance (VanRullen, 2017).
Although originally inspired by biology, current DNNs implement only the most essential features of biological neural networks. They are composed of simple units that typically compute a linear combination of their inputs and pass the result through a static nonlinearity (e.g., setting negative values to zero). Similar to the ventral stream in the brain, convolutional neural networks process images through a sequence of visuotopic representations: each unit “sees” a restricted local region of the map in the previous layer (its receptive field), and similar feature detectors exist across spatial locations (although this is only approximately true in the primate brain). Along the hierarchy, CNNs and brains furthermore perform a deep cascade of nonlinear computations, resulting in receptive fields that increase in size, invariance, and complexity. Beyond these similarities, DNNs typically do not include many biological details. For instance, they often do not include lateral or top-down connections, and compute continuous outputs (real numbers that could be interpreted as firing rates) rather than spikes. The list of features of biological neural networks not captured by these models is endless.
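The unit computation and the local, shared receptive fields described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a one-dimensional input for brevity; the function names (`unit_response`, `conv1d_layer`) and the toy edge-detecting kernel are hypothetical and not taken from any particular DNN library.

```python
import numpy as np

def relu(x):
    """Static nonlinearity: set negative values to zero."""
    return np.maximum(x, 0.0)

def unit_response(inputs, weights, bias=0.0):
    """A single unit: linear combination of its inputs, then the nonlinearity."""
    return relu(np.dot(weights, inputs) + bias)

def conv1d_layer(signal, kernel, bias=0.0):
    """Each output unit sees only a local window of the input (its receptive
    field), and the same kernel is reused at every position (weight sharing)."""
    k = len(kernel)
    n_out = len(signal) - k + 1
    return np.array([unit_response(signal[i:i + k], kernel, bias)
                     for i in range(n_out)])

# Toy input containing one rising and one falling edge.
signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
edge_kernel = np.array([-1.0, 1.0])  # hypothetical rising-edge detector
layer_out = conv1d_layer(signal, edge_kernel)
print(layer_out)
```

Only the unit whose receptive field covers the rising edge responds; the falling edge yields a negative linear response that the nonlinearity sets to zero.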
Yet despite large differences and many biological features missing, deep convolutional neural networks predict functional signatures of primate visual processing across multiple hierarchical levels at unprecedented accuracy. Trained to recognize objects, they develop V1-like receptive fields in early layers, and are predictive of single cell recordings in macaque inferotemporal cortex (IT) (Cadieu et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014; for reviews see Kriegeskorte, 2015; Yamins et al., 2014; Yamins & DiCarlo, 2016; Figure 2A). In particular, the explanatory power of DNNs is on a par with the performance of linear prediction based on an independent set of IT neurons and exceeds linear predictions based directly on the category labels on which the networks were trained (Yamins et al., 2014). DNNs explain about 50% of the variance of windowed spike counts in IT across individual images (Yamins et al., 2014), a performance level comparable to that achieved with Gabor models in V1 (Olshausen & Field, 2005). DNNs thereby constitute the only model class in computational neuroscience that is capable of predicting responses to novel images in IT with reasonable accuracy. DNN modeling has also been shown to improve predictions of intermediate representations in area V4 over alternative models (Yamins & DiCarlo, 2016). This indicates that, in order to solve the task of object classification, the trained network passes information through a similar sequence of intermediate representations as does the primate brain.
In human neuroscience too, DNNs have proven capable of predicting representations across multiple levels of processing. Whereas lower network layers better predict lower-level visual representations, subsequent, higher layers better predict activity in higher, more anterior cortical areas, as measured with functional magnetic resonance imaging (Eickenberg, Gramfort, & Thirion, 2016; Güçlü & van Gerven, 2015; Khaligh-Razavi & Kriegeskorte, 2014; Figure 2B–C). In line with results from macaque IT, DNNs were furthermore able to explain within-category neural similarities, despite being trained on a categorization task that aims at abstracting away from differences across category exemplars (Khaligh-Razavi & Kriegeskorte, 2014). At a lower spatial, but higher temporal, resolution, DNNs have also been shown to be predictive of visually evoked magnetoencephalography (MEG) data (Cichy, Khosla, Pantazis, & Oliva, 2016; Cichy, Khosla, Pantazis, Torralba, & Oliva, 2016; Seeliger et al., 2018). On the behavioral level, deep networks exhibit similar behavior to humans (Hong, Yamins, Majaj, & DiCarlo, 2016; Kheradpisheh, Ghodrati, Ganjtabesh, & Masquelier, 2016a, 2016b; Kubilius, Bracci, & Op de Beeck, 2016; Majaj, Hong, Solomon, & DiCarlo, 2015) and are currently the best-performing model in explaining human eye movements in free viewing paradigms (Kümmerer, Theis, & Bethge, 2014). Despite these advances, however, current DNNs still exhibit substantial differences in how they process and recognize visual stimuli (Linsley, Eberhardt, Sharma, Gupta, & Serre, 2017; Rajalingham et al., 2018; Ullman, Assif, Fetaya, & Harari, 2016), how they generalize to atypical category instances (Saleh, Elgammal, & Feldman, 2016), and how they perform under image manipulations, including reduced contrast and additive noise (Geirhos et al., 2017). Yet the overall success clearly illustrates the power of DNN models for computational neuroscience.
How Can Deep Neural Networks Be Tested With Brain and Behavioral Data?
DNNs are often trained to optimize external task objectives rather than being derived from neural data. However, even human-level performance does not imply that the underlying computations employ the same mechanisms (Ritter, Barrett, Santoro, & Botvinick, 2017). Testing models with neural measurements is therefore crucial to assess how well network-internal representations match cortical responses. Fortunately, computational neuroscience has a rich toolbox at its disposal that allows researchers to probe even highly complex models, including DNNs (Diedrichsen & Kriegeskorte, 2017).
One such tool is the class of encoding models, which use external, fixed feature spaces in order to model neural responses across a large variety of experimental conditions (e.g., different stimuli, Figure 2A–B). The underlying idea is that if the model and the brain compute similar features, then linear combinations of the model features should enable successful prediction of the neural responses for independent experimental data (Naselaris, Kay, Nishimoto, & Gallant, 2011). For visual representations, the model feature space can be derived from simple filters, such as Gabor wavelets (Kay, Naselaris, Prenger, & Gallant, 2008), from human labeling of the stimuli (Huth, Nishimoto, Vu, & Gallant, 2012; Mitchell et al., 2008; Naselaris, Prenger, Kay, Oliver, & Gallant, 2009), or from responses in different layers of a DNN (Agrawal, Stansbury, Malik, & Gallant, 2014; Güçlü & van Gerven, 2015).
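The logic of an encoding model can be sketched as a regularized linear regression from a fixed feature space to measured responses, evaluated on stimuli that were not used for fitting. The following is a simplified illustration on synthetic data, assuming NumPy; ridge regression is one common choice of linear fit rather than the specific procedure of the cited studies, and all names, sizes, and the regularization strength are hypothetical.

```python
import numpy as np

def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^-1 X'Y."""
    n_feat = features.shape[1]
    gram = features.T @ features + alpha * np.eye(n_feat)
    return np.linalg.solve(gram, features.T @ responses)

def heldout_correlation(features, responses, weights):
    """Pearson correlation between predicted and measured responses,
    computed separately for each neuron (or voxel)."""
    pred = features @ weights
    return np.array([np.corrcoef(pred[:, i], responses[:, i])[0, 1]
                     for i in range(responses.shape[1])])

# Synthetic stand-in for "model features" and "neural responses".
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_neurons = 200, 50, 20, 5
true_w = rng.normal(size=(n_feat, n_neurons))
X_train = rng.normal(size=(n_train, n_feat))   # features for training stimuli
X_test = rng.normal(size=(n_test, n_feat))     # features for held-out stimuli
Y_train = X_train @ true_w + 0.5 * rng.normal(size=(n_train, n_neurons))
Y_test = X_test @ true_w + 0.5 * rng.normal(size=(n_test, n_neurons))

W = fit_ridge(X_train, Y_train, alpha=1.0)
r = heldout_correlation(X_test, Y_test, W)
print(r)
```

In practice, the rows of the feature matrices would be the activations of a chosen DNN layer for each stimulus, and the regularization strength would be selected by cross-validation within the training set.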
Probing the system on the level of multivariate response patterns, representational similarity analysis (RSA) (Kriegeskorte & Kievit, 2013; Kriegeskorte, Mur, & Bandettini, 2008; Nili et al., 2014) provides another approach to comparing internal representations in DNNs and the brain (Figure 2C). RSA is based around the concept of a representational dissimilarity matrix (RDM), which stores the dissimilarities of a system’s responses (neural or model) to all pairs of experimental conditions. RDMs can therefore be interpreted as describing representational geometries: conditions that elicit similar responses are close together in response space, whereas conditions that lead to differential responses will have larger distances. A model representation is considered similar to a brain representation to the degree that it emphasizes the same distinctions among the stimuli, that is, the model and brain are considered similar if they elicit similar RDMs. Comparisons on the level of RDMs sidestep the problem of defining a correspondence mapping between the units of the model and the channels of brain-activity measurement. This approach can be applied to voxels in functional magnetic resonance imaging (fMRI) (Carlin, Calder, Kriegeskorte, Nili, & Rowe, 2011; Guntupalli, Wheeler, & Gobbini, 2016; Khaligh-Razavi & Kriegeskorte, 2014; Kietzmann, Swisher, König, & Tong, 2012), single-cell recordings (Kriegeskorte et al., 2008; Leibo, Liao, Freiwald, Anselmi, & Poggio, 2017; Tsao, Moeller, & Freiwald, 2008), magneto- and electroencephalography (M/EEG) data (Cichy, Pantazis, & Oliva, 2014; Kietzmann, Gert, Tong, & König, 2017), and behavioral measurements including perceptual judgments (Mur et al., 2013).
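As a rough sketch of the RSA logic, the example below computes an RDM for a simulated model layer and for simulated brain measurements and then compares the two by rank correlation. It assumes NumPy, Euclidean distance as the dissimilarity measure, and a simple Spearman correlation that ignores ties; these are illustrative choices, and other dissimilarity and comparison measures are used in the literature. All data and names are hypothetical.

```python
import numpy as np

def compute_rdm(responses):
    """responses: conditions x channels. Returns a conditions x conditions
    matrix of Euclidean distances between response patterns."""
    diff = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def upper_triangle(rdm):
    """Unique off-diagonal dissimilarities as a vector."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def rank(values):
    """Ordinal ranks (ties are not averaged in this simplified version)."""
    return np.argsort(np.argsort(values)).astype(float)

def compare_rdms(rdm_a, rdm_b):
    """Spearman-style rank correlation between two RDMs."""
    a = rank(upper_triangle(rdm_a))
    b = rank(upper_triangle(rdm_b))
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
n_conditions, n_units = 12, 30
model_units = rng.normal(size=(n_conditions, n_units))

# Simulated "brain": the same geometry read out through an unknown
# orthogonal mixing of channels, plus measurement noise.
mixing, _ = np.linalg.qr(rng.normal(size=(n_units, n_units)))
brain = model_units @ mixing + 0.1 * rng.normal(size=(n_conditions, n_units))
unrelated = rng.normal(size=(n_conditions, n_units))

model_rdm = compute_rdm(model_units)
similarity_brain = compare_rdms(model_rdm, compute_rdm(brain))
similarity_unrelated = compare_rdms(model_rdm, compute_rdm(unrelated))
print(similarity_brain, similarity_unrelated)
```

No unit-to-channel correspondence is specified anywhere in this comparison: the simulated brain channels are arbitrary mixtures of the model units, yet the two RDMs remain closely matched.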
Although the internal features in a model and the brain may be similar, the distribution of features may not parallel the neural selectivity observed in neuroimaging data. This can either be due to methodological limitations of the neuroimaging technique, or because the respective brain area exhibits a bias for certain features that is not captured in the model. To account for such deviations, mixed RSA provides a technique to recombine model features to best explain the empirical data (Khaligh-Razavi, Henriksson, Kay, & Kriegeskorte, 2017). The increase in explanatory power due to this reweighting thereby directly speaks to the question of how well the original, non-reweighted feature space captured the correct feature distribution, relative to the brain measurements.
On the behavioral level, recognition performance (Cadieu et al., 2014; Hong et al., 2016; Majaj et al., 2015; Rajalingham et al., 2018), perceptual confusions, and illusions provide valuable clues as to how representations in brains and DNNs may differ. For instance, it can be highly informative to understand the detailed patterns of errors (Walther, Caddigan, Fei-Fei, & Beck, 2009) and reaction times across stimuli, which may reveal subtle functional differences between systems that exhibit the same overall level of task performance. Visual metamers (Freeman & Simoncelli, 2011; Wallis, Bethge, & Wichmann, 2016) provide a powerful tool to test for similarities in internal representations across systems. Given an original image, a modified version is created that nevertheless leads to an unaltered model response (for instance, the activation profile of a DNN layer). For example, if a model was insensitive to a selected band of spatial frequencies, then modifications in this particular range will remain unnoticed by the model. If the human brain processed the stimuli via the same mechanism as the model, it should similarly be insensitive to such changes. The two images are therefore indistinguishable (“metameric”) to the model and the brain. Conversely, an adversarial example is a minimal modification of an image that elicits a different category label from a DNN (Goodfellow, Shlens, & Szegedy, 2015; Nguyen, Yosinski, & Clune, 2015). For convolutional feed-forward networks, minimal changes to an image (say of a bus), which are imperceptible to humans, lead the model to classify the image incorrectly (say as an ostrich). Adversarial examples can be generated using the backpropagation algorithm down to the level of the image, to find the gradients in image space that change the classification output. 
This gradient-based method requires omniscient access to the system, making it impossible to perform a fair comparison with biological brains, which might likewise be confused by stimuli designed to exploit their idiosyncratic aspects (Elsayed et al., 2018; Kriegeskorte, 2015). The more general lesson for computational neuroscience is that metamers and adversarial examples provide methods for designing stimuli for which different representations disagree maximally. This can optimize the power to adjudicate between alternative models experimentally.
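The idea of following the gradient in image space can be illustrated with a deliberately simple stand-in. The sketch below assumes NumPy and replaces the DNN with a linear logistic classifier, so that the gradient of the loss with respect to the image can be written analytically instead of being obtained by backpropagation. The perturbation rule (a small step along the sign of the gradient) is one common construction; the weights, image, and step size are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(image, weights, bias):
    """Probability of class 1 under a linear logistic classifier."""
    return sigmoid(np.dot(weights, image.ravel()) + bias)

def loss_gradient_wrt_image(image, label, weights, bias):
    """Gradient of the cross-entropy loss with respect to the pixels.
    For this linear model it is (p - label) * weights."""
    p = predict_proba(image, weights, bias)
    return ((p - label) * weights).reshape(image.shape)

def adversarial_example(image, label, weights, bias, epsilon):
    """Change every pixel by at most epsilon in the direction that
    increases the loss for the current label."""
    grad = loss_gradient_wrt_image(image, label, weights, bias)
    return image + epsilon * np.sign(grad)

rng = np.random.default_rng(2)
weights = rng.normal(size=100)            # hypothetical trained weights
image = np.full((10, 10), 0.5)            # uniform gray 10 x 10 "image"
bias = 1.0 - 0.5 * weights.sum()          # chosen so the image is class 1

adv = adversarial_example(image, 1, weights, bias, epsilon=0.05)
p_original = predict_proba(image, weights, bias)
p_adversarial = predict_proba(adv, weights, bias)
print(p_original, p_adversarial)
```

Because the small per-pixel changes are all aligned with the gradient, their effects add up across pixels and flip the decision. Computing that gradient requires full access to the model's parameters, which is the access that is unavailable for biological brains.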
Ranging across levels of description and modalities of brain-activity measurement, from responses in single neurons, to array recordings, fMRI and MEG data, and behavior, the methods described here enable computational neuroscientists to investigate the similarities and differences between models and neural responses. Such comparisons are essential for determining which biological details and which computational objectives are needed to align the internal representations of brains and DNNs while maintaining successful task performance.
Drawing Insights From Deep Neural Network Models
Deep learning has transformed machine learning and only recently found its way back into computational neuroscience. Despite their high performance in terms of predicting held-out neural data, DNNs have been met with skepticism regarding their explanatory value as models of brain information processing (e.g., Kay, 2017). One of the arguments commonly put forward is that DNNs merely exchange one impenetrably complex system for another (the “black box” argument). That is, while DNNs may be able to predict neural data, researchers now face the problem of understanding what exactly the network is doing.
The black box argument is best appreciated in historical context. Shallow models are easier to understand and supported by stronger mathematical results. For example, the weight template of a linear–nonlinear model can be directly visualized and understood in relation to the concept of an optimal linear filter. Simple models can furthermore enable researchers to understand the role of each individual parameter. A model with fewer parameters is therefore considered more parsimonious as a theoretical account. It is certainly true that simpler models should be preferred over models with excessive degrees of freedom. Many seminal explanations in neuroscience have been derived from simple models. This argument only applies, however, if the two models provide similar predictive power. Models should be as simple as possible, but no simpler. Because the brain is a complex system with billions of parameters (presumably containing the domain knowledge required for adaptive behavior) and complex dynamics (which implement perceptual inference, cognition, and motor control), computational neuroscience will eventually need complex models. The challenge for the field is therefore to find ways to draw insight from them. One way is to consider their constraints at a higher level of abstraction. The computational properties of DNNs can be understood as the result of four manipulable elements: the network architecture, the input statistics, the functional objective, and the learning algorithm.
Insights Generated at a Higher Level of Abstraction: Experiments With Network Architecture, Input Statistics, Functional Objective, and the Learning Algorithm
A worthwhile thought experiment for neuroscientists is to consider what cortical representations would develop if the world were different. Governed by different input statistics, a different distribution of category occurrences, or different temporal dependency structure, the brain and its internal representations may develop quite differently. Knowledge of how they would differ can provide us with principled insights into the objectives that the brain tries to optimize. Deep learning allows computational neuroscientists to make this thought experiment a simulated reality (Mehrer, Kietzmann, & Kriegeskorte, 2017). Investigations of which aspects of the simulated world are crucial to render the learned representations more similar to the brain thereby serve an essential function.
In addition to changes in input statistics, the network architecture can be subject to experimentation. Current DNNs derive their power from bold simplifications. Although complex in terms of their parameter count, they are simple in terms of their component mechanisms. Starting from this abstract level, biological details can be integrated in order to see which ones prove to be required, and which ones do not, for predicting a given neural phenomenon. For instance, it can be asked whether neural responses in a given paradigm are best explained by a feed-forward or a recurrent network architecture. Biological brains draw from a rich set of dynamical primitives. It will therefore be interesting to see to what extent incorporating more biologically inspired mechanisms can enhance the power of DNNs and their ability to explain neural activity and animal behavior.
Given input statistics and architecture, the missing determinants that transform the randomly initialized model into a trained DNN are the objective function and the learning algorithm. The idea of normative approaches is that neural representations in the brain can be understood as being optimized with regard to one or many overall objectives. These define what the brain should compute, in order to provide the basis for successful behavior. While experimentally difficult to investigate, deep learning trained on different objectives allows researchers to ask the directly related inverse question: what functions need to be optimized such that the resulting internal representations best predict neural data? Various objectives have been suggested in both the neuroscience and machine learning community. Feed-forward convolutional DNNs are often trained with the objective to minimize classification error (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Yamins & DiCarlo, 2016). This focus on classification performance has proven quite successful, leading researchers to observe an intriguing correlation: classification performance is positively related to the ability to predict neural data (Khaligh-Razavi & Kriegeskorte, 2014; Yamins et al., 2014). That is, the better the network performed on a given image set, the better it could predict neural data, even though the latter was never part of the training objective. Despite its success, the objective to minimize classification error in a DNN for visual object recognition requires millions of labeled training images. Although the finished product, the trained DNN, provides the best current predictive model of ventral stream responses, the process by which the model is obtained therefore is not biologically plausible.
To address this issue, additional objective functions from the unsupervised domain have been suggested, allowing the brain (and DNNs) to create error signals without external feedback. One influential suggestion is that neurons in the brain aim at an efficient sparse code, while faithfully representing the external information (Olshausen & Field, 1996; Simoncelli & Olshausen, 2001). Similarly, compression-based objectives aim to represent the input with as few neural dimensions as possible. Autoencoders are one model class following this coding principle (Hinton & Salakhutdinov, 2006). Exploiting information from the temporal domain, the temporal stability or slowness objective is based on the insight that latent variables that vary slowly over time are useful for adaptive behavior. Neurons should therefore detect the underlying, slowly changing signals, while disregarding fast changes likely due to noise. This potentially simplifies readout from downstream neurons (Berkes & Wiskott, 2005; Földiák, 1991; Kayser, Körding, & König, 2003; Kayser, Einhäuser, Dümmer, König, & Körding, 2001; Körding, Kayser, Einhäuser, & König, 2004; Rolls, 2012; Wiskott & Sejnowski, 2002). Stability can be optimized across layers in hierarchical systems, if each subsequent layer tries to find an optimally stable solution from the activation profiles in the previous layer. This approach was shown to lead to invariant codes for object identity (Franzius, Wilbert, & Wiskott, 2008) and viewpoint-invariant place selectivity (Franzius, Sprekeler, & Wiskott, 2007; Wyss, König, & Verschure, 2006). Experimental evidence in favor of the temporal stability objective in the brain has been provided by electrophysiological and behavioral studies (Li & DiCarlo, 2008, 2010; Wallis & Bülthoff, 2001).
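The slowness objective can be made concrete with a small linear example. The sketch below assumes NumPy and implements a simplified, purely linear form of slow feature analysis: the input is whitened, and the direction whose output changes least from one time step to the next is then extracted. The signals, mixing weights, and function name are hypothetical, and the hierarchical, nonlinear variants discussed above are not covered.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """x: time x channels. Returns the slowest unit-variance linear
    projections of x (a simplified linear slow feature analysis)."""
    x = x - x.mean(axis=0)
    # Whiten so that every projection has unit variance.
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (evecs / np.sqrt(evals))
    # Minimize the variance of the temporal difference signal.
    dz = np.diff(z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dz, rowvar=False))
    # eigh returns eigenvalues in ascending order: slowest direction first.
    return z @ d_evecs[:, :n_components]

# Two "sensors" that each see a mixture of a slow and a fast source.
t = np.linspace(0, 2 * np.pi, 1000)
slow = np.sin(t)
fast = np.sin(25 * t)
mixed = np.column_stack([slow + 0.6 * fast, 0.4 * slow + fast])

slow_estimate = linear_sfa(mixed, n_components=1)[:, 0]
recovery = abs(np.corrcoef(slow_estimate, slow)[0, 1])
print(recovery)
```

The sign of the extracted feature is arbitrary, which is why the absolute correlation with the slow source is reported.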
Many implementations of classification, sparseness, and stability objectives ignore the action repertoire of the agent. Yet different cognitive systems living in the same world may exhibit different neural representations because the requirements to optimally support action may differ. Deep networks optimizing the predictability of the sensory consequence (Weiller, Märtin, Dähne, Engel, & König, 2010) or the cost of a given action (Mnih et al., 2015) have started incorporating the corresponding information. More generally, it should be noted that there are likely multiple objectives that the brain optimizes across space and time (Marblestone, Wayne, & Kording, 2016), and neural response patterns may encode multiple types of information simultaneously, enabling selective read-out by downstream units (DiCarlo & Cox, 2007).
In summary, one way to draw theoretical insights from DNN models is to explore what architectures, input statistics, objective functions, and learning algorithms yield the best predictions for neural activity and behavior. This approach does not elucidate the role of individual units or connections in the brain. However, it can reveal which features of biological structure likely support selected functional aspects, and what objectives the biological system might be optimized for, either via evolutionary pressure or during the development of the individual.
Looking Into the Black Box: Receptive Field Visualization and “In Silico” Electrophysiology
In addition to contextualizing DNNs on a more abstract level, we can also open the “black box” and look inside. Unlike a biological brain, a DNN model is entirely accessible to scrutiny and manipulation, enabling, for example, high-throughput “in silico” electrophysiology. The latter can be used to gain an intuition for the selectivity of individual units. For instance, large and diverse image sets can be searched for the stimuli that lead to maximal unit activation (Figure 3). Building on this approach, the technique of network dissection has emerged, which provides a more quantitative view on unit selectivity (Zhou, Bau, Oliva, & Torralba, 2017). It uses a large data set of segmented and labeled stimuli to first find images and image regions that maximally drive network units. Based on the ground-truth labels for these images, it is then derived whether the unit’s selectivity is semantically consistent across samples. If so, an interpretable label, ranging from color selectivity to different textures, object parts, objects, and whole scenes, is assigned to the unit. This characterization can be applied to all units of a network layer, providing powerful summary statistics.
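A crude version of this kind of in silico experiment is sketched below, assuming NumPy. Given a table of unit activations across an image set, it finds the images that most strongly drive a unit and checks whether those images share a label. Network dissection itself works with segmented image regions rather than whole-image labels, so this is only a simplified stand-in; the activations, labels, and function names are made up for illustration.

```python
import numpy as np
from collections import Counter

def top_driving_stimuli(activations, unit, k=3):
    """Indices of the k images that most strongly activate the given unit.
    activations: images x units."""
    order = np.argsort(activations[:, unit])[::-1]
    return order[:k]

def dominant_label(labels, indices):
    """Most frequent label among the selected images, and the fraction
    of selected images that carry it."""
    counts = Counter(labels[i] for i in indices)
    label, n = counts.most_common(1)[0]
    return label, n / len(indices)

# Hypothetical recordings: 6 images x 2 units.
labels = ["face", "car", "face", "tree", "face", "car"]
activations = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.8, 0.2],
    [0.2, 0.1],
    [0.7, 0.3],
    [0.3, 0.8],
])

top_unit0 = top_driving_stimuli(activations, unit=0, k=3)
label_unit0, consistency_unit0 = dominant_label(labels, top_unit0)
top_unit1 = top_driving_stimuli(activations, unit=1, k=2)
label_unit1, consistency_unit1 = dominant_label(labels, top_unit1)
print(label_unit0, consistency_unit0, label_unit1, consistency_unit1)
```

Repeating this for every unit in a layer gives a simple summary of how many units have a consistent preferred label.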
Another method for understanding a unit’s preferences is via feature visualization, a rapidly expanding set of diverse techniques that directly address the desire for human interpretability beyond example images. One of many ways to visualize what image features drive a given unit deep in a neural network is to approximately undo the operations performed by a convolutional DNN in the context of a given image (Zeiler & Fergus, 2014). This results in visualizations such as those shown in Figure 3A. A related technique is feature visualization by optimization (see Olah, Mordvintsev, & Schubert (2017) for a review), which is based on the idea of using backpropagation (Rumelhart, Hinton, & Williams, 1986), potentially including a natural image prior, to calculate the change in the input needed to drive or inhibit the activation of any unit in a DNN (Simonyan & Zisserman, 2015; Yosinski, Clune, Nguyen, Fuchs, & Lipson, 2015). As one option, the optimization can be started from an image that already strongly drives the unit, computing a gradient in image space that enhances the unit’s activity even further. The gradient-adjusted image shows how small changes to the pixels affect the activity of the unit. For example, if the image that is strongly driving the unit shows a person next to a car, the corresponding gradient image might reveal that it is really the face of the person driving the unit’s response. In that case, the gradient image would deviate from zero only in the region of the face, and adding it to the original image would accentuate the facial features. Relatedly, optimization can be started from an arbitrary image, with the goal of enhancing the activity of a single or all units in a given layer (as iteratively performed in Google’s DeepDream). Another option is to start from pure noise images, and to again use backpropagation to iteratively optimize the input to strongly drive a particular unit.
This approach yields complex psychedelic-looking patterns containing features and forms, which the network has learned through its task training (Figure 3B). Similar to the previous approach that characterizes a unit by finding maximally driving stimuli, gradient images are best derived from many different test images to get a sense of the orientation of its tuning surface around multiple reference points (test images). Relatedly, it is important to note that the tuning function of a unit deep in a network cannot be characterized by a single visual template. If it could, there would be no need for multiple stages of nonlinear transformation. However, the techniques described in this section can provide first intuitions about unit selectivities across different layers or time points.
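Feature visualization by optimization can be sketched for a toy two-layer network. The weights `W1` and `w2`, the step size, and the clipping of inputs to [-1, 1] (a crude stand-in for a natural image prior) are all assumptions made for illustration; practical work would use an automatic differentiation framework and a trained network.

```python
import random

def unit_activation(x, W1, w2):
    """Activation of one deep unit: a = w2 . relu(W1 x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(w * h for w, h in zip(w2, hidden))

def input_gradient(x, W1, w2):
    """Backpropagate from the unit to the input: da/dx = W1^T (w2 * relu'(W1 x))."""
    grad = [0.0] * len(x)
    for row, w_out in zip(W1, w2):
        pre_activation = sum(w * xi for w, xi in zip(row, x))
        if pre_activation > 0.0:  # the ReLU passes gradient only where it is active
            for j, w in enumerate(row):
                grad[j] += w_out * w
    return grad

def visualize_unit(W1, w2, n_inputs, steps=50, lr=0.1, seed=0):
    """Gradient ascent on the input, starting from a low-amplitude noise image."""
    rng = random.Random(seed)
    start = [rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
    x = list(start)
    for _ in range(steps):
        grad = input_gradient(x, W1, w2)
        # Step up the gradient, then clip each "pixel" to the range [-1, 1].
        x = [max(-1.0, min(1.0, xi + lr * gi)) for xi, gi in zip(x, grad)]
    return start, x

# Hypothetical weights of a network with 4 inputs and 3 hidden units.
W1 = [[1.0, -1.0, 0.5, 0.0],
      [-1.0, 1.0, -0.5, 0.0],
      [0.0, 0.5, 1.0, -1.0]]
w2 = [1.0, 0.3, 0.5]

start, optimized = visualize_unit(W1, w2, n_inputs=4)
print(unit_activation(start, W1, w2), unit_activation(optimized, W1, w2))
```

In this toy setting the optimized input drives the unit more strongly than the noise it started from. Repeating the procedure from several starting points gives a sense of how much the result depends on the reference image.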
DNNs can provide computational neuroscientists with a powerful tool, and are far from a black box. Insights can be generated by looking at the parameters of DNN models at a more abstract level, for instance, by observing the effects on predictive performance resulting from changes to the network architecture, input statistics, objective function, and learning algorithm. Furthermore, in silico electrophysiology enables researchers to measure and manipulate every single neuron, in order to visualize and characterize its selectivity and role in the overall system.
What Neurobiological Details Matter to Brain Computation?
A second concern about DNNs is that they abstract away too much from biological reality to be of use as models for neuroscience. Whereas the black box argument states that DNNs are too complex, the biological realism argument states that they are too simple. Both arguments have merit. It is conceivable that a model is simultaneously too simple (in some ways) and too complex (in other ways). However, this raises a fundamental question: which features of the biological structure should be modeled and which omitted to explain brain function (Tank, 1989)?
Abstraction is the essence of modeling and is the driving force of understanding. If the goal of computational neuroscience is to understand brain computation, then we should seek the simplest models that can explain task performance and predict neural data. The elements of the model should map onto the brain at some level of description. However, what biological elements must be modeled is an empirical question. DNNs are important not because they capture many biological features, but because they provide a minimal functioning starting point for exploring what biological details matter to brain computation. If, for instance, spiking models outperformed rate-coding models at explaining neural activity and task performance (for example, in tasks requiring probabilistic inference [Buesing, Bill, Nessler, & Maass, 2011]), then this would be strong evidence in favor of spiking models. Large-scale models will furthermore enable an exploration of the level of detail required in systems implementing the whole perception–action cycle (Eliasmith et al., 2012; Eliasmith & Trujillo, 2014).
Convolutional DNNs such as AlexNet (Krizhevsky et al., 2012) and VGG (Simonyan & Zisserman, 2015) were built to optimize performance rather than biological plausibility. However, these models draw from a history of neuroscientific insight and share many qualitative features with the primate ventral stream. The defining property of convolutional DNNs is the use of convolutional layers. These have two main characteristics: (1) local connections that define receptive fields and (2) parameter sharing between neurons across the visual field. Whereas spatially restricted receptive fields are a prevalent biological phenomenon, parameter sharing is biologically implausible. However, biological visual systems learn qualitatively similar sets of basis features in different parts of a retinotopic map, and similar results have been observed in models optimizing a sparseness objective (Güçlü & van Gerven, 2014; Olshausen & Field, 1996). Moving toward greater biological plausibility with DNNs, locally connected layers that have receptive fields without parameter sharing have been suggested (Uetz & Behnke, 2009). Researchers have already started exploring this type of DNN, which was shown to be very successful in face recognition (Sun, Wang, & Tang, 2015; Taigman, Ranzato, Aviv, & Park, 2014). One reason for this is that locally connected layers work best in cases where similar features are frequently present in the same visual arrangement, such as faces. In the brain, retinotopic organization principles have been proposed for higher-level visual areas (Levy, Hasson, Avidan, Hendler, & Malach, 2001), and similar organization mechanisms may have led to faciotopy, the spatially stereotypical activation for facial features across the cortical surface in face-selective regions (Henriksson, Mur, & Kriegeskorte, 2015).
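The difference between a convolutional layer and a locally connected layer can be made concrete in one dimension. The sketch below is illustrative only: the signal, the kernel values, and the function names are hypothetical, and "convolution" is implemented as cross-correlation, as is common in deep learning.

```python
def conv1d(signal, kernel):
    """Convolutional layer: one kernel is shared by every position."""
    k = len(kernel)
    return [sum(w * signal[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal) - k + 1)]

def locally_connected1d(signal, kernels):
    """Locally connected layer: same receptive fields, one kernel per position."""
    k = len(kernels[0])
    return [sum(w * signal[i + j] for j, w in enumerate(kernels[i]))
            for i in range(len(signal) - k + 1)]

signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
edge_kernel = [-1.0, 0.0, 1.0]
n_positions = len(signal) - len(edge_kernel) + 1

shared = conv1d(signal, edge_kernel)

# Untied weights that happen to equal the shared kernel give the same output,
untied = [list(edge_kernel) for _ in range(n_positions)]
same = locally_connected1d(signal, untied)

# but each position is free to acquire its own feature.
untied[0] = [1.0, 1.0, 1.0]
different = locally_connected1d(signal, untied)

n_params_conv = len(edge_kernel)
n_params_local = sum(len(kernel) for kernel in untied)
print(shared, different, n_params_conv, n_params_local)
```

The locally connected layer has as many kernels as output positions, so its parameter count grows with the size of the visual field, whereas the convolutional layer's does not. This is the price of dropping parameter sharing.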
Beyond the Feed-Forward Sweep: Recurrent DNNs
Another aspect in which convolutional DNNs such as AlexNet and VGG deviate from biology is their focus on feed-forward processing. Feed-forward DNNs compute static functions and are therefore limited to modeling the feed-forward sweep of signal flow through a biological system. Yet recurrent connections are a key computational feature in the brain, and represent a major research frontier in neuroscience. In the visual system, too, recurrence is a ubiquitous phenomenon. Recurrence is likely the source of representational transitions from global to local information (Matsumoto, Okada, Sugase-Miyamoto, Yamane, & Kawano, 2005; Sugase, Yamane, Ueno, & Kawano, 1999). The timing of signatures of facial identity (Barragan-Jason, Besson, Ceccaldi, & Barbeau, 2013; Freiwald & Tsao, 2010) and of social cues, such as direct eye contact (Kietzmann et al., 2017), likewise points toward a reliance on recurrent computations. Finally, recurrent connections likely play a vital role in early category learning (Kietzmann, Ehinger, Porada, Engel, & König, 2016), in dealing with occlusion (Wyatte, Curran, & O’Reilly, 2012; Wyatte, Jilk, & O’Reilly, 2014), and in object-based attention (Roelfsema, Lamme, & Spekreijse, 1998).
Whereas the first generation of DNNs focused on feed-forward processing, the general class of DNNs can implement recurrence (Oord, Kalchbrenner, & Kavukcuoglu, 2016). By using lateral recurrent connections, DNNs can implement visual attention mechanisms (Li, Yang, Liu, Wen, & Xu, 2017; Mnih, Heess, Graves, & Kavukcuoglu, 2014), and lateral recurrent connections can also be added to convolutional DNNs (Liang & Hu, 2015; Spoerer, McClure, & Kriegeskorte, 2017). These increase the effective receptive field size of each unit, and allow for long-range activity propagation (Pavel et al., 2017). Lateral connections can make decisive contributions to network computation. For instance, in modeling the responses of retinal ganglion cells, the introduction of lateral recurrent connections to feed-forward CNNs leads to the emergence of contrast adaptation in the model (McIntosh, Maheswaranathan, Nayebi, Ganguli, & Baccus, 2017). In addition to local feed-forward and lateral recurrent connections, the brain also uses local feedback, as well as long-range feed-forward and feedback connections. While missing from the convolutional DNNs previously used to predict neural data, DNNs with these different connection types have been implemented (He et al., 2016; Liao & Poggio, 2016; Spoerer et al., 2017; Srivastava, Greff, & Schmidhuber, 2015). Moreover, long short-term memory (LSTM) units (Hochreiter & Schmidhuber, 1997) are a popular form of recurrent connectivity used in DNNs. These units use differentiable read and write gates to learn how to use and store information in an artificial memory “cell.” Recently, a biologically plausible implementation of LSTM units has been proposed using cortical microcircuits (Costa, Assael, Shillingford, de Freitas, & Vogels, 2017).
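How lateral recurrence enlarges the effective receptive field can be seen in a one-dimensional toy layer unrolled over time. The kernels, the layer size, and the update rule h_t = relu(w_ff * x + w_lat * h_{t-1}) are assumptions chosen for illustration, not the trained two-dimensional architectures of the cited studies.

```python
def conv_same(signal, kernel):
    """1-D 'same' convolution (cross-correlation) with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]

def recurrent_conv(x, w_ff, w_lat, n_steps):
    """Unroll h_t = relu(w_ff * x + w_lat * h_{t-1}); return the state at every step."""
    drive = conv_same(x, w_ff)           # feed-forward input, constant over time
    h = [max(0.0, d) for d in drive]     # t = 0: the pure feed-forward sweep
    states = [h]
    for _ in range(n_steps):
        lateral = conv_same(h, w_lat)    # input from neighboring units in the same layer
        h = [max(0.0, d + l) for d, l in zip(drive, lateral)]
        states.append(h)
    return states

# A single active "pixel" in the middle of an 11-pixel input.
x = [0.0] * 11
x[5] = 1.0

states = recurrent_conv(x, w_ff=[0.5, 1.0, 0.5], w_lat=[0.2, 0.0, 0.2], n_steps=3)
widths = [sum(1 for v in h if v > 0.0) for h in states]
print(widths)
```

Each time step lets activity spread one further unit in both directions, so the number of units influenced by a single input location grows with processing time although the kernels themselves stay small.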
The field of recurrent convolutional DNNs is still in its infancy, and the effects of lateral and top-down connections on the representational dynamics in these networks, as well as their predictive power for neural data, are yet to be fully explored. Recurrent architectures are an exciting tool for computational neuroscience and will likely allow for key insights into the recurrent computational dynamics of the brain, from sensory processing to flexible cognitive tasks (Song, Yang, & Wang, 2016, 2017).
Optimizing for External Objectives: Backpropagation and Biological Plausibility
Apart from architectural considerations, backpropagation, the most successful learning algorithm for DNNs, has classically been considered neurobiologically implausible. Rather than as a model of biological learning, backpropagation may be viewed as an efficient way to arrive at reasonable parameter estimates, which are then subject to further tests. That is, even if backpropagation is considered a mere technical solution, the trained model may still be a good model of the neural system. However, if the brain does optimize cost functions during development and learning (which can be diverse, and supervised, unsupervised, or reinforcement-based), then it will have to use some form of optimization mechanism, of which stochastic gradient descent techniques are one instance. There is a growing literature on neurobiologically plausible forms of error-driven learning, that is, ways in which the brain could adjust its internal parameters to optimize such objective functions (Lee, Zhang, Fischer, & Bengio, 2015; Lillicrap et al., 2016; O’Reilly, 1996; Xie & Seung, 2003). These methods have been shown to allow deep neural networks to learn simple vision tasks (Guerguiev, Lillicrap, & Richards, 2017). The brain may not be performing the exact algorithm of backpropagation, but it may have a mechanism for modifying synaptic weights in order to optimize one or many objective functions (Marblestone et al., 2016).
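One of the proposals cited above, learning with random synaptic feedback weights (Lillicrap et al., 2016), can be sketched in a few lines. The sketch below makes several simplifying assumptions that are ours rather than the cited authors': linear hidden units, a single output, zero-initialized forward weights, and a hypothetical linear teacher that generates the targets. The only difference from backpropagation is the line computing `delta_h`, where a fixed random vector `b` takes the place of the forward weights `w2`.

```python
import random

def train_feedback_alignment(n_in=3, n_hidden=5, lr=0.01, n_samples=200,
                             n_epochs=30, seed=0):
    rng = random.Random(seed)
    teacher = [1.0, -2.0, 0.5]  # hypothetical linear target: t = teacher . x
    data = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_samples)]
    targets = [sum(a * xi for a, xi in zip(teacher, x)) for x in data]

    W1 = [[0.0] * n_in for _ in range(n_hidden)]   # input -> hidden weights
    w2 = [0.0] * n_hidden                          # hidden -> output weights
    # Fixed random feedback weights; they are never updated.
    b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(x):
        h = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
        return h, sum(w * hk for w, hk in zip(w2, h))

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in zip(data, targets)) / n_samples

    initial_loss = mean_squared_error()
    for _ in range(n_epochs):
        for x, t in zip(data, targets):
            h, y = forward(x)
            err = y - t
            # Backpropagation would use w2[k] * err here; feedback alignment uses b[k].
            delta_h = [bk * err for bk in b]
            for k in range(n_hidden):
                w2[k] -= lr * err * h[k]
                for j in range(n_in):
                    W1[k][j] -= lr * delta_h[k] * x[j]
    return initial_loss, mean_squared_error()

initial_loss, final_loss = train_feedback_alignment()
print(initial_loss, final_loss)
```

Although the error signal reaching the hidden layer never uses the forward weights, the training error falls, because the forward weights come to align with the fixed feedback weights during learning.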
Stochasticity, Oscillations, and Spikes
Another aspect in which DNNs deviate from biological realism is that DNNs are typically deterministic, while biological networks are stochastic. While much of this stochasticity is commonly thought to be noise, it has been hypothesized that this variability could code for uncertainty (Fiser, Berkes, Orbán, & Lengyel, 2010; Hoyer & Hyvärinen, 2003; Orban, Berkes, Fiser, & Lengyel, 2016). In line with this, DNNs that include stochastic sampling during training and test can yield higher performance, and are better able to estimate their own uncertainty (McClure & Kriegeskorte, 2016). Furthermore, currently available recurrent convolutional DNNs often only run for a few time steps, and the roles of dynamical features found in biological networks, such as oscillations, are only beginning to be tested (Finger & König, 2013; Reichert & Serre, 2013; Siegel, Donner, & Engel, 2012). Another abstraction is the omission of spiking dynamics. However, DNNs with spiking neurons can be implemented (Hunsberger & Eliasmith, 2016; Tavanaei & Maida, 2016) and represent an exciting frontier of deep learning research. These considerations show that it would be hasty to judge the merits of DNNs based on the level of abstraction chosen in the first generation.
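The idea of estimating uncertainty by stochastic sampling at test time can be sketched with Bernoulli dropout on the hidden units of a toy network. This is one simple sampling scheme chosen for illustration, not necessarily the one used in the cited work; the weights, the input, and the function names are hypothetical. The spread of the sampled outputs serves as a rough measure of the model's uncertainty.

```python
import random
import statistics

def stochastic_forward(x, W1, w2, keep_prob, rng):
    """One forward pass with Bernoulli dropout applied to the hidden units."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    # Inverted dropout: rescale kept units so the expected activation is unchanged.
    masked = [h / keep_prob if rng.random() < keep_prob else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w2, masked))

def predict_with_uncertainty(x, W1, w2, keep_prob=0.5, n_samples=200, seed=0):
    """Mean and standard deviation of the output across sampled forward passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, W1, w2, keep_prob, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pstdev(samples)

# Hypothetical weights of a network with 2 inputs and 3 hidden units.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
w2 = [1.0, 0.5, 0.25]
x = [1.0, 2.0]

det_mean, det_std = predict_with_uncertainty(x, W1, w2, keep_prob=1.0)
mc_mean, mc_std = predict_with_uncertainty(x, W1, w2, keep_prob=0.5)
print(det_mean, det_std, mc_mean, mc_std)
```

With `keep_prob=1.0` the network is deterministic and the spread is zero; with sampling switched on, repeated passes through the same network yield a distribution of outputs.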
Deep Learning: A Powerful Framework to Advance Computational Neuroscience
Deep neural networks have revolutionized machine learning and AI, and have recently found their way back into computational neuroscience. DNNs reach human-level performance in certain tasks, and early experiments indicate that they are capable of capturing characteristics of cortical function that cannot be captured with shallow linear–nonlinear models. With this, DNNs offer an intriguing new framework that enables computational neuroscientists to address fundamental questions about brain computation in the developing and adult brain.
Computational neuroscience comprises a wide range of models, defined at various levels of biological and behavioral detail (Figure 4). For instance, many conductance-based models contain large numbers of parameters to explain a single or a few neurons at a great level of detail but are typically not geared towards behavior. DNNs, at the other end of the spectrum, use their high number of parameters not to account for effects on the molecular level, but to achieve behavioral relevance, while accounting for overall neural selectivity. Explanatory merit is not only gained by biological realism (because this would render human brains the perfect explanation for themselves), nor does it directly follow from simplistic models that cannot account for complex animal behavior. The space of models is continuous, and neuroscientific insight works across multiple levels of explanation, following top-down and bottom-up approaches (Craver, 2009). The usage of DNNs in computational neuroscience is still in its infancy, and the integration of biological detail will require close collaboration between modelers, experimental neuroscientists, and anatomists.
DNNs will not replace shallow models, but rather enhance researchers’ investigative repertoire. With computers approaching the brain in computational power, we are entering a truly exciting phase of computational neuroscience.
Further Reading
Kriegeskorte (2015)—introduction of deep learning as a general framework to understand brain information processing.
Yamins & DiCarlo (2016)—perspective on goal-driven deep learning to understand sensory processing.
Marblestone, Wayne, & Kording (2016)—review with a focus on cost functions in the brain and DNNs.
Lindsay (2018)—overview of how DNNs can be used as models of visual processing.
LeCun, Bengio, & Hinton (2015)—high-level overview of deep learning.
Goodfellow, Bengio, & Courville (2016)—introductory book on deep learning.
References
- Agrawal, P., Stansbury, D., Malik, J., & Gallant, J. (2014). Pixels to voxels: Modeling visual representation in the human brain. ArXiv Preprint, 1–15.
- Barragan-Jason, G., Besson, G., Ceccaldi, M., & Barbeau, E. J. (2013). Fast and famous: Looking for the fastest speed at which a face can be recognized. Frontiers in Psychology, 4(March), 100.
- Berkes, P., & Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5, 579–602.
- Buesing, L., Bill, J., Nessler, B., & Maass, W. (2011). Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLoS Computational Biology, 7(11).
- Cadena, S. A., Denfield, G. H., Walker, E. Y., Gatys, L. A., Tolias, A. S., Bethge, M., & Ecker, A. S. (2017). Deep convolutional models improve predictions of macaque V1 responses to natural images. BioRxiv, 1–16.
- Cadieu, C. F., Hong, H., Yamins, D. L. K., Pinto, N., Ardila, D., Solomon, E. A., . . . DiCarlo, J. J. (2014). Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology, 10(12), 1–18.
- Carlin, J. D., Calder, A. J., Kriegeskorte, N., Nili, H., & Rowe, J. B. (2011). A head view-invariant representation of gaze direction in anterior superior temporal sulcus. Current Biology, 21(21), 1–5.
- Cichy, R. M., Khosla, A., Pantazis, D., & Oliva, A. (2016). Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks. NeuroImage, 153, 1–13.
- Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Deep neural networks predict hierarchical spatio-temporal cortical dynamics of human visual object recognition. ArXiv Preprint, arXiv:1601.02970, 1–15.
- Cichy, R. M., Pantazis, D., & Oliva, A. (2014). Resolving human object recognition in space and time. Nature Neuroscience, 17, 455–462.
- Costa, R. P., Assael, Y. M., Shillingford, B., de Freitas, N., & Vogels, T. P. (2017). Cortical microcircuits as gated-recurrent neural networks. Advances in Neural Information Processing Systems, 30, 272–283.
- Craver, C. (2009). Explaining the brain: Mechanisms and the mosaic unity of neuroscience. New York: Oxford University Press.
- DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8), 333–341.
- Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), 1–33.
- Dumoulin, S. O., & Wandell, B. A. (2008). Population receptive field estimates in human visual cortex. NeuroImage, 39(2), 647–660.
- Eickenberg, M., Gramfort, A., & Thirion, B. (2016). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 152, 184–194.
- Eliasmith, C., Stewart, T. C., Choo, X., Bekolay, T., DeWolf, T., Tang, C., & Rasmussen, D. (2012). A large-scale model of the functioning brain. Science, 338(6111), 1202–1205.
- Eliasmith, C., & Trujillo, O. (2014). The use and abuse of large-scale brain models. Current Opinion in Neurobiology, 25, 1–6.
- Elsayed, G. F., Shankar, S., Cheung, B., Papernot, N., Kurakin, A., Goodfellow, I., & Sohl-Dickstein, J. (2018). Adversarial examples that fool both human and computer vision. ArXiv Preprint, 1–19.
- Finger, H., & König, P. (2013). Phase synchrony facilitates binding and segmentation of natural images in a coupled neural oscillator network. Frontiers in Computational Neuroscience, 7(January), 195.
- Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, 14(3), 119–130.
- Földiák, P. (1991). Learning invariance from transformation sequences. Neural Computation, 3, 194–200.
- Franzius, M., Sprekeler, H., & Wiskott, L. (2007). Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology, 3, 1605–1622.
- Franzius, M., Wilbert, N., & Wiskott, L. (2008). Invariant object recognition with slow feature analysis. In Artificial Neural Networks–ICANN 2008 (pp. 961–970). Berlin and Heidelberg: Springer.
- Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201.
- Freiwald, W. A., & Tsao, D. Y. (2010). Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science, 330(6005), 845–851.
- Seeliger, K., Fritsche, M., Güçlü, U., Schoenmakers, S., Schoffelen, J.-M., Bosch, S. E., & van Gerven, M. A. J. (2018). Convolutional neural network-based encoding and decoding of visual object recognition in space and time. NeuroImage, 180(A), 253–266.
- Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202.
- Gao, P., & Ganguli, S. (2015). On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology, 32, 148–155.
- Geirhos, R., Janssen, D. H. J., Schütt, H. H., Rauber, J., Bethge, M., & Wichmann, F. A. (2017). Comparing deep neural networks against humans: Object recognition when the signal gets weaker. ArXiv Preprint, 1–31.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge, MA: MIT Press.
- Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ArXiv Preprint, arXiv:1412.6572, 1–11.
- Güçlü, U., & van Gerven, M. A. J. (2014). Unsupervised feature learning improves prediction of human brain activity in response to natural images. PLoS Computational Biology, 10(8).
- Güçlü, U., & van Gerven, M. A. J. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014.
- Guerguiev, J., Lillicrap, T. P., & Richards, B. A. (2017). Towards deep learning with segregated dendrites. ELife, 6, 1–37.
- Guntupalli, J., Wheeler, K., & Gobbini, M. (2016). Disentangling the representation of identity from head view. Cerebral Cortex, 27(1), 1–25.
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Computer Vision and Pattern Recognition (CVPR) (pp. 770–778). New York, NY: IEEE Publishing.
- Henriksson, L., Mur, M., & Kriegeskorte, N. (2015). Faciotopy—A face-feature map with face-like topology in the human occipital face area. Cortex, 72, 156–167.
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
- Hong, H., Yamins, D. L., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 19(4), 613–622.
- Hoyer, P. O., & Hyvärinen, A. (2003). Interpreting neural response variability as Monte Carlo sampling of the posterior. Advances in Neural Information Processing Systems, 13, 293–300.
- Hubel, D., & Wiesel, T. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of Physiology, 148, 574–591.
- Hunsberger, E., & Eliasmith, C. (2016). Training spiking deep networks for neuromorphic hardware. ArXiv Preprint, arXiv:1611.05141, 1–10.
- Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76(6), 1210–1224.
- Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233–1258.
- Kay, K. N. (2017). Principles for models of neural information processing. NeuroImage, 180(A), 101–109.
- Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452(7185), 352–355.
- Kayser, C., Einhäuser, W., Dümmer, O., König, P., & Körding, K. (2001). Extracting slow subspaces from natural videos leads to complex cells. Artificial Neural Networks—ICANN, 1075–1080.
- Kayser, C., Körding, K. P., & König, P. (2003). Learning the nonlinearity of neurons from natural visual stimuli. Neural Computation, 15(8), 1751–1759.
- Khaligh-Razavi, S.-M., Henriksson, L., Kay, K., & Kriegeskorte, N. (2017). Fixed versus mixed RSA: Explaining visual representations by fixed and mixed feature sets from shallow and deep computational models. Journal of Mathematical Psychology, 76, 184–197.
- Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), 1–29.
- Kheradpisheh, S. R., Ghodrati, M., Ganjtabesh, M., & Masquelier, T. (2016a). Deep networks resemble human feed-forward vision in invariant object recognition. Scientific Reports, 6(32672), 1–24.
- Kheradpisheh, S. R., Ghodrati, M., Ganjtabesh, M., & Masquelier, T. (2016b). Humans and deep networks largely agree on which kinds of variation make object recognition harder. Frontiers in Computational Neuroscience, 10(August), 1–15.
- Kietzmann, T. C., Ehinger, B. V., Porada, D., Engel, A. K., & König, P. (2016). Extensive training leads to temporal and spatial shifts of cortical activity underlying visual category selectivity. NeuroImage, 134, 22–34.
- Kietzmann, T. C., Gert, A., Tong, F., & König, P. (2017). Representational dynamics of facial viewpoint encoding. Journal of Cognitive Neuroscience, 29(4), 637–651.
- Kietzmann, T. C., Swisher, J. D., König, P., & Tong, F. (2012). Prevalence of selectivity for mirror-symmetric views of faces in the ventral and dorsal visual pathways. Journal of Neuroscience, 32(34), 11763–11772.
- Körding, K. P., Kayser, C., Einhäuser, W., & König, P. (2004). How are complex cell properties adapted to the statistics of natural stimuli? Journal of Neurophysiology, 91(1), 206–212.
- Kriegeskorte, N. (2015). Deep neural networks: A new framework for modelling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.
- Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: Integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401–412.
- Kriegeskorte, N., Mur, M., & Bandettini, P. (2008). Representational similarity analysis—connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2(November), 4.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1–9.
- Kubilius, J., Bracci, S., & Op de Beeck, H. P. (2016). Deep neural networks as a computational model for human shape sensitivity. PLoS Computational Biology, 12(4), e1004896.
- Kümmerer, M., Theis, L., & Bethge, M. (2014). Deep Gaze I: Boosting saliency prediction with feature maps trained on ImageNet. ArXiv Preprint, 1–11.
- Lange, S., & Riedmiller, M. (2010). Deep auto-encoder neural networks in reinforcement learning. In International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). New York, NY: IEEE Publishing.
- LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time-series. In The handbook of brain theory and neural networks (pp. 255–258).
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
- Lee, D. H., Zhang, S., Fischer, A., & Bengio, Y. (2015). Difference target propagation. Joint European conference on machine learning and knowledge discovery in databases (pp. 498–515). New York: Springer.
- Leibo, J. Z., Liao, Q., Freiwald, W., Anselmi, F., & Poggio, T. (2017). View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation. Current Biology, 27, 62–67.
- Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center–periphery organization of human object areas. Nature Neuroscience, 4(5), 533–539.
- Li, N., & DiCarlo, J. J. (2008). Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science, 321(5895), 1502–1507.
- Li, N., & DiCarlo, J. J. (2010). Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron, 67(6), 1062–1075.
- Li, Z., Yang, Y., Liu, X., Wen, S., & Xu, W. (2017). Dynamic computational time for visual attention. In ICCV (pp. 1–11). New York, NY: IEEE Publishing.
- Liang, M., & Hu, X. (2015). Recurrent convolutional neural network for object recognition. In Computer Vision and Pattern Recognition (CVPR) (pp. 3367–3375). New York, NY: IEEE Publishing.
- Liao, Q., & Poggio, T. (2016). Bridging the gaps between residual learning, recurrent neural networks and visual cortex. ArXiv Preprint, 1–16.
- Lillicrap, T. P., Cownden, D., Tweed, D. B., Akerman, C. J., Bell, C., Bodznick, D., . . . Bengio, Y. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7, 1–10.
- Lindsay, G. (2018). Deep convolutional neural networks as models of the visual system: Q&A. Neurdiness—Thinking about Brains.
- Linsley, D., Eberhardt, S., Sharma, T., Gupta, P., & Serre, T. (2017). What are the visual features underlying human versus machine vision? International Conference on Computer Vision (ICCV) (pp. 1–9). New York: IEEE Publishing.
- McClure, P., & Kriegeskorte, N. (2016). Robustly representing uncertainty in deep neural networks through sampling. ArXiv Preprint, v7, 1–14.
- McIntosh, L. T., Maheswaranathan, N., Nayebi, A., Ganguli, S., & Baccus, S. A. (2017). Deep learning models of the retinal response to natural scenes. Advances in Neural Information Processing Systems, 30, 1–9.
- Maheswaranathan, N., McIntosh, L., Kastner, D. B., Melander, J., Brezovec, L., Nayebi, A., . . . Baccus, S. A. (2018). Deep learning models reveal internal structure and diverse computations in the retina under natural scenes. BioRxiv.
- Majaj, N. J., Hong, H., Solomon, E. A., & DiCarlo, J. J. (2015). Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. Journal of Neuroscience, 35(39), 13402–13418.
- Marblestone, A. H., Wayne, G., & Kording, K. P. (2016). Towards an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10, 1–41.
- Matsumoto, N., Okada, M., Sugase-Miyamoto, Y., Yamane, S., & Kawano, K. (2005). Population dynamics of face-responsive neurons in the inferior temporal cortex. Cerebral Cortex, 15(8), 1103–1112.
- Mehrer, J., Kietzmann, T. C., & Kriegeskorte, N. (2017). Deep neural networks trained on ecologically relevant categories better explain human IT. In Cognitive Computational Neuroscience Meeting (Vol. 1, pp. 1–2).
- Mitchell, T. M., Shinkareva, S. V, Carlson, A., Chang, K.-M., Malave, V. L., Mason, R. A., & Just, M. A. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320(5880), 1191–1195.
- Mnih, V., Heess, N., Graves, A., & Kavukcuoglu, K. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems, 27, 1–9.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A, Veness, J., Bellemare, M. G., . . . Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
- Mur, M., Meys, M., Bodurka, J., Goebel, R., Bandettini, P. A., & Kriegeskorte, N. (2013). Human object-similarity judgments reflect and transcend the primate-IT object representation. Frontiers in Psychology, 4(March), 1–22.
- Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400–410.
- Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., & Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron, 63, 902–915.
- Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled. Computer Vision and Pattern Recognition (pp. 427–436). New York, NY: IEEE Publishing.
- Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4).
- Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature Visualization. Distill.
- Olshausen, B., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
- Olshausen, B., & Field, D. J. (2005). How close are we to understanding V1? Neural Computation, 17(8), 1665–1699.
- Oord, A. van den, Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel recurrent neural networks. ArXiv Preprint, 1–11.
- Orban, G., Berkes, P., Fiser, J., & Lengyel, M. (2016). Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron, 92(2), 530–543.
- O’Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8(5), 895–938.
- Pavel, M. S., Schulz, H., & Behnke, S. (2017). Object class segmentation of RGB-D video using recurrent convolutional neural networks. Neural Networks, 88, 105–113.
- Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. BioRxiv, 1–41.
- Reichert, D. P., & Serre, T. (2013). Neuronal synchrony in complex-valued deep networks. International Conference on Learning Representations.
- Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.
- Ritter, S., Barrett, D. G. T., Santoro, A., & Botvinick, M. M. (2017). Cognitive psychology for deep neural networks: A shape bias case study. ArXiv Preprint, arXiv:1706.08606.
- Roelfsema, P. R., Lamme, V. A., & Spekreijse, H. (1998). Object-based attention in the primary visual cortex of the macaque monkey. Nature, 395(6700), 376–381.
- Rolls, E. T. (2012). Invariant visual object and face recognition: Neural and computational bases, and a model, VisNet. Frontiers in Computational Neuroscience, 6(June), 35.
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
- Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. ArXiv Preprint, arXiv:1402.1128, 1–5.
- Saleh, B., Elgammal, A., & Feldman, J. (2016). The role of typicality in object classification: Improving the generalization capacity of convolutional neural networks. ArXiv Preprint, 1–8.
- Siegel, M., Donner, T., & Engel, A. (2012). Spectral fingerprints of large-scale neuronal interactions. Nature Reviews Neuroscience, 13(February), 20–25.
- Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.
- Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. ArXiv Preprint, arXiv:1409.1556, 1–14.
- Song, H. F., Yang, G. R., & Wang, X. J. (2016). Training excitatory-inhibitory recurrent neural networks for cognitive tasks: A simple and flexible framework. PLoS Computational Biology, 12(2), 1–30.
- Song, H. F., Yang, G. R., & Wang, X. J. (2017). Reward-based training of recurrent neural networks for cognitive and value-based tasks. eLife, 6, 1–24.
- Spoerer, C. J., McClure, P., & Kriegeskorte, N. (2017). Recurrent convolutional neural networks: A better model of biological object recognition under occlusion. Frontiers in Psychology, 8(1551), 1–14.
- Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. ArXiv Preprint, 1–6.
- Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine information coded by single neurons in the temporal visual cortex. Nature, 400(6747), 869–873.
- Sun, Y., Wang, X., & Tang, X. (2015). Deeply learned face representations are sparse, selective, and robust. In Computer Vision and Pattern Recognition (CVPR) (pp. 2892–2900). New York, NY: IEEE Publishing.
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27, 1–9.
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., . . . Rabinovich, A. (2015). Going deeper with convolutions. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 7–12 June, 1–9.
- Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR) (pp. 1–8). New York, NY: IEEE Publishing.
- Tank, D. (1989). What details of neural circuits matter? Seminars in the Neurosciences, 1, 67–79.
- Tavanaei, A., & Maida, A. S. (2016). Bio-inspired spiking convolutional neural network using layer-wise sparse coding and STDP learning. ArXiv Preprint, arXiv:1611.03000v2, 1–20.
- Tsao, D. Y., Moeller, S., & Freiwald, W. A. (2008). Comparing face patch systems in macaques and humans. Proceedings of the National Academy of Sciences, 105(49), 19514.
- Uetz, R., & Behnke, S. (2009). Locally-connected hierarchical neural networks for GPU-accelerated object recognition. Advances in Neural Information Processing Systems - Workshop on Large-Scale Machine Learning: Parallelism and Massive Datasets, 22, 10–13.
- Ullman, S., Assif, L., Fetaya, E., & Harari, D. (2016). Atoms of recognition in human and computer vision. Proceedings of the National Academy of Sciences, 113(10), 2744–2749.
- VanRullen, R. (2017). Perception science in the age of deep neural networks. Frontiers in Psychology, 8(February), 142.
- Wallis, G., & Bülthoff, H. H. (2001). Effects of temporal association on recognition memory. Proceedings of the National Academy of Sciences of the United States of America, 98(8), 4800–4804.
- Wallis, G., & Rolls, E. (1997). Invariant face and object recognition in the visual system. Progress in Neurobiology, 51, 167–194.
- Wallis, T. S. A., Bethge, M., & Wichmann, F. A. (2016). Testing models of peripheral encoding using metamerism in an oddity paradigm. Journal of Vision, 16(2), 1–30.
- Walther, D. B., Caddigan, E., Fei-Fei, L., & Beck, D. M. (2009). Natural scene categories revealed in distributed patterns of activity in the human brain. Journal of Neuroscience, 29(34), 10573–10581.
- Weiller, D., Märtin, R., Dähne, S., Engel, A. K., & König, P. (2010). Involving motor capabilities in the formation of sensory space representations. PLoS ONE, 5(4), e10377.
- Wiskott, L., & Sejnowski, T. J. (2002). Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4), 715–770.
- Wu, M. C.-K., David, S. V., & Gallant, J. L. (2006). Complete functional characterization of sensory neurons by system identification. Annual Review of Neuroscience, 29(1), 477–505.
- Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., . . . Dean, J. (2016). Google’s Neural Machine Translation system: Bridging the gap between human and machine translation. ArXiv Preprint, 1–23.
- Wyatte, D., Curran, T., & O’Reilly, R. (2012). The limits of feedforward vision: Recurrent processing promotes robust object recognition when objects are degraded. Journal of Cognitive Neuroscience, 24(11), 2248–2261.
- Wyatte, D., Jilk, D. J., & O’Reilly, R. C. (2014). Early recurrent feedback facilitates visual object recognition under challenging conditions. Frontiers in Psychology, 5(July).
- Wyss, R., König, P., & Verschure, P. F. M. J. (2006). A model of the ventral visual system based on temporal stability and local memory. PLoS Biology, 4(5), 836–843.
- Xie, X., & Seung, H. S. (2003). Equivalence of backpropagation and contrastive Hebbian learning in a layered network. Neural Computation, 15(2), 441–454.
- Yamins, D. L., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365.
- Yamins, D. L., Hong, H., Cadieu, C., Solomon, E., Seibert, D., & DiCarlo, J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 111(23), 8619–8624.
- Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., & Lipson, H. (2015). Understanding neural networks through deep visualization. International Conference on Machine Learning—Deep Learning Workshop 2015.
- Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833). Cham: Springer.
- Zhou, B., Bau, D., Oliva, A., & Torralba, A. (2017). Interpreting deep visual representations via network dissection. In Computer Vision and Pattern Recognition (CVPR) (pp. 1–9). New York, NY: IEEE Publishing.