AAAI Major Conference

Essay Instructions:

Students taking the course for 4 credit hours need to complete two short literature review papers, about 8 pages each. See the schedule page for the deadlines.

Details of the paper

For each of your papers, you should choose three papers from major conferences that address the same or closely-related problems from somewhat different perspectives or using somewhat different techniques. Do not include a paper that is a direct follow-on to, or response to, another paper in your list. Avoid pairs of papers that seem to come from the same research group or people who were recently in the same research group (e.g. a researcher and their thesis advisor). It's more interesting to write about parallel contrasting approaches.

Your paper should briefly summarize the three papers, analyzing how their methods and results compare and contrast. As you do this analysis, identify at least six more papers that are closely related to your three papers, e.g. earlier work they cite, other major approaches they reference. These additional 6 papers do not have to come from a major conference. You should cite them with a brief explanation of how they are relevant. Also cite any other papers that you may have used as references.

Your paper should be about eight pages (single-spaced, 10 point font), not including the bibliography and including at most one page worth of figures. (Extra figures are fine, but only one page worth of figures counts towards the required eight pages.) It should be mostly text, with a small number of equations. Do not include a lot of equations, because that means you're diving too much into the details without trying to capture the main ideas.

The paper should be neatly formatted as a pdf document and submitted on gradescope.

Your second paper should choose a topic significantly different from your first paper.

Major conferences

Here is a list of major conferences to get your three main papers from. Please consult me if you think I've forgotten a major conference or you would like to use a paper that's very important despite a less prestigious venue.

Core AI

AAAI, IJCAI, AKBC

Computer vision

CVPR, ICCV, ECCV

Natural language

ACL, NAACL, EACL, EMNLP, COLING (All available at the ACL Anthology website.)

Speech

ICASSP, Interspeech

Machine Learning

NIPS/NeurIPS, ICML, ICLR

Robotics

ICRA, IROS

Information Retrieval

SIGIR, KDD, ECIR

Essay Sample Content Preview:

AAAI Major Conference Paper
Student’s Name
Institutional Affiliation
AAAI Major Conference Paper
Banerjee, A., Bhattacharya, U., & Bera, A. (2022). Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 3–10. https://doi.org/10.1609/aaai.v36i1.19873
Research in emotion learning is crucial to many fields, such as affective computing, robotics, and human-computer interaction. Existing research on emotion identification shows that an individual's emotional state can be determined by observing their facial expressions, voice, gestures, and gait. Psychological research indicates that people can tell how someone feels from affective cues such as arm-swing speed or movement frequency. In more recent work, Bhattacharya et al. (2020a) mapped pose sequences to labeled emotions by combining such affective features with posture dynamics extracted using spatial-temporal graph convolutional networks. The selected reading is relevant to this discussion because it outlines emotion representation, identification from non-verbal body language, and pertinent advancements in zero-shot learning.
These machine-learning-based emotion identification systems face a significant hurdle: they demand sizable, well-labeled datasets with suitable training instances for each emotion. Given the enormous range of human emotions and their many representations, generating such large-scale datasets is time-consuming and frequently prohibitively expensive. Since labels for some classes are often missing, zero-shot learning has received much attention as a solution. Rather than relying on preexisting labels for every class, it infers the correct labels from the relationships between the seen and unseen classes.
In the generalized zero-shot learning (GZSL) setting, a network is trained using annotations that are only available for the seen classes and then learns to distinguish among all seen and unseen categories. By using data from other modalities, such as language semantics, to construct class embeddings that correspond to each label, the model learns to generalize to the unseen classes. Recent solutions to the zero-shot problem have used generative models to synthesize features for the unseen classes, which are subsequently used for the classification task. The most popular techniques for synthesizing these features are GANs and VAEs. However, research has demonstrated that the way VAEs model multi-modal distributions can lead to suboptimal learned representations. Although GANs can produce higher-quality features than VAEs, their learned latent distribution spaces may be vulnerable to mode collapse.
The research offers a generalized zero-shot approach to identify emotions from upper-body poses generated from 3D motion-captured gesture sequences. Zhan et al. (2019) previously demonstrated emotion perception from images in a zero-shot scenario. The researchers used the rich word embeddings of the pre-trained word2vec model to capture the semantic relationships between the emotion classes. A supervised emotion identification network produces a feature vector corresponding to a sequence of gesture inputs. The authors combined an autoencoder architecture with an adversarial loss to create latent representations for the gesture-based feature vectors learned from the fully supervised network. A second adversarial loss was then used to align these latent representations with the semantically trained distribution space of the emotion classes.
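The alignment idea behind such zero-shot classification can be sketched as a nearest-neighbour search in the semantic embedding space. The snippet below is a toy illustration, not the paper's model: the 8-dimensional vectors stand in for word2vec embeddings, and the "trained encoder" is simulated by adding noise to a class embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8-dimensional class embeddings, standing in for the word2vec
# vectors of emotion labels (real word2vec vectors are 300-dimensional).
class_embeddings = {
    "happy": rng.normal(size=8),
    "sad": rng.normal(size=8),
    "angry": rng.normal(size=8),  # treated as an unseen class here
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def zero_shot_predict(latent, embeddings):
    # Assign the class whose semantic embedding is most similar to the
    # gesture's latent representation (nearest neighbour in embedding space).
    return max(embeddings, key=lambda c: cosine(latent, embeddings[c]))

# A latent vector that a (hypothetical) trained encoder has aligned with the
# unseen class "angry"; the small noise models imperfect alignment.
latent = class_embeddings["angry"] + 0.05 * rng.normal(size=8)
print(zero_shot_predict(latent, class_embeddings))  # → angry
```

Because classification reduces to proximity in the shared embedding space, a label never seen during training can still be predicted as long as its word embedding is available.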
Recent studies in emotion recognition demonstrate the relationship between gaits and underlying psychological stress (Sanders et al., 2016). Sapinski et al. (2019) use deep learning techniques to recognize emotional states from gestures collected from videos. Psychological analyses of facial expressions help identify people's emotional states. Likewise, with the development of deep learning, vision-based techniques for inferring emotional states from facial expressions, and auditory techniques using speech, have become more common. Recent efforts have employed a variety of modalities, such as speech and facial expressions, to ascertain emotions.
It is essential to distinguish action recognition from emotion recognition in gestures. Action recognition algorithms learn a latent space tuned for actions, whereas Banerjee et al. learn a latent space optimized for emotions, where the relative motions of nearby clusters of joints are more critical. According to Bhattacharya et al. (2020b), the action recognition algorithms STGCN, DGNN, and MS-G3D most prominently emphasize the movement of the finger, toe, and head joints. While these joints are valuable for differentiating between behaviors like running and jumping, they do not carry sufficient information to differentiate emotions.
Both seen and unseen data are used for zero-shot learning (ZSL) training, but label prediction is only attempted and assessed on the unseen classes. In contrast, in generalized zero-shot learning (GZSL) the prediction task covers both seen and unseen classes. The hubness problem, which arises when the model overfits to the training classes, demonstrates that GZSL is harder than conventional ZSL. Generative approaches have recently gained popularity in GZSL. These methods synthesize features for the unobserved classes using generative adversarial networks (GANs) or variational autoencoders (VAEs); traditional GZSL generative models thus rely on a data-augmentation strategy, producing synthetic features for classes the model has not observed during training.
Bu, T., Ding, J., Yu, Z., & Huang, T. (2022). Optimized Potential Initialization for Low-Latency Spiking Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 11–20. https://doi.org/10.1609/aaai.v36i1.19874
In recent years, Spiking Neural Networks (SNNs), often regarded as the third generation of neural networks, have received a lot of interest. Unlike conventional ANNs, which transmit data at every propagation cycle, SNNs communicate only through spikes, emitted when the membrane potential reaches a threshold. SNNs exhibit higher energy efficiency than ANNs on neuromorphic chips because of their event-driven computation, sparse activation, and multiplication-free features. SNNs also come with built-in robustness against adversaries: under gradient-based attacks, SNNs outperform ANNs with the same structure in terms of adversarial accuracy. However, because high-performance SNNs are difficult to train, the use of SNNs is still restricted.
Gradient-based optimization and ANN-to-SNN conversion are the two primary methods for training a multilayer SNN. Gradient-based optimization builds on ideas from artificial neural networks, using backpropagation to calculate the gradient. Although surrogate gradient methods have been used to address the non-differentiable, threshold-triggered firing of SNNs, they are only applicable to relatively shallow SNNs because the gradient becomes much more unstable as layer depth increases. In addition, gradient-based optimization uses more GPU computation than ANN training. ANN-to-SNN conversion instead establishes a relationship between the dynamics of spiking neurons and the activations of analog neurons, and then maps the parameters of a well-trained ANN to an SNN with little accuracy loss.
Indeed, it is possible to create high-performance SNNs without further training. The best results on deep network structures and large-scale datasets have been obtained through ANN-to-SNN conversion, which needs roughly as much GPU computation and time as ANN training alone. Despite these benefits, accuracy and latency are traded off against each other: the practical implementation of SNNs is hampered by the lengthy simulation period needed to match the firing rate of a spiking neuron to the activation value of an analog neuron and reach precision as high as the original ANN's. In this study, the authors take a first step toward converted SNNs with excellent performance and extremely low latency (fewer than 32 time steps). Rather than introducing constraints that ease ANN-to-SNN conversion at the expense of model capacity, they demonstrate that the initialization of membrane potentials, customarily set to zero for all neurons, can be adjusted to reduce the trade-off between accuracy and latency.
Although zero initialization of membrane potentials can help connect the dynamics of spiking neurons to the activations of analog neurons, it comes with unavoidable latency issues. As illustrated in their Figure 1, the authors discovered that without adequate initialization, converted SNN neurons take a long time to fire their first spike, making the network appear to be "inactive" for the first few time steps. Based on this, the authors theoretically examined ANN-to-SNN conversion and demonstrated that the expectation of the squared conversion error reaches its minimum when the initial membrane potential is half of the firing threshold, at which point the expected conversion error is zero. Setting this optimal initial value in converted SNNs results in a significant reduction in inference time and noticeably improved accuracy at low inference times.
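The effect of initializing the membrane potential at half the firing threshold can be reproduced with a toy simulation (a sketch with illustrative parameters, not the authors' code). A soft-reset integrate-and-fire neuron driven by a constant input equal to an ANN activation approximates that activation with its firing rate; starting from half the threshold effectively rounds the spike count instead of flooring it, which lowers the squared conversion error.

```python
import numpy as np

def simulate_if_rate(activation, threshold=1.0, v_init=0.0, steps=32):
    # Soft-reset integrate-and-fire neuron receiving a constant input equal
    # to an ANN activation; returns its firing rate over `steps` time steps.
    v, spikes = v_init, 0
    for _ in range(steps):
        v += activation          # integrate the constant input current
        if v >= threshold:
            v -= threshold       # soft reset keeps the residual potential
            spikes += 1
    return spikes / steps

threshold, steps = 1.0, 8
acts = np.linspace(0.0, 1.0, 21)  # ANN activations to approximate

# Mean squared conversion error: zero init vs. half-threshold init.
err_zero = np.mean([(a - simulate_if_rate(a, threshold, 0.0, steps)) ** 2
                    for a in acts])
err_half = np.mean([(a - simulate_if_rate(a, threshold, threshold / 2, steps)) ** 2
                    for a in acts])
print(err_half < err_zero)  # half-threshold init gives lower error
```

With only 8 time steps, the zero-initialized neuron systematically underestimates the activation (the rate is floor-quantized), while the half-threshold initialization centers the quantization error around zero, which is the intuition behind the paper's result.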
Gradient-based optimization techniques can be categorized into two groups based on how they compute the gradient: activation-based and timing-based. Activation-based approaches, which take inspiration from the training of recurrent neural networks, unfold the SNN into discrete time steps and compute the gradient with backpropagation through time (BPTT). A surrogate gradient is frequently utilized since the activation is non-differentiable with respect to the membrane potential. The surrogate gradient, however, has not undergone rigorous theoretical examination; the gradient becomes substantially more unstable as SNN depth increases (>50 layers), and the networks experience degradation. Timing-based methods can significantly increase the runtime effectiveness of backpropagation training because they use approximation techniques to estimate the gradient of the timing of fired spikes with respect to the membrane potential at the spike time. However, they are typically limited to relatively shallow networks.
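The surrogate-gradient idea can be illustrated in a few lines (this uses a common rectangular surrogate; the window width is an arbitrary choice, not taken from any of the reviewed papers): the forward pass keeps the non-differentiable Heaviside spike, while the backward pass substitutes a finite pseudo-derivative near the threshold.

```python
import numpy as np

def spike(v, threshold=1.0):
    # Forward pass: the Heaviside step, which has zero gradient almost
    # everywhere and is non-differentiable exactly at the threshold.
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, width=0.5):
    # Backward pass: a rectangular surrogate that pretends the spike has
    # derivative 1/(2*width) inside a window around the threshold.
    return (np.abs(v - threshold) < width).astype(float) / (2 * width)

v = np.array([0.2, 0.9, 1.1])   # membrane potentials of three neurons
print(spike(v))                 # [0. 0. 1.]
print(surrogate_grad(v))        # [0. 1. 1.]
```

Backpropagating this pseudo-derivative through every unrolled time step is what BPTT-based SNN training does, and it is also why the gradient grows unstable as depth increases: the surrogate is only a heuristic stand-in for a derivative that does not exist.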
Cao et al. (2015) first proposed ANN-to-SNN conversion, which involves training an ANN with ReLU activations and then converting it to an SNN by swapping the activations for spiking neurons. By correctly mapping ANN parameters to the SNN, deep SNNs can achieve performance equivalent to deep ANNs. Weight normalization and threshold balancing are two further techniques suggested to analyze conversion loss and enhance the overall performance of converted SNNs. In earlier research, IF neurons were given a soft reset mechanism to prevent information loss when neurons reset. Although these studies can achieve loss-less conversion over long inference horizons, they nevertheless experience significant accuracy loss at shorter time steps.
Most recent studies have concentrated on speeding up inference with converted SNNs. To tie ANNs to SNNs more effectively, Stockl and Maass (2021) suggest novel spiking neurons. Han and Roy (2020) employ a time-based encoding technique to expedite inference. The trade-off between accuracy and latency is addressed by RMP (Han, Srinivasan, and Roy 2020), RNL (Ding et al. 2021), and TCL (Ho and Chang 2020), which dynamically alter the threshold. Ding et al. (2021) quantify the fit between the activations of ANNs and the firing rates of SNNs, and show that inference time can be shortened by maximizing the upper bound of the fit curve. Hwang et al. (2021) devised a layer-wise search technique and conducted extensive tests to investigate the ideal initial membrane potential value. Deng et al. (2020) and Li et al. (2021) suggest new techniques that adjust the weight, bias, and membrane potential in each layer to make converted SNNs relatively low-latency. In contrast to the approaches above, Bu et al. (2022) explicitly tune the initial membrane potential to improve performance at fast inference rates.
Cameron, C., Hartford, J., Lundy, T., & Leyton-Brown, K. (2022). The Perils of Learning Before Optimizing. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 3708–3715. https://doi.org/10.1609/aaai.v36i4.20284
Optimization problems whose inputs must be inferred from data rather than directly observed are becoming more prevalent. For instance, imagine a facility-location application whose loss depends on forecasts of demand and road congestion. According to Yan and Gregory (2012), Centola (2018), and Bahulkar et al. (2018), the most natural solution to such problems is to split prediction from optimization into two distinct stages. In the first stage, a model is trained to minimize a standard prediction loss (e.g., travel-time forecasts are fit to minimize mean squared error). In the second stage, an optimization problem, such as recommending facility sites given demand and travel-time projections, is parameterized by the model's predictions and solved.
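The two-stage pipeline can be sketched in a few lines. Everything below is synthetic and illustrative, not from the paper: the "model" is just a sample mean, and the optimization problem is a trivial facility-selection task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy facility-selection task: pick the site with the highest true demand.
true_demand = np.array([3.0, 5.0, 4.0])          # unknown in practice
noisy_history = true_demand + 0.1 * rng.normal(size=(100, 3))

# Stage 1: fit a predictive model to historical data by minimizing MSE
# (for i.i.d. noise the sample mean is the MSE-optimal constant predictor).
predicted_demand = noisy_history.mean(axis=0)

# Stage 2: plug the predictions into the optimization problem and solve it.
chosen_site = int(np.argmax(predicted_demand))
print(chosen_site)  # → 1
```

The prediction stage never sees the downstream objective; it only minimizes its own loss. That separation is exactly what the end-to-end methods discussed next remove, and what this paper argues can sometimes be the safer choice.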
Lately, there has been a lot of interest in the "end-to-end" method of training a predictive model to reduce the loss on the downstream optimization task directly. The development of deep learning has made it possible to chain together any collection of differentiable functions, enabling the end-to-end training of a differentiable prediction task coupled with a differentiable downstream task using backpropagation. By demonstrating how to calculate gradients analytically through a QP solver, Amos and Kolter (2017) introduced the idea of optimization as a layer. Notably, this led to several subsequent papers that constructed differentiable layers for various classes of optimization problems, including submodular optimization (Djolonga and Krause 2017), linear programming (Wilder, Dilkina, and Tambe 2019), general cone programs (Agrawal et al. 2019), and disciplined convex programs (Agrawal et al. 2020).
A wide range of applications, including inventory stocking (Amos and Kolter 2017), bipartite matching (Wilder, Dilkina, and Tambe 2019), and facility location (Wilder et al. 2020), have shown that these end-to-end approaches enhance performance. The gains are sometimes attributed to the end-to-end method's superior error trade-offs compared with the two-stage process. For instance, Donti, Amos, and Kolter (2017) argue that since all models inevitably make mistakes, it is crucial to optimize for the final task-based objective. Similarly, Wilder, Dilkina, and Tambe (2019) propose that end-to-end optimization is most beneficial for challenging problems where the best model is imperfect, such as when model capacity or data are limited. Elmachtoub and Grigas (2020) also support the effectiveness of end-to-end training: only in an "error-free" setting, with access to the Bayes-optimal predictor for the given loss function, might one argue that end-to-end learning is useless.
In contrast, as this study demonstrates, the two-stage strategy can fail in the typical scenario where the prediction stage model's expectations exceed prediction...