- Spot-then-Recognize (STR) Challenge | Codabench site
- Visual Question Answering (VQA) | Codabench site
Important Dates
- Test Set Release: 19th May 2025
- Challenge Platform Submission Open: 23rd May 2025
- Challenge Submission Deadline (for Codabench only): 27th June 2025 (Tentative)
- Paper Invitation (after confirmation of results): 3rd July 2025
- Paper Submission Deadline: 30th July 2025
- Notification of Accepted Papers: 7th August 2025
- Camera-Ready Deadline: 26th August 2025
Unseen dataset for both tasks
This year, we will be using unseen cross-cultural test sets to evaluate algorithm performance in a fairer manner.
Unseen Dataset for STR
- The unseen testing set (MEGC2025-testSet), the same version as the MEGC2023 unseen dataset, contains 30 long videos: 10 long videos from SAMM (SAMM Challenge dataset) and 20 clips cropped from different videos in CAS(ME)3 (not previously released). The frame rate of the SAMM Challenge dataset is 200 fps and the frame rate of CAS(ME)3 is 30 fps. Participants should test on this unseen dataset.
- To obtain the MEGC2025-testSet, download and fill in the license agreement form of the SAMM Challenge dataset and the license agreement form of CAS(ME)3_clip, then upload the files through this link: https://www.wjx.top/vm/wxCeVHP.aspx# .
- For requests from a bank or company, participants are required to have their director or CEO sign the form.
- Reference:
- Li, J., Dong, Z., Lu, S., Wang, S.J., Yan, W.J., Ma, Y., Liu, Y., Huang, C. and Fu, X. (2023). CAS(ME)3: A Third Generation Facial Spontaneous Micro-Expression Database with Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, 1 March 2023, doi: 10.1109/TPAMI.2022.3174895.
- Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
Unseen Dataset for VQA
- The unseen testing set for VQA contains 24 ME clips: 7 clips from SAMM (SAMM Challenge dataset) and 17 clips from different videos in CAS(ME)3 (not previously released). The frame rate of the SAMM Challenge dataset is 200 fps and the frame rate of CAS(ME)3 is 30 fps. Participants should test on this unseen dataset.
- To obtain the MEGC2025-testSet-ME-VQA, download and fill in the license agreement form of the SAMM Challenge dataset and the license agreement form of CAS(ME)3_clip, then upload the files through this link: https://www.wjx.top/vm/wxCeVHP.aspx# .
- For requests from a bank or company, participants are required to have their director or CEO sign the form.
- Reference:
- Li, J., Dong, Z., Lu, S., Wang, S.J., Yan, W.J., Ma, Y., Liu, Y., Huang, C. and Fu, X. (2023). CAS(ME)3: A Third Generation Facial Spontaneous Micro-Expression Database with Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, 1 March 2023, doi: 10.1109/TPAMI.2022.3174895.
- Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
Spot-then-Recognize (STR) Task
Since the rapid advancement of ME research began about a decade ago, most works have focused on two separate tasks: spotting and recognition. Recognizing the ME class alone can be unrealistic in real-world settings, since it assumes that the ME sequence has already been identified, an ill-posed problem in the case of a continuously running video. On the other hand, the spotting task alone is limited in its applicability, since it cannot interpret the actual emotional state of the person observed.
A more realistic setting, also known as "spot-then-recognize", performs spotting followed by recognition in a sequential manner. Only samples that have been correctly spotted in the spotting step (i.e. true positives) are passed on to the recognition step to be classified into an emotion class. The task will use the unseen dataset and will be evaluated using the selected metrics.
Reference:
- Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, 110, 116875, doi: 10.1016/j.image.2022.116875.
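To make the sequential protocol concrete, the sketch below shows how a spot-then-recognize system is typically wired together: spotting runs first over the long video, and each spotted interval is then classified into one of the three emotion classes. The spotter and recognizer are hypothetical placeholders for the participant's own models; only the overall flow follows the task definition above.

```python
from typing import List, Tuple

def spot_intervals(frames) -> List[Tuple[int, int]]:
    """Hypothetical spotting model: return candidate ME intervals as (onset, offset) frame indices."""
    raise NotImplementedError  # plug in your spotting method here

def recognize_emotion(frames, onset: int, offset: int) -> str:
    """Hypothetical recognition model: classify an interval as 'negative', 'positive' or 'surprise'."""
    raise NotImplementedError  # plug in your recognition method here

def spot_then_recognize(frames) -> List[Tuple[int, int, str]]:
    """Sequential pipeline: spot first, then recognize each spotted interval."""
    results = []
    for onset, offset in spot_intervals(frames):
        results.append((onset, offset, recognize_emotion(frames, onset, offset)))
    return results
```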
Evaluation Protocol
- Submissions will use the Codabench Competition Leaderboard.
- Participants should upload the predicted results for both the unseen CAS(ME)3 and SAMM datasets to the Codabench Leaderboard where specific evaluation metrics will be calculated.
- Evaluation metrics (for SAMM, CAS):
- F1-score for the Spotting and Analysis steps (higher is better).
- Spot-then-Recognize Score (STRS), the product of the Spotting and Analysis F1-scores (higher is better); a computation sketch is shown after this list.
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted csv files with the following filenames:
- cas_pred.csv (for the CAS(ME)3 samples)
- samm_pred.csv (for the SAMM samples)
- An example submission is provided here: example_submission_STR.
- The evaluation script is available at https://github.com/genbing99/STRS-Metric.
- The baseline method can be found in the following paper (please cite):
Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, 110, 116875.
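For reference, here is a minimal sketch of how the metrics above can be computed from spotted and ground-truth intervals. It assumes the common MEGC convention that a spotted interval counts as a true positive when its IoU with a ground-truth interval is at least 0.5, and that intervals are inclusive frame-index pairs; the official evaluation script linked above remains the definitive implementation.

```python
def interval_iou(pred, gt):
    """IoU of two (onset, offset) intervals, treating indices as inclusive frames."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
    return inter / union

def match_spots(preds, gts, iou_thresh=0.5):
    """Greedy matching of spotted intervals to ground truth (assumed TP criterion: IoU >= 0.5)."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and interval_iou(p, g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    return tp, len(preds) - tp, len(gts) - tp  # TP, FP, FN

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# STRS is the product of the Spotting and Analysis F1-scores (higher is better).
def strs(spotting_f1, analysis_f1):
    return spotting_f1 * analysis_f1
```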
Recommended Training Databases
- SAMM Long Videos with 147 long videos at 200 fps (average duration: 35.5s).
- To download the dataset, please visit: http://www2.docm.mmu.ac.uk/STAFF/M.Yap/dataset.php. Download and fill in the license agreement form, then email it to M.Yap@mmu.ac.uk with the email subject: SAMM long videos.
- Reference: Yap, C. H., Kendrick, C., & Yap, M. H. (2020, November). SAMM long videos: A spontaneous facial micro-and macro-expressions dataset. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (pp. 771-776). IEEE.
- CAS(ME)2 with 97 long videos at 30 fps (average duration: 148s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e3. Download and fill in the license agreement form, then submit it through the website.
- Reference: Qu, F., Wang, S. J., Yan, W. J., Li, H., Wu, S., & Fu, X. (2017). CAS(ME)2: A database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Transactions on Affective Computing, 9(4), 424-436.
- SMIC-E-long with 162 long videos at 100 fps (average duration: 22s).
- To download the dataset, please visit: https://www.oulu.fi/cmvs/node/41319. Download and fill in the license agreement form (please indicate which version/subset you need), then email it to Xiaobai.Li@oulu.fi.
- Reference: Tran, T. K., Vo, Q. N., Hong, X., Li, X., & Zhao, G. (2021). Micro-expression spotting: A new benchmark. Neurocomputing, 443, 356-368.
- CAS(ME)3 with 1300 long videos at 30 fps (average duration: 98s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e4. Download and fill in the license agreement form, then submit it through the website.
- Reference: Li, J., Dong, Z., Lu, S., Wang, S. J., Yan, W. J., Ma, Y., ... & Fu, X. (2023). CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, doi: 10.1109/TPAMI.2022.3174895.
- 4DME with 270 long videos at 60 fps (average duration: 2.5s).
- To download the dataset, please visit: https://www.oulu.fi/en/university/faculties-and-units/faculty-information-technology-and-electrical-engineering/center-machine-vision-and-signal-analysis. Download and fill in the license agreement form, then email it to Xiaobai.Li@oulu.fi.
- Reference: Li, X., Cheng, S., Li, Y., Behzad, M., Shen, J., Zafeiriou, S., ... & Zhao, G. (2022). 4DME: A spontaneous 4D micro-expression dataset with multimodalities. IEEE Transactions on Affective Computing.
Visual Question Answering (VQA) Task
New for MEGC 2025, this task introduces a visual question answering challenge for ME analysis that leverages advanced vision-language models (VLMs) and multimodal large language models (LLMs). Instead of relying on structured labels, ME annotations such as emotion classes and action units are converted into question-answer (QA) pairs. Given an image or video sequence as input together with natural language prompts, models must generate answers that describe the ME and its attributes. These questions can cover a wide range of attributes, from binary classification such as "Is the action unit lip corner depressor shown on the face?" to multiclass classification like "What is the expression class?", and to more complex inquiries like "What are the action units present, and based on them, what is the expression class?"
Participants may train models on the curated ME VQA dataset or explore zero-shot reasoning, in-context learning, multi-agent systems, etc. Evaluation will assess UF1 and UAR over the emotion classes, and the NLP metrics BLEU and ROUGE over the full responses. In addition to automatic metrics, human evaluation may be used to gauge the reasoning quality of the models. This task provides a new multimodal perspective on ME analysis, encouraging interpretable and context-aware analysis through natural language interaction. The curated ME VQA dataset extends the MEGC2019 composite dataset, with clips from CASME II, SAMM, and SMIC, by adding QA pairs. Participants can use it as a starting point, and may also include other training samples and generate their own QA pairs.
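As an illustration of how structured ME annotations might be turned into QA pairs, here is a minimal sketch. The field names and question templates below are our own examples for clarity (AU15 is assumed for the lip corner depressor); they are not the exact schema of the curated dataset, so refer to the released annotation files for the actual format.

```python
from typing import Dict, List

def annotation_to_qa(clip_id: str, emotion: str, action_units: List[str]) -> List[Dict[str, str]]:
    """Convert one ME annotation (emotion class + action units) into illustrative QA pairs."""
    qa_pairs = [
        {"clip": clip_id,
         "question": "What is the expression class?",
         "answer": emotion},
        {"clip": clip_id,
         "question": "What are the action units present, and based on them, what is the expression class?",
         "answer": f"The action units are {', '.join(action_units)}, so the expression class is {emotion}."},
        # Binary AU question; AU15 (lip corner depressor) is used as an example.
        {"clip": clip_id,
         "question": "Is the action unit lip corner depressor shown on the face?",
         "answer": "yes" if "AU15" in action_units else "no"},
    ]
    return qa_pairs

# Example: annotation_to_qa("sub01_clip03", "negative", ["AU4", "AU15"])
```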
We provide baseline results in our challenge paper: X. Fan, J. Li, J. See, M. H. Yap, W.-H. Cheng, X. Li, X. Hong, S.-J. Wang and A. K. Davison. MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering. arXiv preprint arXiv:2506.15298, 2025.
Evaluation Protocol
- Submissions will use the Codabench Competition Leaderboard.
- Participants should upload the predicted results for both unseen CAS(ME)3 and SAMM datasets to the Codabench Leaderboard where specific evaluation metrics will be calculated.
- Evaluation metrics (for SAMM, CAS):
- UF1 and UAR for both coarse and fine-grained emotion classes (higher is better).
- BLEU and ROUGE-1 for all answers (higher is better).
- Participants should fill in the test VQA to-answer jsonl files and rename them as xxx_pred.jsonl. These files are available on Google Drive and Baidu Drive:
- me_vqa_casme3_test_to_answer.jsonl
- me_vqa_samm_test_to_answer.jsonl
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted jsonl files with the following filenames (a preparation sketch is shown after this list):
- me_vqa_casme3_test_pred.jsonl (for the unseen CAS(ME)3 ME clips)
- me_vqa_samm_test_pred.jsonl (for the unseen SAMM ME clips)
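A minimal sketch of preparing a VQA submission is shown below. It assumes each line of a to-answer jsonl file is a JSON object describing one question, to which the model's response is added under an "answer" key; the key names here are assumptions, so check the released to-answer files for the actual schema.

```python
import json
import zipfile

def fill_answers(to_answer_path, pred_path, answer_fn):
    """Read a to-answer jsonl file, add the model's answers, and write the *_pred.jsonl file."""
    with open(to_answer_path, encoding="utf-8") as fin, \
         open(pred_path, "w", encoding="utf-8") as fout:
        for line in fin:
            item = json.loads(line)
            item["answer"] = answer_fn(item)  # "answer" is an assumed field name
            fout.write(json.dumps(item, ensure_ascii=False) + "\n")

def make_submission(zip_path, pred_files):
    """Package the predicted jsonl files into the zip file uploaded to Codabench."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in pred_files:
            zf.write(path)

# Example usage with a placeholder model `my_model`:
# fill_answers("me_vqa_casme3_test_to_answer.jsonl", "me_vqa_casme3_test_pred.jsonl", my_model)
# fill_answers("me_vqa_samm_test_to_answer.jsonl", "me_vqa_samm_test_pred.jsonl", my_model)
# make_submission("submission.zip", ["me_vqa_casme3_test_pred.jsonl", "me_vqa_samm_test_pred.jsonl"])
```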
Recommended Training Databases
- Curated ME VQA dataset
- Please download the annotations here: Curated ME VQA dataset.
- The Curated ME VQA dataset extends the MEGC2019 composite dataset, with clips from SAMM, CASME II, and SMIC, by adding QA pairs. Therefore, to access the ME clips, please follow the dataset request links below.
- SAMM with 159 ME clips at 100 fps.
- To download the dataset, please visit: http://www2.docm.mmu.ac.uk/STAFF/M.Yap/dataset.php. Download and fill in the license agreement form, then email it to M.Yap@mmu.ac.uk with the email subject: SAMM videos.
- Reference: Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
- CASME II with 247 ME clips at 200 fps.
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e3. Download and fill in the license agreement form, then submit it through the website.
- Reference: Yan, W. J., Li, X., Wang, S. J., Zhao, G., Liu, Y. J., Chen, Y. H., & Fu, X. (2014). CASME II: An improved spontaneous micro-expression database and the baseline evaluation. PloS one, 9(1), e86041.
- SMIC-E-long with 162 ME clips at 100 fps (average duration: 22s).
- To download the dataset, please visit: https://www.oulu.fi/cmvs/node/41319. Download and fill in the license agreement form (please indicate which version/subset you need), then email it to Xiaobai.Li@oulu.fi.
- Reference: Tran, T. K., Vo, Q. N., Hong, X., Li, X., & Zhao, G. (2021). Micro-expression spotting: A new benchmark. Neurocomputing, 443, 356-368.
- CAS(ME)3 with 1109 ME clips at 30 fps (average duration: 98s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e4. Download and fill in the license agreement form, then submit it through the website.
- Reference: Li, J., Dong, Z., Lu, S., Wang, S. J., Yan, W. J., Ma, Y., ... & Fu, X. (2023). CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, doi: 10.1109/TPAMI.2022.3174895.
- 4DME with 1068 ME clips at 60 fps (average duration: 2.5s).
- To download the dataset, please visit: https://www.oulu.fi/en/university/faculties-and-units/faculty-information-technology-and-electrical-engineering/center-machine-vision-and-signal-analysis. Download and fill in the license agreement form, then email it to Xiaobai.Li@oulu.fi.
- Reference: Li, X., Cheng, S., Li, Y., Behzad, M., Shen, J., Zafeiriou, S., ... & Zhao, G. (2022). 4DME: A spontaneous 4D micro-expression dataset with multimodalities. IEEE Transactions on Affective Computing.
Submission
Please note: all relevant submission deadlines are at 23:59 AoE, and paper submissions will follow ACM guidelines.
- Challenge submission platform for STR task: Codabench site
Frequently Asked Questions
- Q: How should spotted intervals that overlap be handled?
A: We consider that each ground-truth interval corresponds to at most one spotted interval. If your algorithm detects multiple overlapping intervals, you should merge them into a single optimal interval; a minimal merging sketch is shown after this FAQ. The fusion method is part of your algorithm, and the final evaluation only considers the resulting interval.
- Q: For the STR challenge, how many classes are used in the classification part?
A: You are required to classify emotions into only three classes: "negative", "positive", and "surprise". Only correctly spotted micro-expressions are passed on to the classification part, also known as Analysis (on the Leaderboard). The "other" class is not included in the evaluation calculation for the Analysis part. However, all occurrences, including those labelled with the "other" class, are considered in the Spotting part, as they are micro-expressions.
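Following the first answer above, here is one simple, union-based way to merge overlapping spotted intervals into a single interval before submission; the exact fusion strategy (e.g. confidence-weighted merging) is left to each participant.

```python
def merge_overlapping_intervals(intervals):
    """Merge overlapping (onset, offset) intervals so each event yields a single interval."""
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset <= merged[-1][1]:
            # Overlaps the previous interval: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged

# Example: merge_overlapping_intervals([(25, 40), (10, 30), (100, 120)]) -> [(10, 40), (100, 120)]
```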