- Spot-then-Recognize (STR) Challenge | Codabench site
- Visual Question Answering (VQA) | Codabench site
Important Dates
- Test Set Release: 19th May 2025
- Challenge Platform Submission Open: 23rd May 2025
- Challenge Submission Deadline (for Codabench only): 27th June 2025 (Tentative)
- Paper Invitation (after confirmation of results): 3rd July 2025
- Paper Submission Deadline: 30th July 2025
- Notification of Accepted Papers: 7th August 2025
- Camera-Ready Deadline: 26th August 2025
Unseen dataset for both tasks
This year, we will be using unseen cross-cultural test sets to evaluate algorithm performance in a fairer manner.
Unseen Dataset for STR
- The unseen testing set (MEGC2025-testSet), the same version as the MEGC2023 unseen dataset, contains 30 long videos: 10 long videos from SAMM (SAMM Challenge dataset) and 20 clips cropped from different videos in CAS(ME)3 (not previously released). The frame rate of the SAMM Challenge dataset is 200 fps and the frame rate of CAS(ME)3 is 30 fps. Participants should test on this unseen dataset.
- To obtain the MEGC2025-testSet, download and fill in the license agreement form of the SAMM Challenge dataset and the license agreement form of CAS(ME)3_clip, then upload the files through this link: https://www.wjx.top/vm/wxCeVHP.aspx# .
- For requests from a bank or company, participants are required to have their director or CEO sign the form.
- Reference:
- Li, J., Dong, Z., Lu, S., Wang, S.J., Yan, W.J., Ma, Y., Liu, Y., Huang, C. and Fu, X. (2023). CAS(ME)3: A Third Generation Facial Spontaneous Micro-Expression Database with Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, 1 March 2023, doi: 10.1109/TPAMI.2022.3174895.
- Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
Unseen Dataset for VQA
- The unseen testing set for VQA contains 24 ME clips: 7 clips from SAMM (SAMM Challenge dataset) and 17 clips from different videos in CAS(ME)3 (not previously released). The frame rate of the SAMM Challenge dataset is 200 fps and the frame rate of CAS(ME)3 is 30 fps. Participants should test on this unseen dataset.
- To obtain the MEGC2025-testSet-ME-VQA, download and fill in the license agreement form of the SAMM Challenge dataset and the license agreement form of CAS(ME)3_clip, then upload the files through this link: https://www.wjx.top/vm/wxCeVHP.aspx# .
- For requests from a bank or company, participants are required to have their director or CEO sign the form.
- Reference:
- Li, J., Dong, Z., Lu, S., Wang, S.J., Yan, W.J., Ma, Y., Liu, Y., Huang, C. and Fu, X. (2023). CAS(ME)3: A Third Generation Facial Spontaneous Micro-Expression Database with Depth Information and High Ecological Validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, 1 March 2023, doi: 10.1109/TPAMI.2022.3174895.
- Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
Spot-then-Recognize (STR) Task
Since the rapid advancement of ME research began about a decade ago, most works have focused on two separate tasks: spotting and recognition. Recognizing the ME class alone can be unrealistic in real-world settings, since it assumes that the ME sequence has already been identified, an ill-posed problem in the case of a continuously running video. On the other hand, the spotting task alone is limited in its applicability, since it cannot interpret the actual emotional state of the person observed.
A more realistic setting, also known as "spot-then-recognize", performs spotting followed by recognition in a sequential manner. Only samples that have been correctly spotted in the spotting step (i.e. true positives) are passed on to the recognition step to be classified into an emotion class. The task will use the unseen dataset and will be evaluated using the selected metrics.
Reference:
- Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, 110, 116875, doi: 10.1016/j.image.2022.116875.
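To make the sequential protocol concrete, the sketch below shows how a spot-then-recognize system is typically wired together: spotting runs first over the long video, and each spotted interval is then classified into one of the three emotion classes. The spotter and recognizer are hypothetical placeholders for the participant's own models; only the overall flow follows the task definition above.

```python
from typing import List, Tuple

def spot_intervals(frames) -> List[Tuple[int, int]]:
    """Hypothetical spotting model: return candidate ME intervals as (onset, offset) frame indices."""
    raise NotImplementedError  # plug in your spotting method here

def recognize_emotion(frames, onset: int, offset: int) -> str:
    """Hypothetical recognition model: classify an interval as 'negative', 'positive' or 'surprise'."""
    raise NotImplementedError  # plug in your recognition method here

def spot_then_recognize(frames) -> List[Tuple[int, int, str]]:
    """Sequential pipeline: spot first, then recognize each spotted interval."""
    results = []
    for onset, offset in spot_intervals(frames):
        results.append((onset, offset, recognize_emotion(frames, onset, offset)))
    return results
```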
Evaluation Protocol
- Submissions will use the Codabench Competition Leaderboard.
- Participants should upload the predicted results for both the unseen CAS(ME)3 and SAMM datasets to the Codabench Leaderboard where specific evaluation metrics will be calculated.
- Evaluation metrics (for SAMM, CAS):
- F1-score for the Spotting and Analysis steps (higher is better).
- Spot-then-Recognize Score (STRS), the product of the Spotting and Analysis F1-scores (higher is better); a computation sketch is shown after this list.
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted csv files with the following filenames:
- cas_pred.csv (for the CAS(ME)3 samples)
- samm_pred.csv (for the SAMM samples)
- An example submission is provided here: example_submission_STR.
- The evaluation script is available at https://github.com/genbing99/STRS-Metric.
- The baseline method can be found in the following paper (please cite):
Liong, G.-B., See, J., & Chan, C. S. (2023). Spot-then-recognize: A micro-expression analysis network for seamless evaluation of long videos. Signal Processing: Image Communication, 110, 116875.
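For reference, here is a minimal sketch of how the metrics above can be computed from spotted and ground-truth intervals. It assumes the common MEGC convention that a spotted interval counts as a true positive when its IoU with a ground-truth interval is at least 0.5, and that intervals are inclusive frame-index pairs; the official evaluation script linked above remains the definitive implementation.

```python
def interval_iou(pred, gt):
    """IoU of two (onset, offset) intervals, treating indices as inclusive frames."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]) + 1)
    union = (pred[1] - pred[0] + 1) + (gt[1] - gt[0] + 1) - inter
    return inter / union

def match_spots(preds, gts, iou_thresh=0.5):
    """Greedy matching of spotted intervals to ground truth (assumed TP criterion: IoU >= 0.5)."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and interval_iou(p, g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    return tp, len(preds) - tp, len(gts) - tp  # TP, FP, FN

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# STRS is the product of the Spotting and Analysis F1-scores (higher is better).
def strs(spotting_f1, analysis_f1):
    return spotting_f1 * analysis_f1
```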
Recommended Training Databases
- SAMM Long Videos with 147 long videos at 200 fps (average duration: 35.5s).
- To download the dataset, please visit: http://www2.docm.mmu.ac.uk/STAFF/M.Yap/dataset.php. Download and fill in the license agreement form, then email it to M.Yap@mmu.ac.uk with the email subject: SAMM long videos.
- Reference: Yap, C. H., Kendrick, C., & Yap, M. H. (2020, November). SAMM long videos: A spontaneous facial micro-and macro-expressions dataset. In 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) (pp. 771-776). IEEE.
- CAS(ME)2 with 97 long videos at 30 fps (average duration: 148s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e3. Download and fill in the license agreement form, then submit it through the website.
- Reference: Qu, F., Wang, S. J., Yan, W. J., Li, H., Wu, S., & Fu, X. (2017). CAS(ME)2: A database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Transactions on Affective Computing, 9(4), 424-436.
- SMIC-E-long with 162 long videos at 100 fps (average duration: 22s).
- To download the dataset, please visit: https://www.oulu.fi/cmvs/node/41319. Download and fill in the license agreement form (please indicate which version/subset you need), then email it to Xiaobai.Li@oulu.fi.
- Reference: Tran, T. K., Vo, Q. N., Hong, X., Li, X., & Zhao, G. (2021). Micro-expression spotting: A new benchmark. Neurocomputing, 443, 356-368.
- CAS(ME)3 with 1300 long videos at 30 fps (average duration: 98s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e4. Download and fill in the license agreement form, then submit it through the website.
- Reference: Li, J., Dong, Z., Lu, S., Wang, S. J., Yan, W. J., Ma, Y., ... & Fu, X. (2023). CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, doi: 10.1109/TPAMI.2022.3174895.
- 4DME with 270 long videos at 60 fps (average duration: 2.5s).
- To download the dataset, please visit: https://www.oulu.fi/en/university/faculties-and-units/faculty-information-technology-and-electrical-engineering/center-machine-vision-and-signal-analysis. Download and fill in the license agreement form, then email it to Xiaobai.Li@oulu.fi.
- Reference: Li, X., Cheng, S., Li, Y., Behzad, M., Shen, J., Zafeiriou, S., ... & Zhao, G. (2022). 4DME: A spontaneous 4D micro-expression dataset with multimodalities. IEEE Transactions on Affective Computing.
Visual Question Answering (VQA) Task
New for MEGC 2025, this task introduces a visual question answering challenge for ME analysis that leverages advanced vision-language models (VLMs) and multimodal large language models (LLMs). Instead of relying on structured labels, ME annotations such as emotion classes and action units are converted into question-answer (QA) pairs. Given an image or video sequence as input together with natural language prompts, models must generate answers that describe the ME and its attributes. These questions can cover a wide range of attributes, from binary classification such as "Is the action unit lip corner depressor shown on the face?" to multiclass classification like "What is the expression class?", and to more complex inquiries like "What are the action units present, and based on them, what is the expression class?"
Participants may train models on the curated ME VQA dataset or explore zero-shot reasoning, in-context learning, multi-agent systems, etc. Evaluation will assess UF1 and UAR over the emotion classes, and the NLP metrics BLEU and ROUGE over the full responses. In addition to automatic metrics, human evaluation may be used to gauge the reasoning quality of the models. This task provides a new multimodal perspective on ME analysis, encouraging interpretable and context-aware analysis through natural language interaction. The curated ME VQA dataset extends the MEGC2019 composite dataset, with clips from CASME II, SAMM, and SMIC, by adding QA pairs. Participants can use it as a starting point, and may also include other training samples and generate their own QA pairs.
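As an illustration of how structured ME annotations might be turned into QA pairs, here is a minimal sketch. The field names and question templates below are our own examples for clarity (AU15 is assumed for the lip corner depressor); they are not the exact schema of the curated dataset, so refer to the released annotation files for the actual format.

```python
from typing import Dict, List

def annotation_to_qa(clip_id: str, emotion: str, action_units: List[str]) -> List[Dict[str, str]]:
    """Convert one ME annotation (emotion class + action units) into illustrative QA pairs."""
    qa_pairs = [
        {"clip": clip_id,
         "question": "What is the expression class?",
         "answer": emotion},
        {"clip": clip_id,
         "question": "What are the action units present, and based on them, what is the expression class?",
         "answer": f"The action units are {', '.join(action_units)}, so the expression class is {emotion}."},
        # Binary AU question; AU15 (lip corner depressor) is used as an example.
        {"clip": clip_id,
         "question": "Is the action unit lip corner depressor shown on the face?",
         "answer": "yes" if "AU15" in action_units else "no"},
    ]
    return qa_pairs

# Example: annotation_to_qa("sub01_clip03", "negative", ["AU4", "AU15"])
```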
We provide baseline results in our challenge paper: X. Fan, J. Li, J. See, M. H. Yap, W.-H. Cheng, X. Li, X. Hong, S.-J. Wang and A. K. Davison. MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering. arXiv preprint arXiv:2506.15298, 2025.
Evaluation Protocol
- Submissions will use the Codabench Competition Leaderboard.
- Participants should upload the predicted results for both unseen CAS(ME)3 and SAMM datasets to the Codabench Leaderboard where specific evaluation metrics will be calculated.
- Evaluation metrics (for SAMM, CAS):
- UF1 and UAR for both coarse and fine-grained emotion classes (higher is better).
- BLEU and ROUGE-1 for all answers (higher is better).
- Participants should fill in the test VQA to-answer jsonl files and rename them as xxx_pred.jsonl. These files are available on Google Drive and Baidu Drive:
- me_vqa_casme3_test_to_answer.jsonl
- me_vqa_samm_test_to_answer.jsonl
- Submissions to the Leaderboard must be made in the form of a zip file containing the predicted jsonl files with the following filenames (a preparation sketch is shown after this list):
- me_vqa_casme3_test_pred.jsonl (for the unseen CAS(ME)3 ME clips)
- me_vqa_samm_test_pred.jsonl (for the unseen SAMM ME clips)
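A minimal sketch of preparing a VQA submission is shown below. It assumes each line of a to-answer jsonl file is a JSON object describing one question, to which the model's response is added under an "answer" key; the key names here are assumptions, so check the released to-answer files for the actual schema.

```python
import json
import zipfile

def fill_answers(to_answer_path, pred_path, answer_fn):
    """Read a to-answer jsonl file, add the model's answers, and write the *_pred.jsonl file."""
    with open(to_answer_path, encoding="utf-8") as fin, \
         open(pred_path, "w", encoding="utf-8") as fout:
        for line in fin:
            item = json.loads(line)
            item["answer"] = answer_fn(item)  # "answer" is an assumed field name
            fout.write(json.dumps(item, ensure_ascii=False) + "\n")

def make_submission(zip_path, pred_files):
    """Package the predicted jsonl files into the zip file uploaded to Codabench."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in pred_files:
            zf.write(path)

# Example usage with a placeholder model `my_model`:
# fill_answers("me_vqa_casme3_test_to_answer.jsonl", "me_vqa_casme3_test_pred.jsonl", my_model)
# fill_answers("me_vqa_samm_test_to_answer.jsonl", "me_vqa_samm_test_pred.jsonl", my_model)
# make_submission("submission.zip", ["me_vqa_casme3_test_pred.jsonl", "me_vqa_samm_test_pred.jsonl"])
```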
Recommended Training Databases
- Curated ME VQA dataset
- Please download the annotations here: Curated ME VQA dataset.
- The Curated ME VQA dataset extends the MEGC2019 composite dataset, with clips from SAMM, CASME II, and SMIC, by adding QA pairs. Therefore, to access the ME clips, please follow the dataset request links below.
- SAMM with 159 ME clips at 100 fps.
- To download the dataset, please visit: http://www2.docm.mmu.ac.uk/STAFF/M.Yap/dataset.php. Download and fill in the license agreement form, then email it to M.Yap@mmu.ac.uk with the email subject: SAMM videos.
- Reference: Davison, A. K., Lansley, C., Costen, N., Tan, K., & Yap, M. H. (2016). SAMM: A spontaneous micro-facial movement dataset. IEEE Transactions on Affective Computing, 9(1), 116-129.
- CASME II with 247 ME clips at 200 fps.
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e3. Download and fill in the license agreement form, then submit it through the website.
- Reference: Yan, W. J., Li, X., Wang, S. J., Zhao, G., Liu, Y. J., Chen, Y. H., & Fu, X. (2014). CASME II: An improved spontaneous micro-expression database and the baseline evaluation. PloS one, 9(1), e86041.
- SMIC-E-long with 162 ME clips at 100 fps (average duration: 22s).
- To download the dataset, please visit: https://www.oulu.fi/cmvs/node/41319. Download and fill in the license agreement form (please indicate which version/subset you need), then email it to Xiaobai.Li@oulu.fi.
- Reference: Tran, T. K., Vo, Q. N., Hong, X., Li, X., & Zhao, G. (2021). Micro-expression spotting: A new benchmark. Neurocomputing, 443, 356-368.
- CAS(ME)3 with 1109 ME clips at 30 fps (average duration: 98s).
- To download the dataset, please visit: http://casme.psych.ac.cn/casme/e4. Download and fill in the license agreement form, then submit it through the website.
- Reference: Li, J., Dong, Z., Lu, S., Wang, S. J., Yan, W. J., Ma, Y., ... & Fu, X. (2023). CAS(ME)3: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 2782-2800, doi: 10.1109/TPAMI.2022.3174895.
- 4DME with 1068 ME clips at 60 fps (average duration: 2.5s).
- To download the dataset, please visit: https://www.oulu.fi/en/university/faculties-and-units/faculty-information-technology-and-electrical-engineering/center-machine-vision-and-signal-analysis. Download and fill in the license agreement form, then email it to Xiaobai.Li@oulu.fi.
- Reference: Li, X., Cheng, S., Li, Y., Behzad, M., Shen, J., Zafeiriou, S., ... & Zhao, G. (2022). 4DME: A spontaneous 4D micro-expression dataset with multimodalities. IEEE Transactions on Affective Computing.
Submission
Please note: all relevant submission deadlines are at 23:59 AoE, and paper submissions will follow ACM guidelines.
- Challenge submission platform for STR task: Codabench site
Frequently Asked Questions
- Q: How should spotted intervals that overlap be handled?
A: We consider that each ground-truth interval corresponds to at most one spotted interval. If your algorithm detects multiple overlapping intervals, you should merge them into a single optimal interval; a minimal merging sketch is shown after this FAQ. The fusion method is part of your algorithm, and the final evaluation only considers the resulting interval.
- Q: For the STR challenge, how many classes are used in the classification part?
A: You are required to classify emotions into only three classes: "negative", "positive", and "surprise". Only correctly spotted micro-expressions are passed on to the classification part, also known as Analysis (on the Leaderboard). The "other" class is not included in the evaluation calculation for the Analysis part. However, all occurrences, including those labelled with the "other" class, are considered in the Spotting part, as they are micro-expressions.
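Following the first answer above, here is one simple, union-based way to merge overlapping spotted intervals into a single interval before submission; the exact fusion strategy (e.g. confidence-weighted merging) is left to each participant.

```python
def merge_overlapping_intervals(intervals):
    """Merge overlapping (onset, offset) intervals so each event yields a single interval."""
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset <= merged[-1][1]:
            # Overlaps the previous interval: extend it to cover both.
            merged[-1] = (merged[-1][0], max(merged[-1][1], offset))
        else:
            merged.append((onset, offset))
    return merged

# Example: merge_overlapping_intervals([(25, 40), (10, 30), (100, 120)]) -> [(10, 40), (100, 120)]
```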