More than mere information, data can be timeless assets: their historical relevance makes them an ongoing reference and an essential starting point for future research. Moreover, student data enable precise guidance for higher education policies and support various analyses that improve the educational process. The Data Hub of the Institute for the Future of Education (IFE) of Tecnológico de Monterrey operates under this principle.
The IFE Data Hub provides access to institutional data collections for Tecnológico de Monterrey’s educational community, as well as to national and international researchers working on Educational Innovation projects. Likewise, it provides open data and shares findings derived from its calls and institutional academic contributions. The publication of these data and findings is key to their validation and subsequent reuse by other experts in the respective fields, ensuring scientific rigor when optimizing training programs or lines of action.
Its work is driven by the quality of education and the efficiency of university processes that support students, who are the central axis. The team contributes to the institutional strategies proposed to mitigate various educational challenges, such as dropout rates, and those that promote student retention, a sense of belonging, and social commitment.
This effort is led by Joanna Alvarado-Uribe, leader of the IFE Data Hub; Paola G. Mejía Almada, Data Operations Coordinator; Ubaldo Martínez, coordinator of data quality assurance; and Alma Rosa Mena Martínez, project specialist. Their work always considers each student's environment and context, including factors such as the institution and the students' academic experience, which vary from study to study.
Having this information curated and documented by quality metrics is valuable for subsequent studies, providing context for the collected data and helping to avoid bias or ambiguity.
Effective Data Curation Practices
Joanna Alvarado-Uribe shares key findings from the IFE Data Hub's research; these findings serve as a standard of good practice for the custody and quality assurance of academic data.
First, she stresses the importance of making Tecnológico de Monterrey students aware that the data collected serve a purpose. The information benefits students, as it is used to improve the quality of their learning and help ensure they can continue their studies. For example, one finding was the value of student involvement in LiFE activities, which bring the student community together through cultural, artistic, and sports activities. Participants highlighted that, as foreigners, they made friends, had access to better opportunities, and felt more committed to the university and their environment during their time in these activities, which motivated them to remain part of the institution. Beyond serving as a student consulting service, the LiFE program facilitates the analysis and modeling of student data to propose initiatives and co-design solutions that transform the student experience.
Scholarships also influence student retention and dropout; their impact depends on both the percentage and the type of support received. Notably, being an international student adds a layer of complexity. These students usually find it more difficult to continue their studies during the first year, the stage at which dropout most commonly occurs. It has even been observed that dropout occurs during the registration process in the first and second years, not only while classes are in session.
Alvarado-Uribe points out that it is essential to review and follow the privacy notices that the institution makes available to all users, who accept and sign them, because these notices explain the purposes of data collection. It is therefore paramount to ensure that student consent has been obtained for the access and use of their information for research purposes, and that the publication of such data is in line with institutional guidelines and metrics.
Determining when this information may be collected is vital. Considering data governance guidelines at both the institutional and global levels helps clarify the legal definition of personal data management and the regulations governing sensitive information, making it easier to determine what information can be provided and how.
In addition, Alvarado-Uribe states that it is essential that the data be anonymized. This practice dissociates the data, avoiding the identification of the people involved, whether they are students or professors. However, the challenge arises when sensitive information is involved; the simplest option would be to delete it, but that would mean losing valuable data. It becomes necessary to define how this information will be presented, for example, by creating new categories or ranges.
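As an illustration of presenting sensitive values through new categories or ranges, the following minimal Python sketch replaces exact ages with ranges and folds rarely seen categories into a shared label. The field values, the five-year range width, and the function names `age_to_range` and `group_rare` are assumptions made for the example, not the IFE Data Hub's actual procedure.

```python
from collections import Counter

def age_to_range(age, width=5):
    """Replace an exact age with a range label such as '20-24'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def group_rare(values, min_count=2, other_label="Other"):
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical values; real collections use their own variables.
ages = [18, 19, 23, 31]
countries = ["Mexico", "Mexico", "Colombia", "Mexico"]

age_ranges = [age_to_range(a) for a in ages]
country_groups = group_rare(countries)
```

The value is kept in a coarser form rather than deleted, so it remains usable for analysis while pointing less directly at one person.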
Tecnológico de Monterrey’s Techvolution team unifies digital customer service experiences to make them clearer, more agile, and more effective. Information reaches the Data Hub with an anonymized identifier. However, it is essential to ensure that cross-referencing of data does not leak sensitive information. For example, a student from a country with no other records could become identifiable; in these cases, an additional quality protocol must be activated to process the information and avoid any possibility of association.
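One way to detect records that cross-referencing could make identifiable is to count how many records share each combination of indirectly identifying values. The article does not name a technique; the sketch below borrows the idea behind k-anonymity, and the field names, the threshold `k`, and the function `risky_combinations` are hypothetical.

```python
from collections import Counter

def risky_combinations(records, quasi_identifiers, k=2):
    """Return value combinations shared by fewer than k records."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical records that already carry an anonymized identifier.
records = [
    {"id": "a1", "country": "Mexico", "program": "Engineering"},
    {"id": "b2", "country": "Mexico", "program": "Engineering"},
    {"id": "c3", "country": "Iceland", "program": "Engineering"},
]

# The single record from Iceland is unique, so it is flagged for extra review.
flagged = risky_combinations(records, ["country", "program"])
```

Flagged combinations would then go through the additional protocol, for example by generalizing the country into a broader region.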
Although there is no absolute control over who will ultimately have access to the data, the aforementioned measures safeguard and protect it. The team can also follow up on identified actors who are granted access. Defining the purposes for which data will be used makes it easier to monitor that use. For this reason, the team built a profile validation process into registration and prepared a Data Terms of Use and Conditions document in conjunction with the Legal and Data Governance area. Access is categorically rejected when the application is for business or economic purposes, as the main purpose is academic and research-oriented.
Challenges in the systematization of student information
The IFE Data Hub systematization process applies statistical techniques and exploratory data analysis. These methods assess the quality of information by identifying frequency patterns, detecting missing data and outliers, and reviewing inconsistent data. This analysis is essential for identifying records that fall outside logical ranges, due to capture errors or inconsistencies, such as disproportionate values that do not reflect reality.
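The checks described above can be sketched for a single numeric variable as follows. This is an illustrative example only: the 0 to 100 logical range, the sample grades, and the 1.5 × IQR outlier rule are assumptions, and the article does not say which statistical rules the team applies.

```python
import statistics

def quality_report(values, low, high):
    """Count missing values and list out-of-range values and IQR outliers."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Values outside the logical range, e.g. capture errors.
    out_of_range = [v for v in present if not (low <= v <= high)]
    # Quartiles for the interquartile-range (IQR) outlier rule.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    outliers = [v for v in present
                if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    return {"missing": missing,
            "out_of_range": out_of_range,
            "outliers": outliers}

# Hypothetical grades on a 0-100 scale; 950 looks like a capture error.
grades = [85, 90, None, 78, 92, 88, 950, 81]
report = quality_report(grades, low=0, high=100)
```

Here the range check and the statistical check both catch the same value; in practice they complement each other, since a value can be inside the logical range and still be unusual.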
Alvarado-Uribe explains that after this first validation, additional collaborative verification is required with the areas responsible for data collection. This step is key to ensuring that the definitions, values, and characteristics of each variable are correctly interpreted and represented. Similarly, integrating data governance adds significant complexity, since it requires ensuring consistency and proper management of information at the institutional level.
One of the principal challenges is data heterogeneity, since the information comes from multiple sources and is delivered in various formats, both structured and unstructured. This requires applying specific techniques based on the type of data. For example, the treatment of quantitative variables differs substantially from that of qualitative variables. Added to this is the need to standardize formats, such as dates and numerical values, to avoid inconsistencies that compromise the validity of the analysis.
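As a small illustration of format standardization, the sketch below converts dates written in several layouts to ISO 8601 and parses numbers that may use a decimal comma. The list of accepted date formats and the assumption that sources never use thousands separators are hypothetical; a real pipeline would confirm both with the areas that deliver the data.

```python
from datetime import datetime

# Assumed input layouts; extend after confirming with each data source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def normalize_date(text):
    """Return the date as 'YYYY-MM-DD', or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_number(text):
    """Parse a number that may use a decimal comma; None if unparseable.

    Assumes the source never uses thousands separators.
    """
    try:
        return float(text.strip().replace(",", "."))
    except ValueError:
        return None
```

Returning `None` instead of guessing keeps unparseable values visible, so they can be reviewed and documented rather than silently altered.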
Alvarado-Uribe describes another relevant challenge: the temporal evolution of the data. Some variables collected during specific periods are later modified, no longer captured, or change format. This requires careful review to identify equivalences, structural changes, or data gaps, as well as continuous communication with the areas that generate the data to understand the context of the changes.
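One possible way to record such equivalences is a mapping from a canonical variable name to the name it carried in each period, with gaps marked explicitly. The variable names, periods, and the `harmonize` function below are hypothetical and serve only to show the idea.

```python
# Hypothetical mapping: the name each canonical variable had in each period.
EQUIVALENCES = {
    "scholarship_pct": {"2019": "beca_pct", "2022": "scholarship_percentage"},
}

def harmonize(record, period):
    """Rename period-specific fields to canonical names.

    A variable that was not captured in the period is set to None,
    so the gap stays visible instead of being dropped.
    """
    out = {}
    for canonical, by_period in EQUIVALENCES.items():
        source = by_period.get(period)
        out[canonical] = record.get(source) if source else None
    return out

row_2019 = harmonize({"beca_pct": 50}, "2019")
row_2021 = harmonize({"other_field": 1}, "2021")  # not captured in 2021
```

Keeping the mapping in one place also gives the generating areas a concrete artifact to review and validate.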
Moreover, documentation is vital within this ecosystem. While analysis teams can propose definitions for each variable in the data collection, it is essential to have validation from experts in each area, who have specific knowledge of the data. This is especially relevant when variables exhibit subtle but significant differences that must be clearly distinguished to avoid ambiguity in their interpretation.
On the operational side, data processing includes the generation of descriptive statistics, such as averages, standard deviations, percentages, and cross-tables, as well as graphical visualizations. It also entails cleaning and standardization processes to ensure data homogeneity. These operations determine whether the information can be recovered or should be classified as “not applicable,” with proper documentation of the reasons.
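The descriptive statistics mentioned above can be illustrated with Python's standard library. The sample scores, the field names, and the helper functions `percentages` and `cross_table` are assumptions for the example; the article does not specify the team's tooling.

```python
import statistics
from collections import Counter

def percentages(values):
    """Share of each category, in percent."""
    counts = Counter(values)
    total = len(values)
    return {key: 100 * n / total for key, n in counts.items()}

def cross_table(rows, row_key, col_key):
    """Count records for each (row value, column value) pair."""
    return dict(Counter((r[row_key], r[col_key]) for r in rows))

# Hypothetical data.
scores = [80, 90, 100, 70]
mean_score = statistics.mean(scores)
stdev_score = statistics.stdev(scores)  # sample standard deviation

students = [
    {"scholarship": "yes", "retained": "yes"},
    {"scholarship": "yes", "retained": "yes"},
    {"scholarship": "no", "retained": "no"},
    {"scholarship": "no", "retained": "yes"},
]
retained_pct = percentages([s["retained"] for s in students])
table = cross_table(students, "scholarship", "retained")
```

A cross-table like this one only describes how two variables co-occur in the data; it does not by itself show that one causes the other.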
The use of datasets
The value of the IFE Data Hub’s work is reflected not only in improved data quality but also in its impact on knowledge generation. From these processes, multiple academic investigations have emerged, as well as mechanisms for transferring knowledge to administrative areas through workshops and collaborative spaces where findings and best practices are shared.

Work with datasets from the IFE Data Hub has enabled analysis of the impact of institutional educational policies on the student community, recognizing that each context and student profile requires specific, timely interventions. In this framework, studies linked to programs such as the Leaders of Tomorrow initiative stand out. They focus on understanding how high-performing students combine academic excellence with social commitment, with the aim of identifying replicable methodologies that strengthen student retention and well-being.
Likewise, comparative analyses of educational models and longitudinal evaluations have been developed, covering data from admissions to academic performance (competencies). These findings contribute directly to areas such as Development of Educational Experience and Analytics and Business Intelligence, helping to improve predictive models, identify students at risk of dropping out, and strengthen programs whose effectiveness has been supported (under the Tec21 Educational Model).
Another relevant contribution has been the generation of recommendations spanning research and institutional practice, including optimizations of monitoring instruments, the incorporation of non-cognitive variables (personality traits or motivation), and the reinforcement of comprehensive student accompaniment strategies. These efforts will extend to the graduate level, broadening the scope of the analysis.
Together, these studies reflect a multifactorial approach that recognizes student diversity and uses data to design more precise interventions that increase retention and enrich the educational experience as a whole. Having information over time allows comparisons between models and the identification of both their strengths and their opportunities to evolve. From this perspective, data acquire a central role in institutional decision-making: reliable, high-quality information supports strategies, reduces uncertainty, and strengthens planning. Proper data management improves educational processes, raises academic quality, and generates benefits for both the institution and its community.
Translation by Daniel Wetta
This article from Observatory of the Institute for the Future of Education may be shared under the terms of the license CC BY-NC-SA 4.0 















