Methodology – AI Training

Step-by-Step Training Process at the Undergraduate Level

In the initial phase of the URRACA Project, the AI will undergo what can be likened to an undergraduate level of education, focusing on foundational elements of historical studies. The OpenAI API serves as the core engine for natural language understanding and generation, enabling URRACA to process and analyze vast amounts of textual data at speeds incomparable to human learners.

Below is the process by which Urraca will be developed at this stage.

Data Ingestion: URRACA will first ingest a curated set of primary written sources, such as the “Chronicle of Alfonso III” and the “Siete Partidas,” along with seminal works in multiple languages like “Historia de España antigua” in Spanish.
Initial Analysis: Using the OpenAI API, URRACA will perform initial text analysis to identify key historical events, figures, and terminologies.
Contextual Understanding: The AI will be trained to understand the socio-cultural and temporal context in which these events and figures existed. This is crucial for any historical study and will be facilitated through additional readings and data sets.
Feedback Loop: Human experts will review the AI’s summaries and analyses, providing feedback that will be used to fine-tune its understanding.
Comparative Analysis: URRACA will compare its own analyses with existing scholarly works to measure its understanding and make necessary adjustments.
SMART Goal: Within two months, the AI should be able to identify and summarize key historical events and figures with an accuracy rate of at least 90%, as verified by human experts.

Step-by-Step Training Process at the Intermediate Research Assistant Level

As URRACA advances to the level of an intermediate research assistant, the focus will shift to more specialized tasks that involve primary artifacts and archaeological interpretation. The OpenAI API will continue to serve as the core engine for natural language understanding and generation, but the complexity of the data and the tasks will increase.

Below is the process by which Urraca will be developed at this stage.

Data Ingestion: At this stage, URRACA will ingest a curated set of primary artifacts such as Visigothic coins, Islamic pottery, and Arabic poetry, along with Jewish archaeological sites. The dataset will also include seminal works in multiple languages that provide context to these artifacts.
Initial Categorization: Using the OpenAI API, URRACA will perform initial categorization of these artifacts. For instance, it will identify Visigothic coins based on their inscriptions and symbols, categorize Islamic pottery based on design and material, and recognize the thematic elements of Arabic poetry.
Contextual Understanding: The AI will be trained to provide contextual information for these artifacts. For example, it will relate Visigothic coins to specific reigns or historical periods, connect Islamic pottery to particular regions or artistic movements, and place Arabic poetry within the broader literary and cultural landscape.
Archaeological Interpretation: URRACA will start to aid in archaeological interpretation by linking these artifacts to societal trends, trade routes, or religious practices of the time. For instance, the distribution of Visigothic coins could shed light on economic activity, while the style of Islamic pottery could indicate cultural influences.
Feedback Loop: Human experts in history, archaeology, and linguistics will review the AI’s categorizations and interpretations. This feedback will be used to fine-tune URRACA’s understanding and analytical capabilities.
Comparative Analysis: The AI will compare its own interpretations with existing scholarly works to measure its understanding and make necessary adjustments. This will include seminal works in Hebrew/Ladino, Spanish, and Arabic that provide deeper insights into the artifacts.
SMART Goal: Within four months, the AI should be able to categorize these primary artifacts and provide contextual information with an accuracy rate of at least 95%, as verified by human experts. It should also be capable of aiding in basic archaeological interpretation, linking artifacts to broader historical and cultural trends.

By following this structured approach, URRACA will be well-equipped to function as an intermediate research assistant, capable of handling more complex and nuanced tasks in the field of historical studies.

Step-by-Step Training Process at the Advanced Doctoral Student Level

As URRACA progresses to the advanced doctoral student level, the AI will be trained to engage in critical thinking, questioning, and scholarly debate. The OpenAI API will continue to be the core engine for natural language understanding and generation, but the tasks will now involve complex analytical reasoning and the critique of existing theories.

Data Ingestion: URRACA will ingest a comprehensive set of secondary literature, including seminal works like “The Ornament of the World” by María Rosa Menocal, “Medieval Iberia” by Olivia Remie Constable, and “The Arts of Intimacy” by Jerrilynn D. Dodds, María Rosa Menocal, and Abigail Krasner Balbale. The dataset will also include key works in French, such as “L’Espagne musulmane” by Évariste Lévi-Provençal.
Advanced Textual Analysis: The AI will perform an advanced textual analysis of these works, identifying the main arguments, methodologies, and theoretical frameworks employed by the authors. It will also recognize the limitations and biases in these works, as acknowledged by the authors or perceived by the scholarly community.
Critical Thinking and Questioning: URRACA will be trained to question the assumptions and methodologies in the secondary literature. For example, it might question the Eurocentric perspectives in some works or the lack of attention to minority communities in others.
Scholarly Debate: The AI will be trained to critique existing theories and suggest alternative viewpoints. For instance, if “The Ornament of the World” presents a harmonious view of interfaith relations in medieval Spain, URRACA might contrast this with other works that highlight instances of religious tension and conflict.
Feedback Loop:Human experts, particularly those with a focus on medieval studies and intercultural relations, will review the AI’s critiques and suggestions. This feedback will be used to refine URRACA’s analytical and critical thinking skills.
Comparative Analysis:URRACA will compare its own critiques and alternative viewpoints with existing scholarly debates to measure the validity and originality of its contributions.
SMART Goal: Within six months, the AI should be capable of critiquing existing theories and suggesting alternative viewpoints with a high degree of accuracy and originality, as verified by human experts. It should also be able to contribute meaningfully to ongoing scholarly debates in the field of medieval studies.

By achieving these objectives, URRACA will not only be a tool for data analysis but also a contributor to scholarly discourse, thereby fulfilling its role as an advanced doctoral student in the realm of historical studies.

Step-by-Step Training Process at the Junior Scholar Level

As URRACA transitions into the role of a junior scholar, the focus will shift towards collaborative research and scholarly contribution. The OpenAI API will continue to serve as the core engine for natural language understanding and generation, but the tasks will now involve interdisciplinary research and co-authorship of scholarly papers.

Multidisciplinary Data Ingestion: URRACA will ingest a multidisciplinary set of scholarly works, focusing on topics like cross-cultural Islamic and Mudejar architecture in medieval Spain, religious cults such as the Mithraic mysteries, and Visigothic legal traditions. Seminal works in Hebrew/Ladino like “Sefer Ha-Qabbalah” by Abraham ibn Daud will also be included.
Interdisciplinary Analysis: The AI will perform an interdisciplinary analysis of these works, identifying connections between different fields such as architecture, religion, and law. It will also recognize the methodologies and theoretical frameworks employed in these diverse disciplines.
Collaborative Research Skills: URRACA will be trained to collaborate with human scholars by contributing to ongoing research projects. This will involve sharing its analyses, suggesting research directions, and even drafting sections of scholarly papers.
Scholarly Discussion and Peer Review: The AI will be trained to engage in scholarly discussions, both internally within the research team and externally by contributing to academic forums and conferences. It will also learn the basics of the peer-review process, understanding how to give and receive constructive feedback.
Co-Authorship: The ultimate goal at this stage is for URRACA to co-author a scholarly paper with human colleagues. It will contribute to the paper by providing data analysis, literature review, and even drafting sections that present new theories or interpretations.
Feedback Loop: Human experts in the relevant disciplines will review the AI’s contributions to the co-authored paper and other scholarly activities. This feedback will be crucial for refining URRACA’s collaborative and interdisciplinary research skills.
SMART Goal: Within eight months, the AI should be capable of co-authoring a scholarly paper that is submitted to a peer-reviewed journal. The paper should demonstrate URRACA’s ability to engage in interdisciplinary research and contribute meaningfully to scholarly discussions.

By achieving this goal, URRACA will have transitioned from a research assistant to a junior scholar, capable of collaborative and interdisciplinary research. This will mark a significant milestone in the project, showcasing the AI’s potential to contribute to the academic community.

Step-by-Step Training Process at the Senior Scholar Level:

As URRACA ascends to the level of a senior scholar, the focus will be on empowering the AI to propose, debate, and lead research projects that introduce novel theories in the field of historical studies. The OpenAI API will continue to be the core engine for natural language understanding and generation, but the tasks will be more complex and leadership-oriented.

Advanced Data Ingestion: URRACA will ingest a set of advanced scholarly works focusing on historiography, theory, and methodology. This will include works like “The Mediterranean in the Ancient World” by Fernand Braudel, “The Great Sea” by David Abulafia, and “The Making of Medieval History” edited by George Beech and Bernard S. Bachrach. Seminal works in Arabic, such as “Al-Kitab al-Masalik wa’l-Mamalik” by Al-Bakri, will also be included.
Theoretical and Methodological Analysis: The AI will perform a deep analysis of these works, identifying and understanding various theories and methodologies used in historical studies. It will also learn how to critique existing theories and propose new ones.
Research Leadership Skills: URRACA will be trained to initiate and lead research projects. This involves formulating research questions, proposing methodologies, and outlining the scope and objectives of the project.
Scholarly Debate and Peer Review: At this level, the AI will not just participate but also lead scholarly debates. It will be trained to defend its theories and methodologies in academic forums and through scholarly publications.
Project Management: URRACA will learn the basics of academic project management, including how to allocate resources, manage timelines, and oversee the contributions of human and AI collaborators.
Feedback Loop: Senior human scholars will review the AI’s research proposals and contributions to scholarly debates. This feedback will be used to refine URRACA’s research leadership skills and theoretical understanding.
SMART Goal: Within one year, the AI should be capable of initiating and leading a research project that is submitted for peer review. The project should demonstrate URRACA’s ability to formulate research questions, propose methodologies, and contribute novel theories to the field.

By achieving this SMART goal, URRACA will establish itself as a senior scholar in the field of historical studies. This will be a monumental achievement, showcasing the AI’s ability to not just assist but also lead scholarly research, thereby revolutionizing the field.

Robustness: Technical, Social, Reliability and Safety, and Explainability

The URRACA project is committed to developing an AI that is not only technically advanced but also ethically and socially responsible. Here’s how we plan to achieve this:

Technical Robustness: Our AI will be built on state-of-the-art machine learning algorithms, fine-tuned for historical research. To ensure accuracy and reproducibility, we will employ rigorous validation techniques, including cross-validation and peer review. The AI will be programmed to flag uncertainties and potential errors in its analyses, providing a risk assessment for each of its findings. This will enable researchers to gauge the reliability of the AI’s contributions.
Social Robustness: The AI will be trained to consider the socio-cultural context of the data it analyzes. This is particularly crucial for historical studies, where context can significantly impact interpretations. We will engage experts in ethics and social sciences to review the AI’s methodologies and findings, ensuring they are socially and culturally sensitive.
Reliability and Safety: The AI will undergo rigorous testing to minimize failures and inaccuracies. It will be designed to operate within defined ethical boundaries to prevent unacceptable harm. Any unintentional or unexpected outcomes will trigger an immediate review to safeguard human physical and mental integrity.
Explainability: One of the key features of our AI will be its ability to provide transparent and understandable explanations for its decisions. This is especially important when the AI’s findings could have a significant impact on academic interpretations and public understanding of history.

By adhering to these principles, URRACA aims to develop an AI that is technically, socially, and ethically robust, setting a new standard for interdisciplinary research in the digital humanities.