SQLGOD
Documenting my journey as a graduate student working with SQL & trying to publish a paper on its error messages
Fall 2025
August/September
Week 1 (August 25th - 31st)
- Class orientations (CS489: Deep Learning, CS682: Artificial Intelligence, CS784: Scheduling)
- Caught up with professors that I haven't spoken to since the Spring 2025 semester
- Attended Dr. Stefik's kickoff meeting with his research team
Week 2 (September 1st - 7th)
- Met with Dr. Stefik to go over the steps needed to formally conduct the SQL research paper under IRB approval, and to get a realistic view of the paper's potential for publication
- Met with Dr. Cisneros to seek advice regarding the progression of my Master's degree. Learned that the average student does not know what their research topic is until the second year of the Master's program and that the first year is heavily loaded with classes.
- Scheduled to meet with Dr. Stefik's research team next week to discuss the methodology of the experiment and how it can be modified before filing for IRB exemption
Week 3 (September 8th - 14th)
- I began a literature review to see whether any papers on SQL had been published in the past three months, since it has been three months since my last literature review. While searching, I discovered that Toni Taipalus had recently published the paper "Enhanced SQL error messages facilitate faster error fixing". After skimming the paper, it was apparent that his experiment was designed in an eerily similar way to the prototype study I had conducted 3+ months ago. When I discussed this with Dr. Stefik, he said that, in his own words, I was "scooped". On one hand, it's unfortunate that my approach has been taken, so my paper wouldn't add something totally new to the literature. On the other hand, this does mean that my approach and thought process are aligned with those of a well-decorated PhD, not bad. To continue this journey without making it a replication study, as the aim is to one day publish this work, my goal has now shifted towards targeting errors involving aggregate functions in SQL, rather than general SQL errors.
Week 4 (September 15th - 21st)
- Met with Dr. Nasoz to catch up and gather their opinion on the direction of my study. They emphasized that I should focus on demographics and change the queries entirely from what I had previously used in my prototype. Additionally, they warned against making the study too complex, as that would narrow the number of viable participants.
- Learned that I need to complete Collaborative Institutional Training Initiative (CITI) training before submitting to the IRB, as it is required before approval is granted.
Week 5 (September 22nd - 28th)
- Contacted Toni Taipalus via e-mail to establish a line of communication, express my appreciation and admiration for his work, and ask for suggestions regarding my own desire to research SQL in education. Toni is a Finnish researcher who has published numerous papers regarding SQL and how novices interact with it. After discussion, Toni was very pleased with my proposed study idea and willing to help however possible.
- Demographics and aggregate functions were not considered in any of Toni's work and may serve as my opportunity to bring something new to the publishing space
- Dr. Stefik suggested working with a friend of his from Germany who has previously published a paper on SQL, "An Empirical Study on the Possible Positive Effect of Imperative Constructs in Declarative Languages: The Case with SQL"
- My to-do list consists of deciding what my target audience for demographics will be (heavily considering who is and isn't a viable candidate), determining what aggregate functions I want to use, determining what functions were used by Toni in his paper, and reading the suggested paper
October
Week 6 (September 29th - October 5th)
- No research work was done, as this week was busy with class work deadlines. Last week's to-do list still needs to be addressed.
Week 7 (October 6th - October 12th)
- In preparation for midterms, research work has yet again been neglected for most of this week.
- Target demographic will be the following:
Undergraduate Computer Science students that have completed CS326, Programming Languages
Graduate CS students without any prerequisites
Recent CS graduates, defined as being students who have graduated in the past 12 months.
- Any aggregate functions that are chosen need to be straightforward to understand. With the target demographic, it is important that the study doesn't expect too much of them given that they're novices in the field.
Week 8 (October 13th - October 19th)
- A Zoom meeting was held with researchers whose names are redacted. The purpose of this meeting was to learn about the work they're currently doing with SQL and to get a sense of direction for my own study. It was suggested to use N-of-1 studies to help quickly determine potential avenues regarding SQL error messages and syntax. More specifically, it was suggested to create an N-of-1 study that measures whether or not the new SQL syntax introduced by Google actually has any validity to it amongst novice programmers.
- I found Google's paper "SQL Has Problems. We Can Fix Them: Pipe Syntax in SQL" to have little validity and to read more like self-promotion. Its evidence comes from Google's own employees using a version of SQL that the company developed, and shows that as time went on, and as Google continued to push people to use it, more people preferred the pipe approach over traditional SQL syntax. The real difference between the two is the use of "|>" before every line and a change in the order in which queries are written.
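As a rough side-by-side illustration of that difference, the sketch below writes one small aggregate query in both styles. The `students` table, its columns, and the sample rows are made up for this example, and the pipe version is only a best reading of the syntax in the paper. SQLite does not understand pipe syntax, so only the traditional query is actually executed.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
STANDARD_SQL = """
SELECT major, COUNT(*) AS n
FROM students
WHERE gpa >= 3.0
GROUP BY major
"""

# The same query in pipe style (assumed rendering, not executed here):
# it starts with FROM, and every later step begins with "|>".
PIPE_SQL = """
FROM students
|> WHERE gpa >= 3.0
|> AGGREGATE COUNT(*) AS n GROUP BY major
"""


def run_standard():
    """Run the traditional query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, major TEXT, gpa REAL)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [("Ana", "CS", 3.5), ("Ben", "CS", 3.2),
         ("Cal", "Math", 3.9), ("Dee", "Math", 2.1)],
    )
    rows = sorted(conn.execute(STANDARD_SQL).fetchall())
    conn.close()
    return rows


if __name__ == "__main__":
    print(STANDARD_SQL.strip())
    print(PIPE_SQL.strip())
    print(run_standard())
```

Both versions ask the same question (how many students per major have a GPA of at least 3.0); only the clause order and the "|>" prefix differ.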
Week 9 (October 20th - October 26th)
- Dr. Stefik proposed creating an N-of-1 study that randomly generates tables and a corresponding query that consists of nested COUNTs, requiring the user to read and comprehend the query, then provide an answer between 0 and 9. This approach aims to isolate pure syntax readability and measure whether or not there's a meaningful difference across multiple formats. My responsibility now is to use existing resources to create a simple program that can do just this.
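A minimal sketch of what one trial of such a program might look like, assuming Python with the built-in `sqlite3` module. The `items` table, the `grp` column, and the function names are hypothetical placeholders, not the actual experiment design. The nested query counts how many groups have at least a given number of rows, so with four possible groups the correct answer always falls between 0 and 4, inside the 0-9 range.

```python
import random
import sqlite3


def make_trial(rng, n_rows=6):
    """Build one random table and a nested-COUNT query over it."""
    groups = ["a", "b", "c", "d"]  # hypothetical group labels
    rows = [(i, rng.choice(groups)) for i in range(n_rows)]
    threshold = rng.randint(1, 3)
    # Outer COUNT counts the groups whose inner COUNT meets the threshold.
    query = (
        "SELECT COUNT(*) FROM ("
        "SELECT grp, COUNT(*) AS n FROM items GROUP BY grp"
        f") WHERE n >= {threshold}"
    )
    return rows, threshold, query


def expected_answer(rows, query):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER, grp TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    (answer,) = conn.execute(query).fetchone()
    conn.close()
    return answer


if __name__ == "__main__":
    rng = random.Random()
    rows, threshold, query = make_trial(rng)
    for row in rows:
        print(row)
    print(query)
    response = int(input("Your answer (0-9): "))
    print("correct" if response == expected_answer(rows, query) else "incorrect")
```

Letting the database compute the expected answer keeps the generator and the answer key in agreement. Comparing formats would then only require rendering the same trial in each syntax while reusing the same answer.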
November
Week 10 (October 27th - November 2nd)
- Last midterm of the semester was this week. That was a nightmare, but now hopefully I'll be able to begin dedicating more time towards research as the month of October comes to a close.
- Began experimenting on how to create the experimental environment as described in last week's update.
Week 11 (November 3rd - November 9th)
- Absolutely nothing, CS689 is taking up all of my time.
Week 12 (November 10th - November 16th)
- Absolutely nothing, all of my classes are eating up my time. At this point, I do not anticipate any work to be done until winter break, but only time will tell.