AI and the SBO Forums--Part 1

Apr 25, 2024
715
Fuji 32 Bellingham
Some experts think the future of AI is not in Large Language Models like we have now, but in more specialized and limited models. Imagine, for example, having a model trained on Rod Collins’ and Nigel Calder’s writing, without the cruft of also knowing everything that’s been written on forums about boat wiring. Couldn’t answers from something like that be useful for at least some questions that come up here? And if it were trained on a more specific set like that, maybe you could avoid some of the other problems the article talked about, like the influence of geographically or stylistically different sources.
I started a long response to this but, as is my tendency, I got into the weeds with a bunch of details most people don't care about. LLMs are a big part of what I do, lately, so I get excited when the subject comes up. Here's my attempt at a shorter version (yes, this is the shorter version):

An LLM is basically a model trained to predict the next token in a sequence of text. Think of giving a computer a pile of sheet music so it learns common note patterns. Then you give it the first part of a song it’s never seen before; it completes the rest based on what’s "most likely" given what it has learned. Music follows patterns, so it’s surprisingly predictable. Human language does too.
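To make that concrete, here is a toy sketch of the "predict what comes next" idea, shrunk down to counting which word tends to follow which in a tiny made-up corpus. A real LLM uses a neural network over tokens rather than a lookup table, but the principle is the same:

```python
# Toy next-word predictor: count which word follows which (a bigram table),
# then "continue" a phrase by repeatedly picking the most likely next word.
from collections import Counter, defaultdict

corpus = (
    "check the battery terminals for corrosion . "
    "check the bilge pump before you leave . "
    "check the battery voltage at the panel ."
).split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def continue_phrase(start, extra_words=4):
    words = start.split()
    for _ in range(extra_words):
        candidates = following.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])  # most likely continuation
    return " ".join(words)

print(continue_phrase("check the"))  # -> "check the battery terminals for corrosion"
```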

It's important to distinguish between LLMs and GPTs. A GPT (the kind of model behind ChatGPT, Gemini, etc.) is a type of LLM. (Here, I am using "GPT" somewhat loosely because most people don't care about the details.) An LLM can be trained on any language corpus. Some LLMs, like a GPT, are trained on a massive and diverse corpus so they can respond better across a variety of topics. So, a GPT is just an LLM with really broad training that isn't domain-specific.

Now to your idea: What if we trained a model only on a handful of really solid boating sources (Calder, etc.) and nothing else?

If you trained a model from scratch only on those books, you'd end up with something that "knows" a bit about boats but is pretty bad at language, in general. To read and understand a book on boating requires you to have read and understood countless other texts, first. Such a model wouldn't have seen enough variety to handle the questions people actually ask, much less make any sense out of what it reads, and it would be very brittle outside the phrasings and patterns in that small corpus. It would behave more like a slightly fancy search function over those books than like a flexible and intelligent assistant.

What we would actually do instead is:
  • first train a big general model on a huge, messy corpus so it gets good at language in general,
  • then specialize it on boating material by either:
    • fine-tuning it on boating texts so it leans toward those, or
    • keeping it general but wiring it to search your boating library and then summarize what it finds (retrieval-augmented generation).
(I actually do this all the time; a rough sketch of the second, retrieval-based option is below.)
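For the curious, here is a bare-bones sketch of that retrieval-based approach. The "library" passages and the ask_llm() stub are made up for illustration, and a real system would use embeddings and a vector index rather than keyword overlap, but the shape of the pipeline is the same:

```python
# Retrieval-augmented generation, reduced to its skeleton: find relevant passages,
# stuff them into the prompt, and ask a general model to answer from them only.

library = [
    ("Calder, ch. 4", "Size the battery bank for the expected daily load ..."),
    ("Calder, ch. 5", "Galvanic corrosion occurs when dissimilar metals share an electrolyte ..."),
    ("Collins, wiring notes", "Use tinned, stranded wire sized for the circuit's ampacity ..."),
]

def retrieve(question, top_k=2):
    """Score each passage by shared words with the question; keep the best few."""
    q_words = set(question.lower().split())
    scored = sorted(
        library,
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def ask_llm(prompt):
    # Stand-in for a call to whatever general-purpose model you are using.
    return "(model's answer, citing the sources included in the prompt)"

def answer(question):
    context = "\n".join(f"[{source}] {text}" for source, text in retrieve(question))
    prompt = (
        "Answer using ONLY the sources below, and cite them.\n"
        f"{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

print(answer("How big should my battery bank be?"))
```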

So, a specialized boating assistant might still be a general LLM under the hood, but it's being steered to rely heavily on those trusted boating sources when it constructs its answer.

That still doesn't fix the core issue of "information pollution", though. The problem is that an LLM is optimized to produce a plausible continuation of a conversation, not to admit "I don’t know." So it behaves a bit like that guy who always has an answer, even when he doesn't know. Often useful, sometimes just making confident noise. The difference is that the model doesn't have intentions or an ego. It’s not lying, it’s just pattern-matching in a way that focuses on the wrong patterns or lacks sufficient training to correctly complete the pattern.

The reason tools like ChatGPT are still useful is that the system around the model can be stricter than the model itself. It can, for example, run a web or document search over known-good boating sources, feed those results into the model, and then have the model synthesize an answer with citations. You still need to sanity-check the result when it matters but, as has been noted several times, the same is true of human advice. At least with an LLM, there’s no pride or agenda in the mix.
 

dLj

Mar 23, 2017
4,789
Belliure 41 Back in the Chesapeake
There are two fundamental concerns I see in all of the above.

1) Lack of "original thought" through AI, and what could easily become a squelching of advances in original thought, given the power of repetitively stating the same "truths" over and over.

2) There is an underlying assumption that the knowledge already exists in the "known-good boating sources". In fact, many of the current "known-good sources" are inherently flawed. A simple example is the book "The Boatowner's Guide to Corrosion" by Everett Collier. While that is considered "the best we have" at the moment on this subject, it was written by an electrical engineer who didn't really know corrosion at the state-of-the-art level of the field at the time of its writing. Furthermore, any field that has some fundamental reference material associated with it advances over time, and those advancements require updating previously held beliefs. I don't see any methodology within AI to support this.

dj
 
Apr 25, 2024
715
Fuji 32 Bellingham
There are two fundamental concerns I see in all of the above.
Neither of those points is really relevant to what I posted. Maybe you are responding to the whole of the thread?

I was describing how LLMs work - and whether or not one could train them on a small corpus and still get good results. You are commenting on two other issues - I guess the first being whether or not it would be a good idea to try, in the grand scheme of things, socially. That's a valid concern, but not related to what I wrote.

The second, I guess is kind of relevant, but demonstrates another reason one would not want to train it on a narrow set of texts, but instead on a broader corpus. In a way LLMs democratize "truth", provided they are trained on a sufficiently broad corpus. That is, a fringe opinion can be seen as fringe, when a contrary opinion is more prevalent. Again, it knows nothing about what is true - it only knows what people have said is true. (And, strictly speaking, it doesn't even know that.)

So, if one "trusted" text says one thing, but the overwhelming body of evidence says something else, the LLM is not easily fooled. Humans, on the other hand, buy into things like climate change denial, when the overwhelming consensus is contrary.

LLMs have to be forced to be biased. They are inherently unbiased. Again, they are just machines with one task - to come up with the next token in a sequence. Or, in lay terms, to say what is most likely to be said next, in a conversation. They can only be biased when their training sources are selective or unintentionally biased - or if they are directed to enforce a particular bias.

If you understand them as a tool to summarize a consensus - not as a truth engine, then you're fine. It is a tool that gives you insight into a staggeringly massive amount of written and spoken language - not one that will tell you hard facts, reliably.
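To illustrate that "consensus summarizer" point with a deliberately silly example (the corpus and the claims here are invented): a purely frequency-driven model just reflects whichever statement dominates its training data, true or not.

```python
# A frequency-based "model" of a made-up corpus: the majority claim wins,
# regardless of whether it happens to be correct.
from collections import Counter

corpus_claims = (
    ["zinc anodes protect underwater metals"] * 40   # widely repeated view
    + ["anodes are unnecessary on any boat"] * 2     # fringe claim from a couple of sources
)

print(Counter(corpus_claims).most_common(1))
# [('zinc anodes protect underwater metals', 40)]
```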

The main short-term hazard is that they are mostly correct most of the time and present everything with confident language. Humans are easily fooled by anyone that sounds like they know what they're talking about. We are not well-equipped to deal with people who fabricate with complete confidence and without hesitation. We misinterpret the product and want it to be something it isn't.

Longer term, there is an unknown social cost/benefit calculus - something you partially touch on with concern #1. That is well outside the scope of what we can discuss here, but is something I think a lot about. I can say that most of the things people are concerned about just demonstrate a lack of understanding. But, there are some things that people really should worry about that they simply aren't aware of. I sometimes compare it to hiding from the bogeyman in poison ivy.
 

dLj

Mar 23, 2017
4,789
Belliure 41 Back in the Chesapeake
Neither of those points is really relevant to what I posted. Maybe you are responding to the whole of the thread?
I felt they were relevant to your post. But I do also try to keep my posts relevant to the entire thread so there is that component.

I was describing how LLMs work - and whether or not one could train them on a small corpus and still get good results. You are commenting on two other issues - I guess the first being whether or not it would be a good idea to try, in the grand scheme of things, socially. That's a valid concern, but not related to what I wrote.
I understand how LLMs work. I've actually built them (well, I've been consulted on how to build them by others that are doing the building) for specific focused applications.

The second, I guess is kind of relevant, but demonstrates another reason one would not want to train it on a narrow set of texts, but instead on a broader corpus. In a way LLMs democratize "truth", provided they are trained on a sufficiently broad corpus. That is, a fringe opinion can be seen as fringe, when a contrary opinion is more prevalent. Again, it knows nothing about what is true - it only knows what people have said is true. (And, strictly speaking, it doesn't even know that.)
You are missing my point - I did not say to use a narrow set of texts; I simply used a specific example showing how it can be erroneous.

You are actually confirming part of the essence of what I'm addressing. You state, "In a way LLMs democratize 'truth'". What you are pointing out is that these models aggregate broad sets of data and essentially take the most commonly held belief and consider that to be the "right" answer. In some cases that may be correct; in others it may not be. All original ideas that dispel currently held but incorrect beliefs are by definition a "fringe opinion".

So, if one "trusted" text says one thing, but the overwhelming body of evidence says something else, the LLM is not easily fooled. Humans, on the other hand, buy into things like climate change denial, when the overwhelming consensus is contrary.
You say the LLM is not easily fooled. I strongly disagree with that. It is neither "fooled" nor "finds the truth". It simply regurgitates the most commonly held beliefs and presents them as "the truth".

LLMs have to be forced to be biased. They are inherently unbiased. Again, they are just machines with one task - to come up with the next token in a sequence. Or, in lay terms, to say what is most likely to be said next, in a conversation. They can only be biased when their training sources are selective or unintentionally biased - or if they are directed to enforce a particular bias.
You state "They are inherently unbiased." That is not correct. Just in setting them up, they are given inherent bias by those doing the setup; the builders' own biases are transferred to the model. Of course, they can also be set up to have an even stronger bias - as you talk about above.

If you understand them as a tool to summarize a consensus - not as a truth engine, then you're fine. It is a tool that gives you insight into a staggeringly massive amount of written and spoken language - not one that will tell you hard facts, reliably.
This statement here supports my entire post.

The main short-term hazard is that they are mostly correct most of the time and present everything with confident language. Humans are easily fooled by anyone that sounds like they know what they're talking about. We are not well-equipped to deal with people who fabricate with complete confidence and without hesitation. We misinterpret the product and want it to be something it isn't.
I think your description of the short-term hazard is in itself biased, and the real short-term hazard is actually that you have no idea if it is mostly correct, or mostly incorrect.

Longer term, there is an unknown social cost/benefit calculus - something you partially touch on with concern #1. That is well outside the scope of what we can discuss here, but is something I think a lot about. I can say that most of the things people are concerned about just demonstrate a lack of understanding. But, there are some things that people really should worry about that they simply aren't aware of. I sometimes compare it to hiding from the bogeyman in poison ivy.
I agree we really have no idea of the long-term social cost/benefit of this technology. About the only thing we know is that there will be good things that come out of it and bad things that come out of it. Just like any other new technology. But I think this technology has the potential to impact us in more ways than we are even thinking of yet.

That fact is one of the fundamental reasons why, in my opinion, AI should always be clearly referenced if it is used as part of anyone's response.

dj
 
Nov 6, 2020
508
Mariner 36 California
Personally I'm pretty sick and tired of dealing with AI generated content. I find it especially irritating when a question is posed that comes from an AI generated source.

I have no issue with folks searching using AI and then saying something to the effect of "this is what I found through AI" and then copying the info there. Essentially, all of us humans identify ourselves - it is the impersonation of humans by AI that I find disturbing and deceitful.

My preference would be that AI should not be allowed to post. If some human is using AI to help identify the answer to a question that's fine - but a strictly AI generated question should be banned - or perhaps as a minimum, be identified as such. If AI is answering a question posted I question if it should be allowed, but at a minimum it should be identified as such.

In summary: I would prefer AI not be allowed to post, but at a minimum it should be identified as being AI. I have no issue with people using AI to help find an answer, but it should be identified as such.

dj
I agree and had a similar thought. Maybe have some bold marking system on a suspected AI generated post/conversation. Maybe change the text color to red or blue or something. Can you use an AI bot to help identify AI posts?
 
Apr 25, 2024
715
Fuji 32 Bellingham
Maybe have some bold marking system on a suspected AI generated post/conversation.
Would such a system flag all content of dubious origin or quality? I think most people are capable of deciding for themselves how credible a post is or how much they know or care about its origin.
Can you use an AI bot to help identify AI posts?
Well, sort of, but there are four problems with this. First, people are generally better at detecting AI-generated content than machines are. This reinforces my assertion that we should let people decide for themselves how to view or treat any post - they are better at evaluating its applicability to their own real-world circumstances. Even a system of voluntary disclosure is problematic. The people who volunteer this information are likely not the ones who would abuse the medium to the detriment of the community. So, it becomes a burden on those with integrity and toothless to those without.

Second, I think that such a proposal is kind of disingenuous. It says that we don't want AI generating potentially helpful or harmful content, but we are perfectly OK with it making evaluations of other mostly human-generated content. I recognize that flagging content as potentially AI-generated is not the same as having AI generate content and representing it as human-generated. But, unless it is nearly 100% accurate, it still comes down to each person deciding for themselves. And, even so, what would such a system actually be doing for us? Is it meant to protect users, somehow?

Which brings us to the third problem I see - that if we get into the practice of systematically making evaluations as some sort of safeguard measure, then where does that stop? As I've said in other posts, I am much more concerned about some of the stuff that (presumably) humans say than what a generative AI system would produce. There are a few members who try to pass themselves off as experts, or at least well-informed, on subjects they demonstrably do not understand. They speak with confidence and meet any disagreement with a defensive backlash. I think this is more toxic than simply copy/pasting some AI content without citation. Neither is great, but at least a machine will not double down on a position that is clearly false, just to win a perceived argument.

The fourth issue is just a problem with technical implementation. SBO is built on XenForo, which is an old PHP-based legacy system - almost the software equivalent of a rotary telephone, in terms of being outdated. (PHP is a programming language and the environment upon which such a program runs.) It could be done, and PHP is still widely supported and used, but it is very much a withering architecture. Very few new developers have any interest in PHP, and most people with PHP experience moved away from it years ago. (I don't want to overstate its decline. There is still an active PHP community, but the writing is on the wall.)

To be fair, most forum software is PHP-based, but mostly because most of it can trace its roots back to the early 2000s - at the peak of popularity for PHP and shortly before the peak of popularity of this style of forum.

All this to say that while you "could" make such a system to flag suspected AI content, ignoring the fact that it wouldn't be very accurate, it would simply never be made because there isn't really anyone capable of and interested in building it.

I have worked on systems for detecting the use of generative AI to produce academic papers which are presented as one's own work. (Interestingly, it isn't clear that this meets the definition of "plagiarism" for all sorts of nuanced reasons. But, it does generally fall afoul of schools' policies on academic honesty/integrity.) What I (and most people) have found is that there are some reliable hallmarks of AI-generated text but ...

If a person knows what these are, it is trivial to circumvent a detection process. And, we are already seeing AI-generated text shaping the way humans write. For example, we are seeing much more generous use of hyphens to divide a compound sentence. And, we are seeing people starting to use colons and semicolons more, when these had been falling out of favor, in general. This is both kind of disturbing and kind of cool.
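Purely as an illustration of what a "hallmark"-based check looks like (and why it is so easy to write around), here is a toy scorer over the punctuation tells mentioned above. The features and the single score are invented for the example; nothing like this is a reliable detector:

```python
# Toy "hallmark" scorer: counts a few surface features that currently show up
# more often in machine-written prose. Anyone who knows the tells can avoid them,
# which is exactly why this kind of check is easy to circumvent.
def hallmark_rate(text):
    words = max(len(text.split()), 1)
    dashes = text.count(" - ")                         # hyphens used to divide a sentence
    colons_semicolons = text.count(":") + text.count(";")
    return (dashes + colons_semicolons) / words * 100  # tells per 100 words

sample = "It's not lying; it's just pattern-matching - confidently, and at scale."
print(f"{hallmark_rate(sample):.1f} punctuation tells per 100 words")
```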

In other words, you don't have to like it, but there is really nothing to do about it. You are unlikely to ever know one way or the other, nor is it likely to matter in your life one way or the other. What people really object to is the idea of being fooled or even just that possibility. No one is going to sink their boat because AI told them to. They just object to the idea that they might be tricked into interacting with ChatGPT, thinking they were interacting with a human.

I get that, and I sympathize with that objection. But, my point is that you will never know one way or the other. So, if you are not OK with that ambiguity, anonymous online interactions are not for you, in 2025. That sounds harsh and isn't meant to push people away. It is just the stark reality of the situation. You almost never know who is on the other end of the conversation, where their views and information come from, or what their agenda is. You have to make your peace with that, with or without AI.
 

Dave

Forum Admin, Gen II
Staff member
Feb 1, 2023
111
I agree and had a similar thought. Maybe have some bold marking system on a suspected AI generated post/conversation. Maybe change the text color to red or blue or something. Can you use an AI bot to help identify AI posts?
Unfortunately the accuracy of programs to detect AI-generated text is not sufficient to flag posts. Using such a program would invite conflict between someone posting suspicious text and the person flagging it. No one benefits from that kind of conflict.

The issue is honesty: presenting an AI-generated answer to a post as your own meets the definition of plagiarism. Using AI to edit your own material teaches you nothing about how to write effectively. In either case, it does not accurately represent who the poster is; it is not authentic. The SBO community values authenticity.

Wikipedia defines plagiarism as:

Plagiarism is the representation of another person's language, thoughts, ideas, or expressions as one's own original work.[1][2][3] Although precise definitions vary depending on the institution,[4] in many countries and cultures plagiarism is considered a violation of academic integrity and journalistic ethics, as well as of social norms around learning, teaching, research, fairness, respect, and responsibility.[5] As such, a person or entity that is determined to have committed plagiarism is often subject to various punishments or sanctions, such as suspension, expulsion from school[6] or work,[7] fines,[8][9] imprisonment,[10][11] and other penalties.
 
May 17, 2004
5,904
Beneteau Oceanis 37 Havre de Grace
At the risk of going off-topic, I’m not sure AI use falls under that definition of plagiarism. AI is a statistical model of words (more precisely tokens, but words is an adequate approximation) that most commonly follow other words, based on its training data. A user creates a prompt that leads to the generation of a transformative response leveraging that statistical model, with some “temperature” of randomness thrown in. There is no one “person” whose thoughts, ideas, or expressions are being passed off directly. To strain an analogy - if someone asks the hull speed of a boat with a 30 foot waterline, I might reply that it’s 1.34 x sqrt(30). I might go so far as to use my calculator (without attribution) and say that equals 7.3 knots. Did I plagiarize by giving a calculator an input (a prompt) and then expressing the machine’s output as if it were my own? What if rather than using a calculator I went to ChatGPT and asked “What is 1.34 times the square root of 30?” Do different rules apply because I used one machine instead of another?

This is not to say that AI use is not a violation of academic integrity in environments where that applies. Academic institutions themselves have defined AI use as a violation, as is their prerogative to do. But academic integrity rules may be a superset of plagiarism and not limited by its strict definition. I also don’t mean to say that passing AI narratives off as your own on the forum should be acceptable. Clearly we have a culture that appreciates authenticity, and at least some uses of AI content can violate that culture.
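As an aside, the arithmetic in that analogy checks out; using the usual hull speed rule of thumb of 1.34 times the square root of the waterline length in feet:

```python
# Hull speed rule of thumb: ~1.34 * sqrt(LWL in feet), giving knots.
import math

lwl_feet = 30
print(f"{1.34 * math.sqrt(lwl_feet):.1f} knots")  # 7.3 knots
```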
 
Apr 25, 2024
715
Fuji 32 Bellingham
David is right on this. It "feels" like plagiarism but fails to meet that definition, on a technicality. (To be clear, there is no one definition of plagiarism that is accepted by all.) This is surely splitting hairs, in the context of this discussion. The issue seems to be more that many people simply don't like the idea - regardless of whether or not it meets some specific definition.

I will point out that it is perfectly legal to have ChatGPT write a novel, then publish that novel under one's own name with no credit given to ChatGPT. We, as a society, generally don't like that idea, but it isn't illegal. The key element that is missing is a harmed party. In general, copyright laws exist to protect the interests of the original author. They are not there to protect the public. In the case of AI-generated text, there is no harmed party, in any sense that our legal system is prepared to redress.

In academic settings, plagiarism policies exist also to protect the interests of the original author but with the dual and equal purpose to prevent the plagiarising party from gaining unfair advantage by using someone else's work in this way.

AI usage doesn't touch any of these issues. Nothing is stolen. The owners of the system that generated the text don't claim any copyright interests. It is still not well-settled, legally, but in general it is being accepted that the person who prompted the AI system to generate the text is the copyright owner. This is why a person can publish a book written by ChatGPT and subsequently protect that copyright.

I know this seems crazy, but it illustrates how the technology has outpaced our mechanisms for dealing with it. We don't have laws, policies, or social norms to adequately deal with issues that are completely new in human history.

I think the important consideration is that this entire discussion is an emotional response to the issue. It seems to have been settled and agreed upon, early in the discussion, that this isn't about the content itself. Everyone agrees that each person is responsible for vetting any information from any source. There is no material issue here. People just don't like it.

That's OK, and perfectly human, but I do think it is important to view the matter clearly rather than cloud it by using terms like "plagiarism" which strongly imply a harmed party.

Or ... I don't know ... maybe someone can describe how this issue causes damage to the community.

I don't think it is productive to make vague statements like, "It undermines trust", because the community was very clear that we mostly agree that any post should be independently verified. The issue of trust only holds water if someone could prove that AI-generated content, on average, was less trustworthy than human-generated content, on average.

The only thing I have really seen are statements like "the community expects" or "I don't want". Perfectly valid, but not enforceable and not even reliably detectable. So then, what is the point? Is this just rage against the machine?

So, if this can't be prevented, and if we accept what I said earlier - that voluntary compliance targets the wrong people and content - then rather than shouting "get off my lawn", it might be more productive if people can articulate what very specific problems this causes. If we could more specifically define the objections, there might actually be solutions that address those issues.

Can anyone give a concrete example of a way that an AI-generated (or AI-assisted) post causes an actual problem - something that can be described in material-enough terms that it becomes a solvable problem and not just a nebulous objection?

I would like to head off any example that says that AI can give incorrect information which could adversely influence a member who fails to adequately validate that information. I think everyone agrees that this isn't an AI-specific problem. (I would argue that there is actually lower risk of bad advice with an AI-generated or AI-assisted post - but that's pretty hard to defend, objectively.)

Can anyone articulate an issue of substance that doesn't fall into that pattern?
 

Dave

Forum Admin, Gen II
Staff member
Feb 1, 2023
111
At the risk of going off-topic, I’m not sure AI use falls under that definition of plagiarism. ...
One theme running through this discussion has been "what is AI generated content?" Your comment gets to that issue, so it is not off-topic.

The example you cite, calculating theoretical hull speed, is well within the public domain. The formula is readily available in various references, online and off. Using ChatGPT to find the theoretical hull speed of a boat, or to find the formula, is really no different from asking Wikipedia or consulting a yacht design text. No need for a citation; anyone with basic math skills can do the calculation, and nothing original is created. (They do still teach how to solve square roots by successive approximations, don't they?)

Let's ask the hull speed question a little differently. What are the factors affecting hull speed, and the hull speed calculations, for a boat with a waterline of X feet? While this information is available, it has not come into the public domain; there is debate about how to calculate hull speed and about the hull characteristics which affect it. If the AI response is posted without attribution, then it is plagiarism: the work of someone (ChatGPT, for example) is presented as one's own work, when all the poster did was ask the question. The poster has taken credit for work he hasn't done.

The edges of AI acceptability are necessarily fuzzy. It would be an onerous task to try to define all the conditions and parameters of acceptability in a functional manner. That's a rabbit hole I don't want to go down. And that's why we should be guided by general principles and not specific rules. Some will take advantage of the ambiguity for their own personal reasons, which says something about their integrity and character. And some will bristle at the ambiguity and demand firm rules and guidelines; such is life, it is often gray and ambiguous. Yes, I am appealing to our better angels.
 
Feb 6, 1998
11,723
Canadian Sailcraft 36T Casco Bay, ME
I own multiple private boating groups on Facebook with over 250K members. If we catch you using AI to generate a response, you get the ban stick. We have also turned off all AI features. My take is, AI bots should not be allowed here, humans only...
 
Nov 6, 2020
508
Mariner 36 California
Would such a system flag all content of dubious origin or quality? I think most people are capable of deciding for themselves how credible a post is or how much they know or care about its origin. ...
You make some interesting points. Thanks for elaborating.

To be clear, I don't really mind people using AI to help them in replying to posts, or using it to even create a post. If you think about it, AI is simply using information that we could find and post on our own from Googling; it's just light-years faster. I love it and use it all the time now myself.

I think what irks me, though, is AI being used to create posts when the intention clearly is to train the software or for some means of making money. My biggest fear with AI on forums is that it will be used as a sneaky and crafty form of advertising, so subtle that no one will realize what's happening - that it will be trained to shape a conversation toward the ultimate end of suggesting a particular product or service. You could end up with a multi-page post, for example, about the best way to clean a deck. AI could be trained and used to carry on a lengthy back-and-forth conversation disguising its true intention, and ultimately suggest "product X" for deck cleaning. When this stage of AI happens, and it will, how authentic will future forum conversations be? I think internet forums in particular are especially vulnerable because they are an incredibly vast untapped source of revenue.

I don't know the solution, or if there will even be one available, or if there is anything we could even do about it now.
 

Dave

Forum Admin, Gen II
Staff member
Feb 1, 2023
111
David is right on this. It "feels" like plagiarism but fails to meet that definition, on a technicality. ...
You are making a mountain out of a molehill. The expectation is that a post using AI should cite its source as being AI. And to be honest, any information that is for the most part not common knowledge should cite the source.

Here's a thread that demonstrates how an AI clip was cited and integrated into a thread. Because it was cited, we know the poster is not just making up data. Following this example is not onerous.

Which Prop for Bristol 45.5? (see post #3)
 
Feb 26, 2004
23,149
Catalina 34 224 Maple Bay, BC, Canada
or using it to even create a post.
This is one I have no idea how to deal with. How and why would anyone use AI to create a question? If you can't ask the question "yourself", how the heck can AI do it for you?

A couple of examples would sure go a long way to help me understand this possibility.
 

Dave

Forum Admin, Gen II
Staff member
Feb 1, 2023
111
To be fair, I'm not making anything out of anything. You raised the topic before there was any evidence a problem existed.
That is an incorrect assumption. I broached the topic because members had expressed concerns about posts that appeared to have been generated by AI and one in particular that gave bad advice. I thought it wise to get out in front of this before it became a problem.
 

jssailem

SBO Weather and Forecasting Forum Jim & John
Oct 22, 2014
23,814
CAL 35 Cruiser #21 moored EVERETT WA
DAVE,

In the spirit of transparency, I did not understand the issue as you have just stated it. It sounded very much to me like this was an action you initiated on your own, without prompts from the general membership.

“In recent weeks I have noticed an increase in what appear to be responses generated by AI in the forums. Some of the posts have been identified as coming from an AI platform, others seem thinly disguised. Likewise, some of the AI responses seem fairly accurate, others contain inaccurate or less than ideal answers.

My question is simple, how should the SBO community handle posts generated by AI?

Thank you for your thoughts.”

I can understand how the assumption occurred, be it incorrect or factual.