- cross-posted to:
- [email protected]
- [email protected]
- [email protected]
If this is true, then we should prepare to be shouted at by ChatGPT asking why we didn’t already know about that simple error.
ChatGPT now just says “read the docs!” to every question.
Hey ChatGPT, how can I …
“Locking as this is a duplicate of [unrelated question]”
And then links to a similar sounding but ultimately totally unrelated site.
Nah, it just marks your question as duplicate.
Already had that happen with Perplexity. Like, no mate, I’m asking you.
Honestly, that wouldn’t be the worst thing in the world.
You joke.
This would have been probably early last year? Had to look up how to do something in Fortran (because Fortran), and the answer was very much in the voice of that one dude on the Intel forums who has been answering every single question for decades(?) at this point. Which means it also refused to do anything with features newer than 1992 and was worthless.
Tried again while chatting with an old work buddy a few months back, and it looks like they updated it to acknowledge that F95 and F03 exist. So I assume that was all Stack Overflow.
This message brought to you by chatgpt bot.
Take all you want; it will only take a few hallucinations before no one trusts LLMs to write code or give advice.
[…] will only take a few hallucinations before no one trusts LLMs to write code or give advice
Because none of us have ever blindly pasted some code we got off google and crossed our fingers ;-)
It’s way easier to figure that out than to check ChatGPT’s hallucinations. There’s usually someone saying why a response on SO is wrong, either in another response or in a comment. You can filter most of the garbage right at that point, without having to put it in your codebase and discover it the hard way. You get none of that information with ChatGPT. The data spat out is not equivalent.
That’s an important point, and it ties into the way ChatGPT and other LLMs take advantage of a flaw in the human brain:
Because it impersonates a human, people are more inherently willing to trust it. To think it’s “smart”. It’s dangerous how people who don’t know any better (and many people that do know better) will defer to it, consciously or unconsciously, as an authority and never second guess it.
And because it’s a one-on-one conversation, with no comment section and no one else looking at the responses to call them out as bullshit, the user just won’t second-guess it.
Your thinking is extremely black and white. Many, probably most, people second-guess chatbot responses.
Split segment of data without PII to staging database, test pasted script, completely rewrite script over the next three hours.
We should already be at that point. We have already seen LLMs’ potential to inadvertently backdoor your code and to help you inadvertently violate copyright law (I guess we do need to wait to see what the courts rule, but I’ll be rooting for the open-source authors).
If you use LLMs in your professional work, you’re crazy. I would never be comfortable opening myself up to the legal and security liabilities of AI tools.
If you use LLMs in your professional work, you’re crazy
Eh, we use Copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with Stack Overflow), that’s kind of on you for not understanding what you’re doing.
Yeah, but if you’re not feeding it protected code and are just asking simple questions about libraries etc., then it’s good.
Maybe for people who have no clue how to work with an LLM. They don’t have to be perfect to still be incredibly valuable; I make use of them all the time, and hallucinations aren’t a problem if you use the right tools for the job in the right way.
The last time I saw someone talk about using the right LLM tool for the job, they were describing turning two minutes of writing a simple map/reduce into one minute of reading enough to confirm the generated one worked. I think I’ll pass on that.
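For scale, the kind of “simple map/reduce” being talked about is roughly this; a made-up Python toy, not the actual code from that story:

```python
from functools import reduce

# Toy map/reduce: total length of all words longer than three characters.
# The word list and the task are invented for illustration; the point is
# that this is a two-minute job to write by hand.
words = ["stack", "overflow", "is", "training", "chatgpt"]
lengths = map(len, filter(lambda w: len(w) > 3, words))
total = reduce(lambda acc, n: acc + n, lengths, 0)
print(total)  # 28
```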
confirm the generated one worked. I think I’ll pass on that
LLM wasn’t the right tool for the job, so search engine companies made their search engines suck so bad that it was an acceptable replacement.
Honestly? I think search engines are actually the best use for LLMs. We just need them to be “explainable” and actually cite things.
Even going back to the AOL days, Ask Jeeves was awesome and a lot of us STILL write our google queries in question form when we aren’t looking for a specific factoid. And LLMs are awesome for parsing those semi-rambling queries like “I am thinking of a book. It was maybe in the early 00s? It was about a former fighter pilot turned ship captain leading the first FTL expedition and he found aliens and it ended with him and humanity fighting off an alien invasion on Earth” and can build on queries to drill down until you have the answer (Evan Currie’s Odyssey One, by the way).
Combine that with citations of what page(s) the information was pulled from and you have a PERFECT search engine.
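Something like this, as a minimal sketch of that idea; the corpus, the overlap scoring, and the answer flow are all stand-ins for illustration, not any real engine’s internals:

```python
# Sketch of "LLM search with citations": retrieve pages first, then answer
# only from those pages and cite each one used. All data here is made up.

CORPUS = {
    "odyssey-one/p12": "Former fighter pilot turned ship captain leads the first FTL expedition",
    "odyssey-one/p340": "Humanity fights off an alien invasion on Earth",
    "sourdough/p1": "A chapter about maintaining sourdough starters",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank pages by naive word overlap with the query (stand-in for real retrieval)."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """Answer from the retrieved pages only, citing every page used.

    A real system would prompt an LLM constrained to these pages; we just
    echo them so the citation flow is visible.
    """
    return "\n".join(f"{text} [source: {page}]" for page, text in retrieve(query))

print(answer("fighter pilot FTL expedition alien invasion on Earth"))
```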
That may be your perfect search engine. I just want proper boolean operators on a search engine that doesn’t think it knows what I want better than I do, and doesn’t pad the results out with pages that don’t match all the criteria just for the sake of it. The sort of thing you described would be anathema to me, as I suspect my preferred option may be to you.
You’re describing Bing Chat.
And Google Gemini(?) and Kagi’s LLM and all the other ones.
This. I use LLMs for work, primarily to help create extremely complex nested functions.
I don’t count on LLMs to create anything new for me, or to provide any data points. I provide the logic and explain exactly what I want in the end.
I take a process which normally takes 45 minutes daily, test it once, and now I have reclaimed 43 extra minutes of my time each day.
It’s easy and safe to test before I apply it to real data.
It’s missed the mark a few times as I learned how to properly work with it, but now I’m consistently getting good results.
Other use cases are up for debate, but I agree that, when used properly, hallucinations are not much of a problem. When I see people complain about them, that tells me they’re using the tool to generate data, which of course is stupid.
People keep saying this but it’s just wrong.
Maybe I haven’t tried the language you have but it’s pretty damn good at code.
Granted, whatever it puts out needs to be tested and possibly edited but that’s the same thing we had to do with Stack Overflow answers.
I’ve tried a lot of scenarios and languages with various LLMs. The biggest takeaway I have is that AI can get you started on something or help you solve some issues. I’ve generally found that anything beyond a block or two of code becomes useless. The more it generates the more weirdness starts popping up, or it outright hallucinates.
For example, today I used an LLM to help me tighten up an incredibly verbose bit of code. Today was just not my day and I knew there was a cleaner way of doing it, but it just wasn’t coming to me. A quick “make this cleaner: <code>” and I was back to the rest of the code.
This is what LLMs are currently good for. They are just another tool, like tab completion or code linting.
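To make “make this cleaner” concrete, here is a made-up before/after in Python; both snippets are illustrative, not the code from that day:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    active: bool

users = [User("ada", True), User("bob", False), User("eve", True)]

# Before: the incredibly verbose version you might paste into the prompt.
result = []
for user in users:
    if user.active:
        name = user.name
        upper = name.upper()
        result.append(upper)

# After: the kind of cleanup a "make this cleaner" prompt hands back.
result = [user.name.upper() for user in users if user.active]
print(result)  # ['ADA', 'EVE']
```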
Have you tried recent models? They’re not perfect, no, but they can usually get you most of the way there, if not all the way, granted that you know how to structure the problem and the prompt.
We already have those near constantly. And we still keep asking queries.
People assume that LLMs need to be ready to replace a principal engineer or a doctor or lawyer with decades of experience.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns knows… they are idiots who need people to check their work constantly. An LLM making up some functions because it saw them on Stack Overflow but never tested them is no different from a hotshot intern who copied some code from Stack Overflow and never tested it.
Except one costs a lot less…
This is already at the point where we can replace an intern or one of the less good junior engineers.
This is a bad thing.
Not just because it will put the people you’re talking about out of work in the short term, but because it will prevent the next generation of developers from getting that low-level experience. They’re not “idiots”, they’re inexperienced. They need to get experience. They won’t if they’re replaced by automation.
First a nearly unprecedented worldwide pandemic, followed almost immediately by record-breaking layoffs, then AI taking over the world. Man, it is really not a good time to start out as a new developer. I feel so fortunate that I started working full-time as a developer nearly a decade ago.
Dude, the pandemic was amazing for devs: tech companies were hiring like mad, and it was really easy to get your foot in the door. Now, between all the layoffs and AI, it is hellish.
So, the whole point of learning is to ask questions of people who know more than you, so that you can gain the knowledge you need to succeed…
So… if you try to use these LLMs to replace parts of sectors where there need to be people who can work their way up to the next tier as they learn more and get better in their respective fields, you do realize that eventually there will no longer be people who can move up to the next tier/position, because people like you said “Fuck ‘em, all in on this stupid LLM bullshit!” So now there are no more doctors, or real programmers, because people like you thought it would just be the GREATEST idea to replace humans with fucking LLMs.
You do see that, right?
Calling people fucking stupid, because they are learning, is actually pretty fucking stupid.
Where did I say “Fuck 'em, all in on this stupid LLM bullshit!”?
But yes, there is a massive labor issue coming. That is why I am such a proponent of Universal Basic Income, because there are not going to be enough jobs out there.
But as for training up the interns: back in the day, do you know what “interns” did? And by “interns” I mean women, because sexism, but roll with me. Printing out and sorting punch cards. Compilers and general technical advances got rid of those jobs and pushed up where the “charlie work” goes.
These days? There are good internships/junior positions and bad ones. A good one actually teaches skills and encourages the worker to contribute. A bad one has them do the mindless grunt work that nobody else wants to. LLMs get rid of the latter.
And… I actually think that is good for the overall health of workers, if not their numbers (again, UBI). Because if someone can’t be trusted to write meaningful code without copying it off the internet and not even updating the variable names? I don’t want to work with them. I spend too much of my workday babysitting those morons who are just there to get some work experience so they can con their way into a different role and be someone else’s problem.
And experience will be gained the way it is increasingly being gained: working on (generally open-source) projects and interviewing for competitive internships, where the idea is to take relatively low-cost workers and have them work on a low-ROI task that is actually interesting. It is better for the intern because they learn actual development and collaboration skills. And it is better for the staff because it is a way to let people work on the stuff they actually want to do without the massive investment of a few hundred hours of a Senior Engineer’s time.
And… there will be a lot fewer of those roles. Just like there were a lot fewer roles for artists once animation tools stopped requiring every single cel to be hand-drawn. And that is why we need to decouple life from work through UBI.
But also? If we have fewer internships that consist of “Okay, good job, thanks for that. Next time can you at least try to compile your code? Or pay attention to the squiggly red lines in your IDE? Or listen to the person telling you that is wrong?”, then we have better workers and better junior developers who can actually do more meaningful work. And we’ll actually need to update the interviewing system to not just be “did you memorize this book of questions from Amazon?”, and we’ll have fewer “hot hires” who surprise everyone by being able to breathe unassisted but command a very high salary because they worked for Facebook.
Because, and here is the thing: LLMs are already as good as, if not better than, an intern or junior engineer. And the companies that spend money on training up interns aren’t going to be rewarded. Under capitalism, there is no reason to “take one for the team” so that your competition can benefit.
This is already at the point where we can replace an intern or one of the less good junior engineers. Because anyone who has done code review or has had to do rounds with medical interns knows… they are idiots who need people to check their work constantly.
Do so at your own peril. Because the thing is, a person will learn from their mistakes and grow in knowledge and experience over time. An LLM is unlikely to do the same in a professional environment for two big reasons:
- The company using the LLM would have to send data back to the creator of the LLM. This means their proprietary work could be at risk: the AI company could scoop them, or a data leak would be disastrous.
- Alternatively, the LLM could self-learn and be solely in-house, without any external data connections. The LLM company will never go for this, because it would mean the model improving and developing out of their control. The customized version may end up being better than the LLM company’s future releases. Or something might go terribly wrong with the model while it learns and adapts. Even if the LLM company isn’t held legally liable, they’re still going to lose that business going forward.
On top of that, you need your inexperienced noobs to one day become the ones checking the output of an LLM. They can’t do that unless they get experience doing the work. Companies already have proprietary models that just require the right inputs and pressing a button. Engineers are still hired though to interpret the results, know what inputs are the right ones, and understand how the model works.
A company that tries replacing them with LLMs is going to lose in the long run to competitors.
I hope you’re replaced with an AI soon. LLMs are already capable of having these bullshit takes.
I got an email ban.
1609 hours logged, 431 solved threads
Reddit/Stack/AI are the latest examples of an economic system where a few people monetize and get wealthy using the output of the very many.
Technofeudalism
It’s very precisely that.
You really don’t need anything near as complex as AI… a simple script could be configured to automatically close the issue as solved with a link to a randomly selected unrelated issue.
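Something like this; a tongue-in-cheek Python sketch where all the issue data is invented:

```python
import random

# The "moderator" described above: no AI required, just close everything
# as a duplicate of a randomly selected unrelated issue.

OPEN_QUESTIONS = ["How do I center a div?", "Segfault in my linked list", "pip install fails"]
UNRELATED = ["#7741: printer is on fire", "#12: typo in README"]

def moderate(question: str) -> str:
    target = random.choice(UNRELATED)
    return f"Closed “{question}” as a duplicate of {target}."

for question in OPEN_QUESTIONS:
    print(moderate(question))
```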
So, vanilla Stack Overflow?
That’s the joke
I’m slow.
Based and same-here-often…pilled
Eventually, we will need a fediverse version of Stack Overflow, Quora, etc.
Those would be harvested to train LLMs even without asking first. 😐
At this point I’m assuming most if not all of these content deals are essentially retroactive. They already scraped the content and found it useful enough to try to secure future use, or at least exclude competitors.
They scraped the content, liked the results, and are only making these deals because it’s cheaper than getting sued.
Honestly? I’m down with that. And when the LLMs end up pricing themselves out of usefulness, we’ll still have the fediverse version. Having free sites on the net with solid crowd-sourced information is never a bad thing, even if other people pick up the data and use it.
It’s when private sites like Duolingo and Reddit crowd-source the information and then slowly crank down the free aspect that we have problems.
The Ad sponsored web model is not viable forever.
The Ad sponsored web model is not viable forever.
a thousand times this
I’d rather the harvesting be open to all than only the company hosting it.
Assuming the federated version allowed contributor-chosen licenses (similar to GitHub), any harvesting in violation of the license would be subject to legal action.
Contrast that with Stack Exchange, where I assume the terms dictated by Stack Exchange deprive contributors of recourse.
But users and instances would be able to state that they do not want their content commercialized. On StackOverflow you have no control over that.
You can state what you don’t want, but no one will be paying attention. Except maybe the LLM reading your posts…
Yup. Laws are only suggestions until you get caught.
I suspect it isn’t even illegal, but I’m not an expert.
Not fediverse, but open-source and community run: https://codidact.com
Smells too much like Duolingo. Here, everyone jump in and answer all the questions. Five years later: ohh, look at this gold mine of community data we own…
This was actually the whole original point of Duolingo. The founder previously created reCAPTCHA to crowdsource the digitization of scanned books.
His whole thing is crowdsourcing difficult tasks that machines struggle with by providing some sort of reason to do them (preventing spam at first, learning a language now).
From what I understand, Duolingo just got too popular, and the subscription service they offer made them enough money to be happy with.
Everything you write on here is public. There’s nothing stopping anyone from using that data for training.
We needed it a few years ago.
I despise this use of mod power in response to a protest. It’s our content to sabotage if we want; if the Stack Overlords disagree, then to hell with them.
I’ll add Stack Overflow to my personal ban list, just below Reddit.
A malicious response by users would be to employ an LLM instructed to write plausible-sounding but very wrong answers to historical and current questions, then an army of users upvoting the known-wrong answers while downvoting accurate ones. This would poison the data, I would think.
All use of generative AI (e.g., ChatGPT and other LLMs) is banned when posting content on Stack Overflow. This includes “asking” the question to an AI generator then copy-pasting its output, as well as using an AI generator to “reword” your answers.
Interestingly, I see nothing in that policy that would disallow machine-generated downvotes on proper answers and machine-generated upvotes on incorrect ones. So even if LLMs are banned from posting questions or comments, it looks like Stack Overflow is perfectly fine with bots voting.
Sounds like it would require some significant resources to combat.
That said, that plan comes at a cost to presumably innocent users who will bark up the wrong trees.
Data should be socialized and machine learning algorithms should be nationalized for public use.
Better yet, copyright should be abolished completely.
It should stay for creative works, but that’s it. It should protect people who actually write books, compose music, make art, and sing. It shouldn’t be held forever by corporations leeching off their workers.
Public+ no copyright
Reddit did almost the same thing. And don’t forget, guys, to delete your Reddit account.
It won’t matter; they would have all of your comments archived already. Even if you overwrite them, AI will be scraping the copies they keep.
It creates a lot of poisoned data, especially if you, like, edit half your posts with nonsense.
RIP in pieces Stack Overflow
The enshittification will continue while morale tanks.
The enshittification will continue for a while, morale thanks!
If we can’t delete our questions and answers, can we poison the well by uploading masses of shitty questions and answers? If they like AI, we could have it help us generate them.
Poison the well by using AI-generated comments and answers. There isn’t currently a way to reliably determine whether content is human- or AI-generated, and training AI on AI is the equivalent of inbreeding.
Sounds good then.
The poison was there all along the way. The poison is us
Inserts Spider-Man meme
While I think Stack Overflow’s reaction is not good, I don’t understand the users either.
EDIT: seems like the language model won’t be free; I understand, then.
OpenAI is a terribly misleading name.
OpenUpYourWalletforAI
The primary use for AI is self-destructing your website.
I am not deleting anything. They can have all of my poorly written, misleading answers.