• 4 Posts
  • 2 Comments
Joined 1 year ago
Cake day: June 11th, 2023

  • Can confirm. It seems counterintuitive, but more data means more resources, more indexing, and more room for errors.

    In my experiments with RVC, I’ve tried all sorts of dataset sizes, and I’ve found that my 2-hour datasets take forever to train and produce subpar results. 5-15 minutes of speech data is the sweet spot. No amount of training seems to fix it, and overtraining is counterproductive anyway; the model just can’t seem to figure out what to do with all of that data.

    Granted, different models have different strengths and will certainly produce different results, but how many times have you researched something and found conflicting information everywhere? If 1 out of 10 pieces of data conflicts, that’s easy enough to handle, but in a larger dataset that becomes 10 out of 100. It’s still 10%, but now there are 10 conflicting pieces the model has to figure out how to interpret, even if the other 90 agree with each other. Just like us, it can reach a point where it’s simply too much information to deal with.

    Definitely a point of diminishing returns.



  • Honestly, that’s why I’m trying to jumpstart more diverse but “safe” communities. If it’s nothing but lolis and cubs, the target is clear, but with decent discussion-based communities it becomes more difficult, and there’s an argument to be made that they aren’t just blocking one type of content; they’re blocking entirely unrelated communities because they don’t like some of what else is hosted here. That strengthens the free speech argument even more when the speech they like is inherently intertwined with the speech they don’t.

    I want this place to thrive. It doesn’t need to be the biggest, and zero-tolerance types will inevitably block it regardless, but I want the decision to censor it to be painful for as many people as possible.