Proposal: Archiving Discord Discussions

Proposal: Archive Discord Discussions for Accessibility and Knowledge Sharing

Hello everyone!

I’m excited to bring forward a suggestion that I believe could significantly benefit the community. Some of us may not know about the Godot Engine Discord server and its forum channels, a bustling hub of discussions, tips, and insights. However, there’s a catch: these valuable conversations can’t be found through search engines, since Discord forums aren’t indexed. This means we’re sitting on a goldmine of knowledge that’s not easily accessible to everyone. In particular, I’m referring to the gd3-help-forum and gd4-help-forum channels.

Here’s my proposal: Let’s periodically archive the forum posts from the Discord server. Think of it as a way to immortalize our discussions and make them easily searchable for anyone seeking knowledge or solutions. This approach draws inspiration from the successful ask.godotengine.org migration. We can credit each post to its original contributor by displaying their Discord username, similar to the example shown below.

To keep things organized, we can categorize these posts under ‘Archive’ or ‘Discord’. This dual categorization will help users navigate and understand the context of these discussions.

The key benefits of implementing this idea include:

  1. Enhanced Discoverability: By making these discussions searchable via search engines, we’re not only preserving knowledge but also making it more accessible. This is vital for troubleshooting and learning, as search engines are often the first port of call for finding solutions.
  2. Continued Benefits of Discord: We can keep enjoying the real-time interaction that Discord offers, while also leveraging the content for broader use on this site. It’s like getting the best of both worlds!
  3. Centralized Knowledge Base: This initiative will contribute to creating a more centralized and comprehensive knowledge repository for Godot, which can be invaluable for both new and seasoned users.

Some additional considerations:

  • Discord Integration on the Site: A potential idea is to integrate Discord with this forum. Users could link their Discord and forum accounts, allowing for a seamless association of content across both platforms.
  • Addressing Potential Concerns: We need to be mindful of technical challenges and privacy. For instance, ensuring that content scraping respects user permissions and privacy settings is crucial. We’ll need a transparent and respectful approach to content archiving.

I believe this proposal can open up new avenues for knowledge sharing and community engagement. Of course, this is just a starting point, and I’d love to hear your thoughts, suggestions, and any potential concerns.

A noble idea, but

Archiving all of that will increase forum server storage requirements several times over. Given that some (if not most) discussions there are very specific/personal and aren’t really useful to other people, how do we determine what is worth archiving and what is just garbage?

That is a good point, though I feel that any type of forum will face issues with content moderation, including this one. How does one even assess the ‘worthiness’ of a post? Are there any guidelines that this forum follows?

Perhaps I am wrong, but I feel that there are enough valuable discussions held there that it is worth the effort to archive them. That said, it is certainly a valid concern that this kind of endeavor would increase the workload on those who moderate the content, as well as the server’s resource requirements. Discord API rate limits may be a barrier as well.

I unfortunately don’t have a clear answer to your question at the moment, but I’ll at least summarize the additional concerns that I hadn’t fully considered in my original post.

  • Impact on server costs. My counter to this is to ask what is the purpose of this forum? Isn’t it to consolidate Godot knowledge from its user-base and provide a platform for discussion? I feel that this proposal would align enough with this forum’s purpose to justify the increased storage costs. If the cost increase is substantial, then we can consider stricter criteria for archiving.
  • Discord API rate limiting for scraping the server forum. It appears that Discord rate limits are variable/dynamic. Catching up with the backlog of posts may take some time, but once we catch up, I don’t think this will be an issue for the foreseeable future.
  • Impact on content moderation. I’m honestly not experienced in regards to content moderation so I’m not clear on what the challenges around this would be other than the increased workload. There are some simple metrics that we could try using like number of reactions or number of responses to filter posts, but I don’t think that is a reliable method of filtering what to archive.
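
To make the rate-limit point a bit more concrete, here is a minimal sketch of how an archiver could decide how long to pause between requests. It assumes the archiver reads the `X-RateLimit-Remaining`, `X-RateLimit-Reset-After` and `Retry-After` response headers that Discord documents for its HTTP API. The function name `seconds_to_wait` and the one-second fallback are made up for illustration, and nothing here sends a real request.

```python
def seconds_to_wait(status: int, headers: dict) -> float:
    """Return how long to pause before the next request, in seconds."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    h = {key.lower(): value for key, value in headers.items()}

    if status == 429:
        # We were rate limited: wait for as long as the server asks.
        # The 1 second fallback is an assumption, not a documented default.
        return float(h.get("retry-after", "1"))

    if h.get("x-ratelimit-remaining") == "0":
        # The current bucket is used up: wait until it resets.
        return float(h.get("x-ratelimit-reset-after", "0"))

    # Requests are still available in this bucket.
    return 0.0
```

Because the limits are dynamic, reading them from each response like this should be more robust than hard-coding a fixed delay.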

I’d love to hear more thoughts on the matter. Thank you for sharing your concerns.

My small proposal to counter these issues:
Add a “personal” tag only visible to the server, so it won’t index those.
Count the number of likes and comments on a post.
We place only the title and tags in the indexable database, and when someone visits an entry they are redirected to the appropriate post.
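
A rough sketch of what these three points could look like together. Everything in it is an assumption for illustration: the `post` dictionary layout, the threshold values, the rule that either enough reactions or enough replies is sufficient, and the use of a `discord.com/channels/<server>/<thread>` link as the redirect target.

```python
from typing import Optional

MIN_REACTIONS = 3  # assumed threshold, would need tuning
MIN_REPLIES = 2    # assumed threshold, would need tuning


def index_entry(post: dict) -> Optional[dict]:
    """Return a title-and-tags-only index entry, or None if the post is skipped."""
    if "personal" in post["tags"]:
        # Marked as personal: never indexed.
        return None
    if post["reactions"] < MIN_REACTIONS and post["replies"] < MIN_REPLIES:
        # Too little engagement on both counts.
        return None
    return {
        "title": post["title"],
        "tags": sorted(post["tags"]),
        # Visitors are sent to Discord instead of reading a stored copy.
        "redirect_to": f"https://discord.com/channels/{post['guild_id']}/{post['thread_id']}",
    }
```

Note that no message content is stored in the entry, only the title, the tags and the link.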

While I’m not a fan of content being held hostage on Discord, there are two additional concerns that we would need to discuss first:

  • Discord TOS. Is scraping and republishing allowed by their terms of service? We don’t want to risk the server being closed by discord.
  • GDPR/Privacy. People post on Discord and expect their messages to be short-lived, especially in the forum channels, where posts are also closed after a while. People may not appreciate their messages being persisted on the web, where they are indexed by Google and picked up by sites like the web archive. Think for example about someone making a comment in chat that wasn’t really thought through. In real-time conversations that’s ok and expected. But now it shows up on a forum where it ends up on Google. Not so cool. So if anything, I don’t think we can import any old posts from Discord. For the future there would need to be clear communication like “If you post here, it will be uploaded to the forum” so that people know this before posting there. Also, for GDPR compliance we would need a way for users to request all data that is stored about them here, plus a way to request the removal of all data about them. This is a serious overhead implementation-wise, as it means keeping track of user IDs to know who posted what originally, even if usernames change.
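
To illustrate the bookkeeping this implies, here is a toy sketch of an archive that is keyed by user ID rather than username, so that "give me all my data" and "delete all my data" requests can be answered after a rename. The `Archive` class and its method names are hypothetical, and a real implementation would also have to cover backups, caches and search indexes.

```python
from collections import defaultdict


class Archive:
    """Toy in-memory archive keyed by Discord user ID (not username)."""

    def __init__(self) -> None:
        self._posts_by_user = defaultdict(list)

    def store(self, user_id: int, content: str) -> None:
        self._posts_by_user[user_id].append(content)

    def export(self, user_id: int) -> list:
        """Access request: return a copy of everything stored for this user."""
        return list(self._posts_by_user.get(user_id, []))

    def erase(self, user_id: int) -> int:
        """Erasure request: drop everything and report how many posts were removed."""
        return len(self._posts_by_user.pop(user_id, []))
```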

It isn’t allowed as per Discord’s ToS, and that’s the largest problem by far.

Well, that concludes this, sadly. We will remind people on Discord from time to time to post here though! There has been some talk in our Discord mods channel and we really want to push the forum a bit. Discord is nice for realtime communication, the forum is nice for anything that should be persistent and indexed.

@Calinou, could you specify which part of the ToS would disallow this proposal? I initially used the term ‘scraping’, which is indeed prohibited by Discord, but using their API should be permissible under certain conditions.

After thoroughly reviewing Discord’s Developer Policy and Developer Terms of Service, I’d like to share some insights and considerations regarding the goal of archiving Discord discussions for greater accessibility and knowledge sharing.

1. API Use vs. Scraping: Firstly, it’s crucial to distinguish between data scraping (which is prohibited by Discord) and the use of Discord’s API. The good news is that Discord allows the use of its API within specified guidelines. This means we can technically archive discussions using the API, as long as we adhere to Discord’s rules. It was my mistake to use the term ‘scraping’ in my original post.

2. User Privacy and Consent: A significant factor is respecting user privacy. Discord’s terms emphasize user consent and privacy, aligning with laws like GDPR. For this proposal, this implies obtaining explicit consent from users for their discussions to be archived and made publicly accessible. It’s a crucial step not only for legal compliance but also for maintaining trust within the community.

3. Technical and Rate Limit Constraints: Discord imposes limits on API usage. This might affect the volume and frequency of data we can archive. We’ll need to consider these limits to ensure uninterrupted and compliant operation.

4. Compliance with Discord’s Policies: This forum would need to comply with Discord’s guidelines, including how to handle, store, and use the API data. We cannot use the data for profiling, selling, or in ways that contradict user expectations. To reiterate, proper measures must be taken to securely handle and store the data retrieved from Discord, ensuring it’s protected and managed according to both Discord’s guidelines and applicable data protection laws.

5. Ongoing Policy Updates: Discord’s policies can change, and we must stay vigilant to adapt our archival process to any new updates. This requires a commitment to ongoing maintenance and review.

6. Ethical Considerations and Community Perspective: Beyond legalities, as @winston-yallow mentioned, we should consider the community’s stance on archiving discussions. How do users feel about their conversations being permanently stored and searchable? We need to weigh the benefits of knowledge sharing against potential concerns about the permanence of discussions that may not be intended for that purpose.

7. Feasibility and Worthiness of Pursuit: Given these considerations, is this idea still worth pursuing? In my opinion, yes, but only under certain conditions. The potential for creating a valuable, searchable knowledge base from Discord discussions is immense. However, this must be balanced with a respectful and compliant approach to user data and privacy. I think the primary question that we must ask before anything else is: Are the discussions that are held in Discord valuable enough to warrant all of this effort? Perhaps, although the opinion of the site’s moderators takes precedence, we could measure the community’s sentiment with a survey or poll?

Proposed Approach: A possible way forward could be implementing an opt-in system for users who wish to have their discussions archived. This respects user choice and privacy while still allowing us to build a repository of knowledge. Also, clear communication and transparency about how data is used and stored will be essential.

Determining an effective method for user consent is crucial for this archival proposal. Here are some potential approaches, along with their benefits and concerns:

  1. Server Rule for Automatic Consent:
    • Approach: Adding a rule to the Discord server that states posting in forum channels implies consent to archiving.
    • Benefit: Ensures all posts in specific channels are covered without requiring additional action from users.
    • Concern: Lacks flexibility and might force users to choose between participating under these conditions or not participating at all. It could also lead to some users being unaware or uncomfortable with this blanket consent.
  2. Forum Channel Guideline:
    • Approach: Implementing a guideline within the forum channel indicating that posts may be archived.
    • Benefit: Provides a general heads-up to users about the potential for their posts to be archived.
    • Concern: This method doesn’t include an active acknowledgment mechanism, so some users might miss or overlook this guideline.
  3. ⭐ ‘Archiver’ Role for Opt-in Consent (Recommended Approach):
    • Approach: Allowing users to opt-in for a special role, like 📝Archiver, that signals consent for their posts to be archived. The specifics of the role, including what it entails and how it affects user posts, can be thoroughly outlined in the #roles channel of the Discord server - serving as a clear, voluntary opt-in mechanism.
    • Benefit: Respects individual choice, allowing users to control their participation. It’s transparent, user-driven, and ensures users are well-informed about what consenting to archive entails.
    • Concern: Might result in fewer posts being archived, as it depends on users actively opting in. However, this is balanced by the respect for user autonomy and informed consent.
    • Note: Given its balance of user autonomy, informed consent, and practical application, I consider this the best option.
  4. ‘Archive’ Tag for Individual Posts:
    • Approach: Users can tag their individual posts with an ‘Archive’ tag to signify consent.
    • Benefit: Offers post-by-post consent, giving users maximum control over what gets archived.
    • Concern: Could lead to fewer posts being archived due to the additional effort required from users.
  5. Explicit Statement in Each Post:
    • Approach: Users explicitly state in their post that they permit archiving, e.g., “The Godot Engine Forum has my permission to archive this post.”
    • Benefit: Provides very clear and specific consent for each post.
    • Concern: Impractical for large-scale archiving and likely to result in the least amount of content being archived due to the extra effort required.

In conclusion, I believe the content on Discord is valuable enough to warrant the effort of archiving (which prompted my initial proposal). The potential benefits of making these discussions accessible and searchable are substantial, and it should be technically feasible, assuming I correctly interpreted Discord’s legalese. However, I fully recognize the complexities and potential difficulties in adhering to all the outlined conditions. It’s a significant undertaking, and I understand if the effort cannot be justified given the challenges.

The problem is this section of the Discord Terms of Service:

Other people’s content. Our services might also provide you with access to other people’s content. You may not use this content without that person’s consent, or as allowed by law.

Getting that consent in a reliable way that would be admissible in court is almost impossible for us, especially considering that under GDPR users have a right to retract that consent at any given time or to request deletion of all their associated personal data.

Furthermore, these sections also outline API access, and complying with them seems out of scope for us (sure, not impossible, but the amount of work needed to build that infrastructure is quite high).

The Archiver approach could be combined with opt-in/opt-out emojis on a per-post basis, where the archiving process would check whether such an emoji was set by the original author. If existing software is used for archiving and adding the logic there isn’t reasonable, it could always be the responsibility of a script that scours the data dump for that information and yanks posts which don’t have the correct role/emoji combination to be archived.

I’d recommend creating two server-specific emojis for that purpose, one for opt-in, and one for opt-out. These emojis would always apply, regardless of whether a person has the Archiver role.

Also, in order for people to have time to apply the appropriate emojis to their posts, posts newer than, say, a week, or maybe even a month, shouldn’t be archived yet. This prevents the scenario where people are lollygagging around right when the archiver is running, the post is archived 0.8 seconds after hitting enter, and the emoji comes in too late. It also means that people with the Archiver role who might have cooked a post a little hotter that day can opt it out a few days later.
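
Put together, the role, the two emojis and the grace period boil down to a small decision function. The following is only a sketch under a few assumptions: the `Post` fields are made-up names, the grace period is set to one week, and a post carrying both emojis is treated as opted out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # assumed; could just as well be a month


@dataclass
class Post:
    created_at: datetime
    author_has_archiver_role: bool
    author_opt_in_emoji: bool = False   # opt-in emoji set by the original author
    author_opt_out_emoji: bool = False  # opt-out emoji set by the original author


def is_archivable(post: Post, now: datetime) -> bool:
    if now - post.created_at < GRACE_PERIOD:
        # Too fresh: the author may still want to add an emoji.
        return False
    if post.author_opt_out_emoji:
        # Opt-out applies regardless of the role (and wins over opt-in here).
        return False
    if post.author_opt_in_emoji:
        # Opt-in applies regardless of the role.
        return True
    # No emoji on the post: fall back to the Archiver role.
    return post.author_has_archiver_role
```

For example, a two-week-old post by someone with the Archiver role is archived unless they reacted with the opt-out emoji, while a two-day-old post is always skipped for now.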

About deleting posts, GDPR and all that:

Why not start a free software repo for the whole archival process on GitHub? The matter of archiving is important, yes, but it’s not urgent in the way that it’d be a problem if it only ended up being ready for testing in a year or two. Once the role/emoji/whatever method is at least tentatively decided on, that could already be rolled out on the Discord server, even if nothing is getting archived yet, in order not to lose out on opt-ins in the meantime.

Also, regarding changes to a person’s posts (those older than the “archival grace time”), such as edits, deletions or opt-ins/outs: a Discord bot command that lets a user trigger the re-indexing of their posts could be a solution. It would kick off a process that updates posts in the archive, or deletes them if they’ve been deleted on Discord or are found to not be eligible for archiving anymore.
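
The core of such a re-indexing run is a comparison between what the archive holds for a user and what is currently eligible on Discord. A minimal sketch, assuming both sides can be reduced to a mapping from post ID to content (the function name `plan_reindex` and that data shape are hypothetical):

```python
def plan_reindex(archived: dict, eligible_now: dict) -> dict:
    """Compare archived posts with the posts that are eligible right now.

    Both arguments map a post ID to its text. Posts that were deleted on
    Discord or opted out are simply missing from `eligible_now`.
    """
    return {
        # In the archive, but gone or no longer eligible on Discord.
        "delete": sorted(pid for pid in archived if pid not in eligible_now),
        # Still eligible, but edited since it was archived.
        "update": sorted(
            pid for pid in archived
            if pid in eligible_now and archived[pid] != eligible_now[pid]
        ),
        # Newly eligible, for example after a later opt-in.
        "add": sorted(pid for pid in eligible_now if pid not in archived),
    }
```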

Of course, if somebody doesn’t have access to Discord anymore, they’d still need access to their posts in the archive. Fulfilling this might be possible by only archiving the opted-in posts of Discord accounts that have gotten connected to a forum account, which could then be associated with the appropriate archived posts, even if access to the Discord account itself was lost. That would probably reduce the amount of archived posts significantly, though.

A poll might give some insight into the number of people who’d at least be willing to go through these steps, in order to assess whether all that infrastructure is really worth it. By counting the posts made by those who say they would, we’d have at least a somewhat quantitative idea of how big a fraction of the Discord content that might be.

About keeping track of Discord names: Discord accounts have a Snowflake ID, so keeping track of name changes shouldn’t be necessary. Updating the archived display name could be offered via discord commands: A command to trigger such an update, and perhaps one to opt-in (and opt-out again) to such updates being carried out automatically.
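
As a small illustration of why the ID is enough: archived posts would only reference the author’s Snowflake, and the display name would be stored once, next to that ID. The sketch below also shows that a Snowflake carries its creation time in the upper bits (shifted right by 22 and counted in milliseconds from the start of 2015, as described in Discord’s API documentation). The helper names and the `post` dictionary are made up for this example.

```python
DISCORD_EPOCH_MS = 1420070400000  # first second of 2015 (UTC), in milliseconds


def snowflake_created_ms(snowflake: int) -> int:
    """Unix timestamp in milliseconds encoded in the upper bits of a Snowflake."""
    return (snowflake >> 22) + DISCORD_EPOCH_MS


# Display names live in one place, keyed by the stable ID.
display_names = {}


def update_display_name(user_id: int, new_name: str) -> None:
    """What an 'update my archived name' command could end up calling."""
    display_names[user_id] = new_name


def render_author(post: dict) -> str:
    """Look up the current display name for the author of an archived post."""
    return display_names.get(post["author_id"], "unknown user")
```

With this layout a rename touches a single entry, and every archived post by that person picks up the new name the next time it is rendered.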

While I appreciate the many ideas for an implementation, this is not really helpful at the current stage. A lot of the proposed solutions aren’t possible that easily for technical reasons. That’s not to say it’s impossible, but it would need to be implemented a bit differently.

To be clear:

The technical part is not the issue.

We need to first make sure we can offer this legally. There is no point in further discussing ways to implement this.

Currently this is too much overhead to make sure we aren’t liable, so this won’t be done for now. We’ve had the same discussion (with other target platforms) on Discord before, and so far the consensus is that it’s not going to happen, for privacy reasons.

Thank you for the engaging and informative discussion surrounding the proposal.

This conversation has illuminated the complexities involved, particularly the legal challenges highlighted by the Discord Terms of Service and GDPR compliance. The technical solutions like the 📝Archiver role and opt-in/opt-out emojis are creative, but the legal barriers present significant hurdles.

However, it’s important to emphasize that the crux of this discussion isn’t just about the specific method of archiving Discord conversations. The core issue at hand is the increasing repository of valuable Godot knowledge that is, in effect, being held ‘hostage’ within the confines of Discord. This situation poses a significant challenge in terms of accessibility and long-term availability of this knowledge.

The proposal to archive Discord discussions was one potential solution to this broader problem. But as this discussion indicates, while the technical aspects may be feasible, ensuring legal compliance is a major obstacle. It’s evident that our current approach might not be the optimal solution, but this doesn’t diminish the importance of the underlying issue.

Therefore, I propose that we shift our focus to the bigger picture: How can we ensure that the wealth of Godot knowledge being generated in the Discord channels remains accessible and preserved for the community at large? This is not just about finding a workaround for Discord’s limitations but about future-proofing our collective knowledge.

One potential, albeit drastic, solution could be to restrict posting in the Discord forum channels, encouraging users to engage more actively on this forum. This approach primarily aims to address the critical issue of long-term viability and accessibility of our discussions. By shifting more conversation to this forum, we can ensure that the valuable knowledge and insights shared by our community are preserved in a more open, easily accessible format, not just for our current members but also for future generations of Godot enthusiasts. While this is a significant change, it’s important that we weigh its potential benefits against the impact on community dynamics, considering the convenience and immediacy that Discord offers.

In conclusion, while we may not have a clear path forward for the archival project right now, it’s crucial that we continue to explore and discuss ways to manage and preserve the wealth of knowledge within our community. This issue is larger than any single platform or tool – it’s about ensuring the longevity and accessibility of the knowledge we collectively create.

Yes, 100% this. The discord mods (me included) are aware of this and are also interested in finding a solution for the future. People will continue to use discord, but we can try to incentivize users to share knowledge outside of discord (for example here).

I think restricting posting will just lead to someone else creating an unrestricted discord server. There already are a few unofficial ones, and I’d like to prevent further fragmentation.

Instead I’d like to try and remind people of the forum regularly. We could for example:

  • Have a bot command !forum that says something like “This is very useful! Could you post it in the ‘Tips & Tricks’ category on the forum?” with a link to it. This basically lets humans do the archiving manually, so we don’t run into any problems legally as it’s original and reformulated content on the forum.
  • Have the bot reply to very long questions in help channels with something like “This topic seems unsuited for realtime communication, you might get better help on the forum.” with a link too.
  • Have a discord role that you can get if you post on the forum too. People love having roles, so this can further incentivize users to use the forum.
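
The first two ideas could start out as a single reply function that the bot calls for each message. This is only a sketch with made-up names: `FORUM_URL` is a placeholder rather than the real address, the 1500 character cut-off for "very long" is an arbitrary assumption, and wiring it into an actual bot library is left out.

```python
from typing import Optional

FORUM_URL = "https://forum.example.invalid"  # placeholder, not the real address
LONG_QUESTION_CHARS = 1500  # assumed cut-off for a "very long" question


def bot_reply(message: str, in_help_channel: bool) -> Optional[str]:
    """Return the text the bot should post, or None to stay silent."""
    if message.strip() == "!forum":
        return (
            "This is very useful! Could you post it in the 'Tips & Tricks' "
            f"category on the forum? {FORUM_URL}"
        )
    if in_help_channel and len(message) > LONG_QUESTION_CHARS:
        return (
            "This topic seems unsuited for realtime communication, "
            f"you might get better help on the forum. {FORUM_URL}"
        )
    return None
```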
