• redcalcium@lemmy.institute
    1 year ago

    Why not talk to the instance admins directly and ask for their database dumps (minus the user accounts table and DMs) so you can ingest them into your search index? You’re doing this for the benefit of the fediverse, right? I’m sure most instance admins would help you if you ask. This should take care of the historical data problem, and you can use ActivityPub to obtain new data going forward without scraping the instances.