Compare commits


v7.0.0...main · 24 commits

Author · SHA1 · Date, followed by the commit message
Michael · 205b0731db · 2024-03-04 07:14:45 +00:00
    Merge pull request #103 from likeazir/patch-1
    Add sharkey as misskey-api-capable

Jonas · bf0ed943ec · 2024-03-02 09:56:41 +00:00
    add sharkey as misskey-api-capable

Michael · f5c1033fc9 · 2024-02-28 17:07:35 +00:00
    Merge pull request #102 from benyafai/main
    🐳 docker-compose

Ben Yafai · ca302bb8db · 2024-02-28 17:05:09 +00:00
    🐳 docker-compose

Michael · 34d07a4fa1 · 2024-02-02 07:25:44 +00:00
    Merge pull request #95 from nanos/update-for-node-20
    Update build-container.yaml

Michael · e86863a8ae · 2024-02-02 07:25:08 +00:00
    Update build-container.yaml

Michael · e4fca0d67e · 2024-02-02 07:05:45 +00:00
    Merge pull request #94 from nanos/node-16-depracation
    Update some actions to use Node 20, now that Node 16 is deprecated

Michael · fe1c69f3ba · 2024-01-30 21:10:29 +00:00
    Update get_context.yml
    Update upload-artifact

Michael · 0416cc159a · 2024-01-26 16:32:54 +00:00
    Update get_context.yml
    update correct line

Michael · 52d3b8d9e9 · 2024-01-26 16:31:22 +00:00
    Update get_context.yml
    Update 2nd checkout too

Michael · 3d8ab95f11 · 2024-01-26 15:52:45 +00:00
    Update get_context.yml
    Update action for Node 16 deprecation (#92)

Michael · a8dc809787 · 2023-12-18 10:19:11 +00:00
    Merge pull request #90 from himynameisjonas/patch-1
    Build docker image for arm64 as well

Jonas Brusman · 099ef7d37a · 2023-12-16 16:07:36 +01:00
    Build docker image for arm64 as well
    Makes it possible to run it on a Raspberry Pi

Michael · f69eaed5a6 · 2023-10-24 13:23:15 +01:00
    Merge pull request #88 from zotanmew/main
    Add support for Iceshrimp

Laura Hausmann · 7be5dfb9b1 · 2023-10-21 23:41:05 +02:00
    Add support for Iceshrimp

nanos · 95b644d431 · 2023-09-07 08:43:10 +01:00
    Define nodeLoc (fixes #82)

Michael · bed11e83f1 · 2023-08-31 08:33:12 +01:00
    Merge pull request #80 from lhoBas/fix/k8s-cronjob
    examples/k8s-cronjob.yaml: fix job naming

Bas · dafaf93d50 · 2023-08-18 10:49:22 +02:00
    examples/k8s-cronjob.yaml: fix job naming
    Fixes validation errors upon applying the k8s manifest:

```
The CronJob "FediFetcher" is invalid:
* metadata.name: Invalid value: "FediFetcher": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
* spec.jobTemplate.spec.template.spec.containers[0].name: Invalid value: "FediFetcher": a lowercase RFC 1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')
```
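The validation regexes quoted in the error message above can be checked directly. A minimal sketch (the pattern is copied verbatim from the Kubernetes error; `is_valid_label` is a hypothetical helper, not part of this repository):

```python
import re

# RFC 1123 label regex, quoted from the Kubernetes validation error above.
RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")

def is_valid_label(name: str) -> bool:
    """Return True if `name` is a valid lowercase RFC 1123 label."""
    return RFC1123_LABEL.fullmatch(name) is not None

print(is_valid_label("FediFetcher"))  # False: uppercase letters are rejected
print(is_valid_label("fedifetcher"))  # True
```

This is exactly why the fix below renames the CronJob and container from `FediFetcher` to `fedifetcher`.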
nanos · 31f475dcdd · 2023-08-18 08:00:09 +01:00
    fixes #79

Michael · a76b52642d · 2023-08-14 15:49:14 +01:00
    Merge pull request #71 from ToadKing/retry-cleanup
    remove redundant code for retrying on HTTP 429

Michael · 0744caad6f · 2023-08-06 22:42:45 +01:00
    Merge pull request #75 from YoannMa/fixLog
    fix bug when failing to get user's posts

Yoann MALLEMANCHE · adc0d4ec4e · 2023-08-06 17:54:34 +02:00
    fix bug when failing to get user's posts

nanos · 253c7c4f2b · 2023-08-06 09:45:33 +01:00
    Revert "print current version on startup" (#70)
    This reverts commit 213ef57abe.

Toad King · db2dcce2ff · 2023-08-05 11:44:48 -05:00
    remove redundant code for retrying on HTTP 429
7 changed files with 56 additions and 51 deletions

build-container.yaml

@@ -9,11 +9,12 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
+        uses: docker/setup-qemu-action@v3
       - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
+        uses: docker/setup-buildx-action@v3
       - name: Login to GHCR
-        uses: docker/login-action@v2
+        uses: docker/login-action@v3
         if: github.event_name != 'pull_request'
         with:
           registry: ghcr.io
@@ -21,9 +22,10 @@ jobs:
           password: ${{ secrets.GITHUB_TOKEN }}
       - name: Build and push
         id: docker_build
-        uses: docker/build-push-action@v4
+        uses: docker/build-push-action@v5
         with:
           push: true
+          platforms: linux/amd64,linux/arm64
           tags: |
             ghcr.io/${{ github.repository_owner }}/fedifetcher:${{ github.ref_name }}
             ghcr.io/${{ github.repository_owner }}/fedifetcher:latest

get_context.yml

@@ -12,17 +12,17 @@ jobs:
     environment: mastodon
     steps:
       - name: Checkout original repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: '3.10'
           cache: 'pip' # caching pip dependencies
       - run: pip install -r requirements.txt
       - name: Download all workflow run artifacts
-        uses: dawidd6/action-download-artifact@v2
+        uses: dawidd6/action-download-artifact@v3
         with:
           name: artifacts
           workflow: get_context.yml
@@ -32,12 +32,12 @@ jobs:
         run: ls -lR
       - run: python find_posts.py --lock-hours=0 --access-token=${{ secrets.ACCESS_TOKEN }} -c="./config.json"
       - name: Upload artifacts
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: artifacts
           path: |
             artifacts
       - name: Checkout user's forked repository for keeping workflow alive
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Keep workflow alive
         uses: gautamkrishnar/keepalive-workflow@v1

README.md

@@ -97,6 +97,8 @@ Persistent files are stored in `/app/artifacts` within the container, so you may
 
 An [example Kubernetes CronJob](./examples/k8s-cronjob.yaml) for running the container is included in the `examples` folder.
 
+An [example Docker Compose Script](./examples/docker-compose.yaml) for running the container periodically is included in the `examples` folder.
+
 ### Configuration options
 
 FediFetcher has quite a few configuration options, so here is my quick configuration advice, that should probably work for most people:

examples/docker-compose.yaml

@@ -0,0 +1,19 @@
+name: fedifetcher
+services:
+  fedifetcher:
+    stdin_open: true
+    tty: true
+    image: ghcr.io/nanos/fedifetcher:latest
+    command: "--access-token=<TOKEN> --server=<SERVER>"
+    # Persist our data
+    volumes:
+      - ./data:/app/artifacts
+    # Use the `deploy` option to enable `restart_policy`
+    deploy:
+      # Don't go above 1 replica to avoid multiple overlapping executions of the script
+      replicas: 1
+      restart_policy:
+        # The `any` condition means even after successful runs, we'll restart the script
+        condition: any
+        # Specify how often the script should run - for example, after 1 hour.
+        delay: 1h

examples/k8s-cronjob.yaml

@@ -14,7 +14,7 @@ spec:
 apiVersion: batch/v1
 kind: CronJob
 metadata:
-  name: FediFetcher
+  name: fedifetcher
 spec:
   # Run every 2 hours
   schedule: "0 */2 * * *"
@@ -30,7 +30,7 @@ spec:
           persistentVolumeClaim:
             claimName: fedifetcher-pvc
       containers:
-        - name: FediFetcher
+        - name: fedifetcher
           image: ghcr.io/nanos/fedifetcher:latest
           args:
            - --server=your.server.social

find_posts.py

@@ -12,7 +12,6 @@
 import requests
 import time
 import argparse
 import uuid
-import git
 import defusedxml.ElementTree as ET
 argparser=argparse.ArgumentParser()
@@ -138,7 +137,7 @@ def get_user_posts_mastodon(userName, webserver):
     try:
         user_id = get_user_id(webserver, userName)
     except Exception as ex:
-        log(f"Error getting user ID for user {user['acct']}: {ex}")
+        log(f"Error getting user ID for user {userName}: {ex}")
         return None
 
     try:
@@ -149,14 +148,14 @@ def get_user_posts_mastodon(userName, webserver):
             return response.json()
         elif response.status_code == 404:
             raise Exception(
-                f"User {user['acct']} was not found on server {webserver}"
+                f"User {userName} was not found on server {webserver}"
             )
         else:
             raise Exception(
                 f"Error getting URL {url}. Status code: {response.status_code}"
             )
     except Exception as ex:
-        log(f"Error getting posts for user {user['acct']}: {ex}")
+        log(f"Error getting posts for user {userName}: {ex}")
         return None
 
 def get_user_posts_lemmy(userName, userUrl, webserver):
@@ -557,6 +556,11 @@ def parse_url(url, parsed_urls):
         match = parse_mastodon_url(url)
         if match is not None:
             parsed_urls[url] = match
 
+    if url not in parsed_urls:
+        match = parse_mastodon_uri(url)
+        if match is not None:
+            parsed_urls[url] = match
+
     if url not in parsed_urls:
         match = parse_pleroma_url(url)
@@ -602,6 +606,14 @@ def parse_mastodon_url(url):
         return (match.group("server"), match.group("toot_id"))
     return None
 
+def parse_mastodon_uri(uri):
+    """parse a Mastodon URI and return the server and ID"""
+    match = re.match(
+        r"https://(?P<server>[^/]+)/users/(?P<username>[^/]+)/statuses/(?P<toot_id>[^/]+)", uri
+    )
+    if match is not None:
+        return (match.group("server"), match.group("toot_id"))
+    return None
+
 def parse_pleroma_url(url):
     """parse a Pleroma URL and return the server and ID"""
@@ -734,11 +746,6 @@ def get_mastodon_urls(webserver, toot_id, toot_url):
         except Exception as ex:
             log(f"Error parsing context for toot {toot_url}. Exception: {ex}")
             return []
-    elif resp.status_code == 429:
-        reset = datetime.strptime(resp.headers['x-ratelimit-reset'], '%Y-%m-%dT%H:%M:%S.%fZ')
-        log(f"Rate Limit hit when getting context for {toot_url}. Waiting to retry at {resp.headers['x-ratelimit-reset']}")
-        time.sleep((reset - datetime.now()).total_seconds() + 1)
-        return get_mastodon_urls(webserver, toot_id, toot_url)
 
     log(
         f"Error getting context for toot {toot_url}. Status code: {resp.status_code}"
@@ -771,11 +778,6 @@ def get_lemmy_comment_context(webserver, toot_id, toot_url):
         except Exception as ex:
             log(f"Error parsing context for comment {toot_url}. Exception: {ex}")
             return []
-    elif resp.status_code == 429:
-        reset = datetime.strptime(resp.headers['x-ratelimit-reset'], '%Y-%m-%dT%H:%M:%S.%fZ')
-        log(f"Rate Limit hit when getting context for {toot_url}. Waiting to retry at {resp.headers['x-ratelimit-reset']}")
-        time.sleep((reset - datetime.now()).total_seconds() + 1)
-        return get_lemmy_comment_context(webserver, toot_id, toot_url)
 
 def get_lemmy_comments_urls(webserver, post_id, toot_url):
     """get the URLs of the comments of the given post"""
@@ -812,11 +814,6 @@ def get_lemmy_comments_urls(webserver, post_id, toot_url):
             return urls
         except Exception as ex:
             log(f"Error parsing comments for post {toot_url}. Exception: {ex}")
-    elif resp.status_code == 429:
-        reset = datetime.strptime(resp.headers['x-ratelimit-reset'], '%Y-%m-%dT%H:%M:%S.%fZ')
-        log(f"Rate Limit hit when getting comments for {toot_url}. Waiting to retry at {resp.headers['x-ratelimit-reset']}")
-        time.sleep((reset - datetime.now()).total_seconds() + 1)
-        return get_lemmy_comments_urls(webserver, post_id, toot_url)
 
     log(f"Error getting comments for post {toot_url}. Status code: {resp.status_code}")
     return []
@@ -902,11 +899,6 @@ def add_context_url(url, server, access_token):
             "Make sure you have the read:search scope enabled for your access token."
         )
         return False
-    elif resp.status_code == 429:
-        reset = datetime.strptime(resp.headers['x-ratelimit-reset'], '%Y-%m-%dT%H:%M:%S.%fZ')
-        log(f"Rate Limit hit when adding url {search_url}. Waiting to retry at {resp.headers['x-ratelimit-reset']}")
-        time.sleep((reset - datetime.now()).total_seconds() + 1)
-        return add_context_url(url, server, access_token)
    else:
        log(
            f"Error adding url {search_url} to server {server}. Status code: {resp.status_code}"
@@ -1143,6 +1135,7 @@ def get_nodeinfo(server, seen_hosts, host_meta_fallback = False):
         return None
 
     if resp.status_code == 200:
+        nodeLoc = None
         try:
             nodeInfo = resp.json()
             for link in nodeInfo['links']:
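The one-line fix above ("Define nodeLoc", #82) initializes `nodeLoc` before the `try` block, so code that reads it later cannot hit an undefined variable when the lookup fails early. A minimal illustration of that bug class (`find_link` and its inputs are hypothetical, not FediFetcher code):

```python
def find_link(links):
    node_loc = None  # without this line, an early exception leaves node_loc undefined
    try:
        for link in links:
            if link['rel'] == 'http://nodeinfo.diaspora.software/ns/schema/2.0':
                node_loc = link['href']
    except Exception:
        pass
    return node_loc  # safe: always defined, even if the loop raised immediately

print(find_link([{'rel': 'other'}]))  # → None
```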
@@ -1175,7 +1168,7 @@ def get_nodeinfo(server, seen_hosts, host_meta_fallback = False):
     # return early if the web domain has been seen previously (in cases with host-meta lookups)
     if server in seen_hosts:
-        return seen_hosts[server]
+        return seen_hosts.get(server)
 
     try:
         resp = get(nodeLoc, timeout = 30)
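The change above swaps direct indexing for `dict.get`, which returns a default (`None`) instead of raising `KeyError` on a miss. A tiny sketch with an assumed `seen_hosts` shape:

```python
# Hypothetical cache entry; the real seen_hosts values come from nodeinfo lookups.
seen_hosts = {'mastodon.social': {'software': 'mastodon'}}

# Indexing raises KeyError on a miss; .get returns None instead.
print(seen_hosts.get('mastodon.social'))  # → {'software': 'mastodon'}
print(seen_hosts.get('unknown.example'))  # → None
```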
@@ -1225,8 +1218,8 @@ def get_server_info(server, seen_hosts):
 def set_server_apis(server):
     # support for new server software should be added here
     software_apis = {
-        'mastodonApiSupport': ['mastodon', 'pleroma', 'akkoma', 'pixelfed', 'hometown'],
-        'misskeyApiSupport': ['misskey', 'calckey', 'firefish', 'foundkey'],
+        'mastodonApiSupport': ['mastodon', 'pleroma', 'akkoma', 'pixelfed', 'hometown', 'iceshrimp'],
+        'misskeyApiSupport': ['misskey', 'calckey', 'firefish', 'foundkey', 'sharkey'],
         'lemmyApiSupport': ['lemmy']
     }
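This table is what the Iceshrimp (#88) and Sharkey (#103) commits extend: each software name is mapped to the API family FediFetcher should use. A sketch of how such a table can be queried (the `apis_for` helper is hypothetical; only the `software_apis` mapping is taken from the diff):

```python
# Mapping copied from the updated software_apis table in the diff.
software_apis = {
    'mastodonApiSupport': ['mastodon', 'pleroma', 'akkoma', 'pixelfed', 'hometown', 'iceshrimp'],
    'misskeyApiSupport': ['misskey', 'calckey', 'firefish', 'foundkey', 'sharkey'],
    'lemmyApiSupport': ['lemmy'],
}

def apis_for(software):
    """Return the set of API-capability flags that list this server software."""
    return {api for api, names in software_apis.items() if software in names}

print(apis_for('sharkey'))  # → {'misskeyApiSupport'}
print(apis_for('unknown'))  # → set()
```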
@@ -1247,16 +1240,7 @@ def set_server_apis(server):
 if __name__ == "__main__":
     start = datetime.now()
 
-    repo = git.Repo(os.getcwd())
-    tag = next((tag for tag in repo.tags if tag.commit == repo.head.commit), None)
-    if(isinstance(tag, git.TagReference)) :
-        version = tag.name
-    else:
-        version = f"on commit {repo.head.commit.name_rev}"
-    log(f"Starting FediFetcher {version}")
+    log(f"Starting FediFetcher")
 
     arguments = argparser.parse_args()

requirements.txt

@@ -1,8 +1,6 @@
 certifi==2022.12.7
 charset-normalizer==3.0.1
 docutils==0.19
-gitdb==4.0.10
-GitPython==3.1.31
 idna==3.4
 python-dateutil==2.8.2
 requests==2.28.2