whoareu@lemmy.ca to Python@programming.dev · English · 9 months ago

I made a python script to scrape threads.net :)

Hello,

I made a simple script to scrape threads.net using Python and Selenium. The script is only a few lines long and easy to understand.

So what does this script do?

First, it opens the Edge browser (you can change this to Firefox or Chrome). Log in with your credentials and open the page you want to scrape, then press Enter in the terminal so the script continues. Your browsing data, including the login session, is stored in the user_data folder, which you can move around.
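If you'd rather use Chrome or Firefox, the browser choice and the profile folder can be wrapped in one function. This is a hedged sketch, not part of the original script: `make_driver` and `SUPPORTED_BROWSERS` are hypothetical names, it assumes Selenium 4, and the Firefox branch assumes the profile folder already exists, because Firefox takes a `-profile <dir>` argument instead of Chromium's `--user-data-dir`.

```python
import os

SUPPORTED_BROWSERS = ("edge", "chrome", "firefox")


def make_driver(browser="edge", profile_dir="user_data"):
    """Start a browser that keeps its logins in profile_dir (hypothetical helper)."""
    if browser not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")

    # Import lazily so the check above works even where Selenium isn't installed
    from selenium import webdriver

    # Absolute path, so the profile location doesn't depend on the working directory
    profile_path = os.path.abspath(profile_dir)

    if browser == "edge":
        options = webdriver.EdgeOptions()
        options.add_argument(f"--user-data-dir={profile_path}")
        return webdriver.Edge(options=options)
    if browser == "chrome":
        options = webdriver.ChromeOptions()
        options.add_argument(f"--user-data-dir={profile_path}")
        return webdriver.Chrome(options=options)

    # Firefox: "-profile <dir>"; assumption: the directory already exists
    options = webdriver.FirefoxOptions()
    options.add_argument("-profile")
    options.add_argument(profile_path)
    return webdriver.Firefox(options=options)
```

Usage would be `driver = make_driver("chrome")` followed by `driver.get("https://threads.net")`.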

It then scrolls down 30 times through the Threads feed, a hashtag page, or the explore page and stores the src of every image it encounters, so at the end we have a links.txt file containing the links to all the images we came across.
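The collection step could also be done with a single JavaScript call instead of one `get_attribute()` round trip per element. A minimal sketch, assuming a Selenium WebDriver or anything else with an `execute_script()` method; `collect_image_srcs` is a hypothetical helper, and skipping `data:` URIs is an added assumption (inline placeholder images that wget can't download anyway).

```python
def collect_image_srcs(driver, seen):
    """Add the src of every <img> currently in the page to the set `seen`.

    Returns how many new links were added.
    """
    # One round trip: the browser returns every image URL at once
    srcs = driver.execute_script(
        "return Array.from(document.images, img => img.src);"
    )
    before = len(seen)
    for src in srcs:
        # Skip empty values and inline data: placeholders
        if src and not src.startswith("data:"):
            seen.add(src)
    return len(seen) - before
```

Inside the scroll loop, `collect_image_srcs(driver, links)` would take the place of the `find_elements`/`get_attribute` loop.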

Now that we have links.txt, we can download all the images it lists with:

wget -i links.txt
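If wget isn't available (for example on a stock Windows install), the same download can be sketched with Python's standard library. This is an alternative under assumptions, not the original workflow: `read_links`, `local_name` and `download_all` are hypothetical names, and each file is named after the last part of the URL path with a counter in front, so two images with the same name don't overwrite each other.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def read_links(path="links.txt"):
    """Return the non-empty lines of links.txt, in order."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def local_name(url, index):
    """File name from the URL path (query string dropped), prefixed with a counter."""
    base = os.path.basename(urlsplit(url).path) or "image"
    return f"{index:04d}_{base}"


def download_all(path="links.txt", out_dir="images"):
    """Download every link in `path` into `out_dir`, reporting failures."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(read_links(path)):
        target = os.path.join(out_dir, local_name(url, i))
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:
            # URLError/HTTPError are OSError subclasses; malformed URLs raise ValueError
            print(f"failed: {url} ({exc})")
```

Calling `download_all("links.txt", "images")` plays the role of `wget -i links.txt`, minus wget's retries and resume support.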

the script:

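from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.options import Options
import time

# Keep the browser profile (cookies, login session) in ./user_data
options = Options()
options.add_argument("--user-data-dir=user_data")

driver = webdriver.Edge(options=options)

driver.get('https://threads.net')

image_links = set()  # a set drops duplicate URLs automatically

# Log in and open the feed/hashtag/explore page, then come back here
input("Press Enter to continue...")

try:
    for _ in range(30):
        try:
            for e in driver.find_elements(By.XPATH, "//img"):
                src = e.get_attribute("src")
                if src:  # some <img> tags have no src; writing None would crash
                    image_links.add(src)
            # scroll down so the feed loads more posts
            driver.execute_script("window.scrollBy(0, 1000);")
            time.sleep(0.2)
        except WebDriverException as exc:
            # e.g. an element went stale while the page was re-rendering
            print("oopsie:", exc)

    with open("links.txt", 'w') as f:
        for link in sorted(image_links):
            f.write(link + "\n")
finally:
    # close the browser even if something above raised
    driver.quit()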
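The script always scrolls a fixed number of times. One alternative, sketched here under assumptions, is to keep scrolling until the page height stops growing. `scroll_until_stable` and its parameters are hypothetical, and since a feed like Threads may never stop growing, `max_rounds` acts as a hard cap.

```python
import time


def scroll_until_stable(driver, on_step=None, max_rounds=200, pause=0.5, patience=3):
    """Scroll to the bottom repeatedly until the page height stops growing.

    patience: how many scrolls in a row may add nothing before stopping.
    max_rounds: hard cap, because an endless feed may never stop growing.
    Returns the number of scrolls performed.
    """
    height_js = "return document.body.scrollHeight;"
    last_height = driver.execute_script(height_js)
    unchanged = 0
    rounds = 0
    while rounds < max_rounds and unchanged < patience:
        if on_step is not None:
            on_step()  # e.g. collect image links from what is loaded now
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the feed time to load more posts
        height = driver.execute_script(height_js)
        unchanged = unchanged + 1 if height == last_height else 0
        last_height = height
    if on_step is not None:
        on_step()  # pick up whatever the last scroll loaded
    return rounds
```

The `on_step` callback is where the image links would be collected, so this could stand in for the fixed `range(30)` loop. A larger `pause` trades speed for fewer missed images on slow connections.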

I hope it's useful :D

Edit: here is a link to links.txt https://0x0.st/HGjx.txt

Chat

conorm@feddit.uk · 9 months ago
nice code, could you share the asm for all that?

Python@programming.dev

Welcome to the Python community on the programming.dev Lemmy instance!

📅 Events

Past

November 2023

PyCon Ireland 2023, 11-12th
PyData Tel Aviv 2023, 14th

October 2023

PyConES Canarias 2023, 6-8th
DjangoCon US 2023, 16-20th (!django 💬)

July 2023

PyDelhi Meetup, 2nd
PyCon Israel, 4-5th
DFW Pythoneers, 6th
Django Girls Abraka, 6-7th
SciPy 2023, 10-16th, Austin
IndyPy, 11th
Leipzig Python User Group, 11th
Austin Python, 12th
EuroPython 2023, 17-23rd
Austin Python: Evening of Coding, 18th
PyHEP.dev 2023 - “Python in HEP” Developer’s Workshop, 25th

August 2023

PyLadies Dublin, 15th
EuroSciPy 2023, 14-18th

September 2023

PyData Amsterdam, 14-16th
PyCon UK, 22nd-25th

🐍 Python project:

💓 Python Community:

#python IRC for general questions
#python-dev IRC for CPython developers
PySlackers Slack channel
Python Discord server
Python Weekly newsletters
Mailing lists
Forum

✨ Python Ecosystem:

🌌 Fediverse

Communities

#python on Mastodon
c/django on programming.dev
c/pythorhead on lemmy.dbzer0.com

Projects

Pythörhead: a Python library for interacting with Lemmy
Plemmy: a Python package for accessing the Lemmy API
pylemmy: enables simple access to Lemmy’s API with Python
mastodon.py: a Python wrapper for the Mastodon API
