Scraping a widget

jk000ru
Hello,

I'm trying to scrape data from a widget. It worked for the first page, but there was a lot more data below that it wasn't scraping. So I added code to scroll to the end of the page so that all the data could be scraped. Now, however, when it has finished scrolling to the end of the page, it just waits and never prints. Any idea how to get it to stop waiting and print? Eventually, I'd also like to bring the data into Excel, if anyone knows how to do that. Thanks.


import time  # time.sleep is used in the scroll loop

from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

element = driver.find_element_by_id('js-screener-container')
print(element.text)
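
On the Excel part of the question, one simple route is a CSV file, which Excel opens directly. The following is only a minimal sketch using Python's standard `csv` module. It assumes the scraped values have already been collected as parallel lists of strings, one list per column; the function name `write_rows_to_csv`, the file name, and the column headers are hypothetical and not part of the original post.

```python
import csv


def write_rows_to_csv(path, header, columns):
    """Write parallel column lists as rows of a CSV file.

    `columns` is a list of lists, one per column (for example tickers,
    close values). Returns the number of data rows written.
    """
    # zip() stops at the shortest column, so a short column truncates
    # the output instead of raising an IndexError.
    rows = list(zip(*columns))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Hypothetical usage with text already read from the page:
# write_rows_to_csv("screener.csv", ["ticker", "close"], [tickers, closes])
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, and it avoids blank rows between records on Windows.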

--
You received this message because you are subscribed to the Google Groups "Selenium Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/selenium-users/dc900394-1ddf-4212-8229-693ffb076b0b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Re: Scraping a widget

Munagala Srikanth
Maybe that element is not holding the data. Find where the data is actually stored, and locate it by class instead of using the container.

Re: Scraping a widget

jk000ru
So here is the latest code. It prints results when the scroll-down part is left out. When I add the scroll-down code, it scrolls to the bottom of the page but keeps trying to scroll down indefinitely instead of ending. Can someone show me code that goes to the bottom and then ends the loop so that it will print?


from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")



# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')

# will give a list of all close values
close_values = driver.find_elements_by_xpath("//td[@class = 'tv-data-table__cell tv-screener-table__cell tv-screener-table__cell--numeric']/span")

# will give a list of all percentage changes
percentage_changes = driver.find_elements_by_xpath('//tbody/tr/td[3]')

# will give a list of all value changes
value_changes = driver.find_elements_by_xpath('//tbody/tr/td[4]')

# will give a list of all ranks
ranks = driver.find_elements_by_xpath('//tbody/tr/td[5]/span')

# will give a list of all volumes
volumes = driver.find_elements_by_xpath('//tbody/tr/td[6]')

# will give a list of all market caps
market_caps = driver.find_elements_by_xpath('//tbody/tr/td[7]')

# will give a list of all PEs
pes = driver.find_elements_by_xpath('//tbody/tr/td[8]')

# will give a list of all EPSs
epss = driver.find_elements_by_xpath('//tbody/tr/td[9]')

# will give a list of all EMPs
emps = driver.find_elements_by_xpath('//tbody/tr/td[10]')

# will give a list of all sectors
sectors = driver.find_elements_by_xpath('//tbody/tr/td[11]')

for index in range(len(tickers)):
   print("Row " + tickers[index].text + " " + close_values[index].text + " " + percentage_changes[index].text + " " + value_changes[index].text + " " + ranks[index].text + " " + volumes[index].text + " " + market_caps[index].text + " " + pes[index].text + " " + epss[index].text + " " + emps[index].text + " " + sectors[index].text + " ")
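
The `while True:` loop above has no exit condition, which would explain the endless scrolling. One possible way to end it is to stop once the page height no longer changes, with a cap on the number of attempts as a safety net. This is a sketch only: the helper name `scroll_to_bottom` and its `pause` and `max_rounds` parameters are hypothetical, and it assumes that `document.body.scrollHeight` grows while more rows are loading, which may not hold if the screener scrolls inside its own container.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=100):
    """Scroll until the page height stops growing or max_rounds is hit.

    `driver` is any object with an execute_script(js) method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    for rounds in range(1, max_rounds + 1):
        # Scroll down to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give lazily loaded rows time to appear
        time.sleep(pause)
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            return rounds  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls
```

Calling `scroll_to_bottom(driver)` after `driver.get(url)` and before the `find_elements_...` lines would let the rest of the script run once scrolling has finished. If rows load slowly, a larger `pause` may be needed so the height check does not fire too early.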


