Using Python to Sound Like a Wine Snob
By Zen One on 2015-10-02 at 06:27:57
Web Scraping with Beautiful Soup and Using a Markov Chain to Create Wine Snob Gibberish
---------------------------------------------------------------------------------------

Written by Steve Zenone on 2015-10-01

Inspired by Tony Fischetti's blog post, "How to fake a sophisticated knowledge of wine with Markov Chains"

Some Background
---------------

The term Markov chain is named after the Russian mathematician Andrey Markov (1856-1922). In mathematics, a Markov chain is a discrete random process with the Markov property. According to Wikipedia:

    A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it.

The process changes state randomly at each iteration, in discrete steps.

Saaaay what? Jason Young has a video on YouTube that better explains what a Markov chain is.

Markon, Markov, Markon, Markov - Mr. Miyagi (not really)
--------------------------------------------------------

Our lives have all been graced by Markov chains. Think about those websites you've come across with nonsensical text. These pages are often generated with a Markov chain. Why do these pages exist? They're used to optimize search engine rankings (the darker side of SEO). Bummer.

Fun with the Markov Chain
-------------------------

While researching Markov chains, I came across Tony's (tonyfischetti) blog post. It inspired me to create a Python script that does essentially what his example does, but uses the BeautifulSoup library to scrape the initial website content.

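To make the Markov property from the background section concrete before any scraping happens, here is a minimal Python 3 sketch of a two-state chain. The states, the probabilities and the helper name `walk` are invented purely for illustration; none of them come from the wine script.

```python
import random

# Hypothetical transition table (made-up numbers): each state maps to its
# possible next states and the probability of moving to each of them.
TRANSITIONS = {
    "sunny": (("sunny", "rainy"), (0.8, 0.2)),
    "rainy": (("sunny", "rainy"), (0.4, 0.6)),
}


def walk(start, steps, rng=random):
    """Return the start state followed by `steps` random transitions."""
    state = start
    path = [state]
    for _ in range(steps):
        next_states, weights = TRANSITIONS[state]
        # Only the current state is consulted here; the earlier path is
        # never read. That is the Markov property.
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


print(walk("sunny", 10))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is handy when comparing outputs.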
Requirements
------------

- beautifulsoup4: for extracting the data we want from the downloaded web content
- requests: for downloading the web content

Steps
-----

- Download the initial webpage from winespectator.com and determine the last page number

The idea behind web scraping is to get raw content from a website and extract usable data from it. This is where the Python library BeautifulSoup comes in handy.

Let's Begin
-----------

We'll start by importing the modules we'll need to download a website and extract the data we want.

    import requests
    from bs4 import BeautifulSoup
    from random import choice

We want to pick a random webpage from the website to feed into our Markov chain. First, let's find out how many pages this site has at http://www.winespectator.com/dailypicks/category/catid/1/page/.

Now we'll download the HTML source from the wine website and generate a Beautiful Soup object using the BeautifulSoup function.

    base_url = 'http://www.winespectator.com/dailypicks/category/catid/1/page/'
    r = requests.get(base_url)
    soup = BeautifulSoup(r.text)

Next, we'll take a look at a section of the website's HTML to figure out what element we want to extract in order to get the last page number.

    print soup.prettify()[44750:45750]

    1 2 3 4 5 6 7 Last 814

    86 points, $12
    Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported. — Thomas Matthews

    Jan. 11, 2011
    STANDING STONE Chardonnay Finger Lakes 2009
    85 points, $11
    Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. — James Molesworth

We can now begin looking at the text that Beautiful Soup extracted. Each element can be called by its index.

    verbiage[0]

    4070 Daily Wine Picks found in this category.

We'll probably want to ignore verbiage[0] later.

    verbiage[1]

    Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported. — Thomas Matthews

Nice - this looks like some content we want to extract.

OK, we know we don't want verbiage[0], so we'll iterate through the entries starting at index 1 (i.e., [1:]). We'll also encode the text to UTF-8. Next, we'll remove any newlines and tabs that are in the text, then remove any leading or trailing spaces, and then split the line so that we ignore the em element (we don't care who wrote the comment on the website). We'll combine all of the sanitized text into a string called scraped_text.

    scraped_text = ''
    for entry in verbiage[1:]:
        entry = entry.get_text().encode('utf-8')
        entry = entry.replace('\t', '')
        entry = entry.replace('\n', '')
        entry = entry.strip()
        # Keep only the tasting note; drop the reviewer's name after the dash
        entry = entry.split('—')[0]
        scraped_text += '{} '.format(entry).replace('Back to Top', '')
    scraped_text = str(scraped_text).split('Featured ')[0]

Let's see what we got by printing scraped_text.

    print scraped_text

    Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 40,000 cases made. Vibrant and mouthwatering, with a laser beam of lemon, lime, grapefruit and apricot flavors. Hints of fresh herbs and flowers add to the complexity. Drink now. 250,000 cases imported. Syrah-like, with layers of plum, spice and violet flavors framed by a fine layer of tannins, followed by a focused, tar-tinged finish. Drink now. 60,000 cases made. Browse our exclusive lists of the world's top wine values, top value producers and easy-to-find wines.

Yawn. I like wine. Well, I like to drink wine.

At this point, we have the text we want to work with. Let's create the Markov chain and generate some new text.

We'll define a function that splits the text passed to it into a dictionary of Markov chain chunks, returning the new dict once it's done.

For example, take the sentence "I love walking cats in New York City". The sentence is first chunked into bi-grams:

    I love
    love walking
    walking cats
    cats in
    in New
    New York
    York City

With Python, we'll make these immutable keys (tuples) in a dictionary (dict):

    {('I', 'love'): '', ('love', 'walking'): '', ...}

We'll then need to add values to each of the keys. The values consist of the word that comes after each instance of the bi-gram. So, in the case of "I love", the third word is "walking".

If we feed more data into our function, there may be multiple instances of "I love". For example, "I love walking cats in New York city. I love eating pizza." The words "walking" and "eating" both come after "I love" (there are two instances of "I love"). The value we assign to the ('I', 'love') dictionary key is a list consisting of ['walking', 'eating'].

Our dictionary begins to look like:

    {('I', 'love'): ['walking', 'eating'], ...}

Once completed, we return the dict.

    def create_markcov_dict(original_text):
        split_text = original_text.split()
        markcov_dict = {}
        for i in xrange(len(split_text) - 2):
            # Key: two consecutive words; value: the word that follows them
            key_name = (split_text[i], split_text[i + 1])
            key_value = split_text[i + 2]
            if key_name in markcov_dict:
                markcov_dict[key_name].append(key_value)
            else:
                markcov_dict[key_name] = [key_value]
        return markcov_dict

Let's send the above function our scraped text from the website.

    markcov_dict = create_markcov_dict(scraped_text)
    print markcov_dict

    {('top', 'wine'): ['values,'], ('lime,', 'grapefruit'): ['and'], ('wine', 'values,'): ['top'],
     ('Hints', 'of'): ['fresh'], ('laser', 'beam'): ['of'], ('add', 'to'): ['the'],
     ('and', 'mouthwatering,'): ['with'], ('which', 'offers'): ['black'], ('green', 'apple,'): ['melon'],
     ('violet', 'flavors'): ['framed'], ('notes,', 'with'): ['a'], ('value', 'producers'): ['and'],
     ('tobacco', 'notes,'): ['with'], ('imported.', 'Up'): ['front,'], ('made.', 'Vibrant'): ['and'],
     ('and', 'easy-to-find'): ['wines.'], ('mouthwatering,', 'with'): ['a'], ('tannins,', 'followed'): ['by'],
     ('of', 'tannins,'): ['followed'], ('Tasty,', 'showing'): ['citrus,'], ('flavors', 'framed'): ['by'],
     ('of', 'plum,'): ['spice'], ('of', 'lemon,'): ['lime,'], ('a', 'pleasingly'): ['plump'],
     ('40,000', 'cases'): ['made.'], ('and', 'apple'): ['flavors'], ('250,000', 'cases'): ['imported.'],
     ('values,', 'top'): ['value'], ('2013.', '50,000'): ['cases'], ('flavors', 'that'): ['have'],
     ('butter', 'hints.'): ['Just'], ('ripeness', 'and'): ['a'], ('lists', 'of'): ['the'],
     ('and', 'butter'): ['hints.'], ('of', 'the'): ["world's"], ('finish.', 'Drink'): ['now', 'now.'],
     ('now.', '60,000'): ['cases'], ('Drink', 'now.'): ['1,184', '40,000', '250,000', '60,000'],
     ('and', 'apricot'): ['flavors.'], ('Syrah-like,', 'with'): ['layers'], ('honest.', 'Drink'): ['now.'],
     ('that', 'have'): ['a'], ('front,', 'with'): ['green'], ('fine', 'layer'): ['of'],
     ('top', 'value'): ['producers'], ('1,184', 'cases'): ['made.'], ('and', 'flowers'): ['add'],
     ('all', 'honest.'): ['Drink'], ('cases', 'imported.'): ['Up', 'Syrah-like,'], ('apple,', 'melon'): ['and'],
     ('Up', 'front,'): ['with'], ('floral', 'quality.'): ['Balanced'], ('texture', 'in'): ['this'],
     ('the', 'complexity.'): ['Drink'], ('plum,', 'spice'): ['and'], ('to', 'the'): ['complexity.'],
     ('now.', '40,000'): ['cases'], ('a', 'fine'): ['layer'], ('flavors.', 'Hints'): ['of'],
     ('juicy.', 'Drink'): ['now.'], ('fresh', 'herbs'): ['and'], ('tar-tinged', 'finish.'): ['Drink'],
     ('hints.', 'Just'): ['tangy'], ('and', 'tobacco'): ['notes,'], ('pleasingly', 'plump'): ['texture'],
     ('framed', 'by'): ['a'], ('Light,', 'firm'): ['tannins'], ('now.', '1,184'): ['cases'],
     ('of', 'fresh'): ['herbs'], ('with', 'green'): ['apple,'], ('grapefruit', 'and'): ['apricot'],
     ('melon', 'and'): ['butter'], ('have', 'a'): ['pleasant'], ('leaf', 'and'): ['tobacco'],
     ('cherry,', 'leaf'): ['and'], ('beam', 'of'): ['lemon,'], ('smoky', 'finish.'): ['Drink'],
     ('red,', 'which'): ['offers'], ('keep', 'it'): ['all'], ('showing', 'citrus,'): ['pear'],
     ('the', "world's"): ['top'], ('offers', 'black'): ['cherry,'], ('now', 'through'): ['2013.'],
     ('in', 'this'): ['fresh'], ('now.', '250,000'): ['cases'], ('complexity.', 'Drink'): ['now.'],
     ('a', 'laser'): ['beam'], ('made.', 'Tasty,'): ['showing'], ('Balanced', 'and'): ['juicy.'],
     ('60,000', 'cases'): ['made.'], ('our', 'exclusive'): ['lists'], ('this', 'fresh'): ['red,'],
     ('firm', 'tannins'): ['support'], ('Drink', 'now'): ['through'], ('flowers', 'add'): ['to'],
     ('pleasant', 'ripeness'): ['and'], ('imported.', 'Syrah-like,'): ['with'], ('producers', 'and'): ['easy-to-find'],
     ('Just', 'tangy'): ['enough'], ('apple', 'flavors'): ['that'], ('with', 'layers'): ['of'],
     ('cases', 'made.'): ['Tasty,', 'Vibrant', 'Browse'], ('focused,', 'tar-tinged'): ['finish.'], ('enough', 'on'): ['the'],
     ('to', 'keep'): ['it'], ('followed', 'by'): ['a'], ('pear', 'and'): ['apple'],
     ('quality.', 'Balanced'): ['and'], ('plump', 'texture'): ['in'], ('a', 'pleasant'): ['ripeness'],
     ('black', 'cherry,'): ['leaf'], ('finish', 'to'): ['keep'], ('Browse', 'our'): ['exclusive'],
     ('it', 'all'): ['honest.'], ('layer', 'of'): ['tannins,'], ('on', 'the'): ['finish'],
     ('exclusive', 'lists'): ['of'], ('a', 'floral'): ['quality.'], ('the', 'finish'): ['to'],
     ('made.', 'Browse'): ['our'], ('a', 'smoky'): ['finish.'], ('with', 'a'): ['smoky', 'laser'],
     ('through', '2013.'): ['50,000'], ('lemon,', 'lime,'): ['grapefruit'], ('apricot', 'flavors.'): ['Hints'],
     ("world's", 'top'): ['wine'], ('and', 'violet'): ['flavors'], ('Vibrant', 'and'): ['mouthwatering,'],
     ('and', 'a'): ['floral'], ('tangy', 'enough'): ['on'], ('citrus,', 'pear'): ['and'],
     ('fresh', 'red,'): ['which'], ('50,000', 'cases'): ['imported.'], ('by', 'a'): ['fine', 'focused,'],
     ('a', 'focused,'): ['tar-tinged'], ('and', 'juicy.'): ['Drink'], ('tannins', 'support'): ['a'],
     ('layers', 'of'): ['plum,'], ('support', 'a'): ['pleasingly'], ('spice', 'and'): ['violet'],
     ('herbs', 'and'): ['flowers']}

We'll create a new function that we can feed this Markov'ian dictionary to and have the newly generated text we want returned.

    def create_markcov_text(markcov_dict):
        # Pick a random starting point
        selected_words_tuple = choice(markcov_dict.keys())

        # Get the first three words for our new Markov story/poem/text
        first_word = selected_words_tuple[0].title()
        second_word = selected_words_tuple[1]
        next_word = choice(markcov_dict[selected_words_tuple])

        # Begin our Markov text with the three new words from above
        markcov_text = '{} {} {}'.format(first_word, second_word, str(next_word))

        # Grab the next tuple of two words using second_word, next_word from above
        selected_words_tuple = (second_word, next_word)

        # Generate the remainder of the Markov text, ending the Markov text
        # when we create a key that doesn't exist
        while True:
            if selected_words_tuple in markcov_dict:
                next_word = choice(markcov_dict[selected_words_tuple])
                markcov_text += ' {}'.format(str(next_word))
                selected_words_tuple = (selected_words_tuple[1], next_word)
            else:
                break

        # Return our newly generated Markov poem/story/text
        return markcov_text

We'll pass markcov_dict to the above function.

    markcov_text = create_markcov_text(markcov_dict)

Drumroll ... let's finally print our newly generated wine-snobbery text.

    print markcov_text

    Tobacco notes, with a laser beam of lemon, lime, grapefruit and apricot flavors. Hints of fresh herbs and flowers add to the complexity. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 40,000 cases made. Vibrant and mouthwatering, with a smoky finish. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Browse our exclusive lists of the world's top wine values, top value producers and easy-to-find wines.

Cheers
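
For anyone following along on Python 3, here is a compact sketch of the same bi-gram idea, run on the two example sentences from earlier rather than on scraped text. The names `build_chain` and `generate` and the `max_words` cap are assumptions made for this sketch, not part of the script above; the cap is there because a chain with no dead-end key could otherwise keep generating forever.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - 2):
        chain[(words[i], words[i + 1])].append(words[i + 2])
    return dict(chain)


def generate(chain, max_words=30, rng=random):
    """Start from a random bi-gram and extend it until a dead end or the cap."""
    words = list(rng.choice(list(chain)))
    while len(words) < max_words:
        followers = chain.get((words[-2], words[-1]))
        if not followers:
            # The last two words never appeared as a key, so we stop here.
            break
        words.append(rng.choice(followers))
    return " ".join(words)


sample = "I love walking cats in New York City. I love eating pizza."
chain = build_chain(sample)
print(chain[("I", "love")])
print(generate(chain))
```

Because repeated followers are stored as repeated list entries, `choice` picks each next word in proportion to how often it followed that bi-gram in the source text.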