
Inducing a cline from corpora of political manifestos


Sofie Van Gijsel∗ & Carl Vogel†

Abstract

Techniques from corpus linguistics are applied to the analysis of a number of European right-wing parties in an effort to extend methods for ranking parties on a left-right spectrum within and across countries and languages. Focus is placed on parties not in government, and the analysis is based on corpora derived from election manifestos published by those parties. The techniques applied are objective in that they apply statistical measures with confidence tests to objectively quantifiable linguistic features of the documents. Valid applicability of the techniques is demonstrated. The methods are then used to estimate the pairwise similarity of a number of European political parties, including cross-national comparisons.

1 Introduction

We report on the application of recent corpus linguistic methods to the analysis of a number of European right-wing party manifestos. In recent manifesto research aiming to estimate the policy positions of a nation's governmental parties objectively, computerized approaches have been used to locate parties on a priori established policy dimensions (see, e.g., the articles in Laver, 2001a). We focus on the relatively unresearched policy space of (often) small, non-governmental, right-wing parties in a number of European countries. One aim is to identify objective means of ranking these parties on a political spectrum using only the limited data available from such parties. We use an inductive method which treats election manifestos as corpora to be analyzed. Thus, instead of being compared in ideological terms, the manifestos are compared on the basis of linguistically quantifiable features. Clearly, ideological issues enter into the selection of the parties whose manifestos are examined, but beyond these pre-theoretic choices, content-free statistical techniques are used to rank the level of similarity between the parties.

An initial methodological question is whether it is legitimate to consider the right-wing manifestos, all clearly belonging to one subgenre, as ‘corpora’ which are distinguishable on the basis of significant linguistic differences. As recent research in corpus linguistics shows (Kilgarriff, 2001), in order to validly compare corpora, the differences within each corpus have to be smaller than the differences between them. To measure the within-corpus distances, we apply recently proposed authorship identification (AID) techniques, attempting to assign subparts of the manifestos to the correct party. In this way, it is also possible to cross-validate recent attributional research which shows that substrings of words are excellent author discriminators. We establish the internal homogeneity of the corpora, a prerequisite for measuring the similarity levels among them.

A second question is then whether a corpus similarity measure can be applied to evaluate the distance between the different parties, both on a national and a cross-national level. The recently proposed Chi by Degrees of Freedom similarity measure (Kilgarriff, 2001; hereafter, χ²/d.f.) gives a ranking which we will attempt to interpret as an indication of the position of the different parties in a common policy space. The results suggest encouraging potential for new methods of analyzing manifestos in political science and in other fields in which text-based induction of partially ordered position spaces is useful. We argue that the objective analysis of small, ‘real language’ sets of texts as corpora is an interesting, albeit challenging, field of corpus linguistics.

∗ Quantitative Lexicology and Variational Linguistics, Katholieke Universiteit Leuven, Belgium: Sofie.VanGijsel@arts.kuleuven.ac.be

† Computational Linguistics Group & Centre for Computing and Language Studies, Trinity College, U. of Dublin: vogel@tcd.ie


2 Manifesto Research in Political Science

Manifesto analysis is considered a fruitful way of gaining insight into the positions of political parties in one policy space (Mair, 2001; Laver, 2001b). The Manifesto Research Group collects and analyzes political programs by way of comparative content analysis, classifying each ‘quasi-sentence’ according to a coding scheme of 56 categories, which belong to a priori established dimensions of the policy space (e.g. economic left-right, social liberal-conservative). The rationale behind this system is salience theory (Budge, 2001), which states that the salience of an issue in the manifesto provides information about the position of the party on that issue. Yet this theory can be criticized: for example, immigration will be a hot issue for many parties, especially for the right-wing parties researched here, but mentioning the issue in the program does not by itself indicate whether the party is ‘in favor of’ or ‘against’ it. The method also requires a large amount of human coding effort, which is time-consuming and costly, without being completely objective. Therefore, recent methods analyze the manifestos in a more quantitative way.

A first improvement is the computerized content analysis proposed by Laver and Garry (2000). On the basis of two reference texts, or manifestos of parties whose positions on a number of pre-established policy dimensions are known a priori, the researchers compile a keyword list,¹ which is then used to code other manifestos, or ‘virgin’ texts. Yet the composition of the keyword dictionary is not only time-consuming; the validity of the analysis also depends heavily on the keywords, which are sensitive to both the substantive and the temporal context of the reference manifestos.² Therefore, Laver, Benoit, and Garry (2003) have recently proposed a probabilistic dictionary approach, measuring the relative frequency of all the words in the reference texts. For the analysis of ‘virgin’ texts, the policy position is then determined from the scores of all their words, each word having been scored on the dimension under investigation on the basis of the reference texts. This method allows rapid analysis and reanalysis of large quantities of text. It is also applicable to non-English texts, an advantage if manifestos are compared cross-nationally. Yet its reliability still depends heavily on the choice of reference texts. Positioning virgin texts on a priori established dimensions abstracted from reference texts might be a good approach for well-researched policy spaces, but it is not optimal for the often small and non-governmental right-wing parties analyzed in this project. Instead of using pre-established dimensions, we attempt to analyze the manifestos inductively into a partial ordering, treating the complete texts as corpora among which distances can be measured. (This is also sensitive to text choice, but given the parties analyzed, it amounts to using all of the available text rather than a selection.) The distances can only be interpreted a posteriori.
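The word-scoring idea behind the probabilistic dictionary approach of Laver, Benoit and Garry (2003) can be illustrated with a short Python sketch. It follows our reading of that approach rather than the authors’ own code: a word’s score is the average of the reference texts’ a priori positions, weighted by the probability of each reference text given the word, and a virgin text is placed at the frequency-weighted mean score of the words it shares with the reference texts. The function names, the toy token lists and the positions −1 and +1 are illustrative assumptions, not data from this study.

```python
from collections import Counter


def relative_freqs(tokens: list[str]) -> dict[str, float]:
    """Relative frequency of each word type in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def word_scores(references: dict[str, list[str]],
                positions: dict[str, float]) -> dict[str, float]:
    """Score every word seen in the reference texts (sketch, assumed scheme).

    P(r | w) = F_wr / sum_r F_wr, and the score of w is sum_r P(r | w) * A_r,
    where F_wr is the relative frequency of w in reference text r and A_r is
    the a priori position of r on the dimension under investigation.
    """
    freqs = {name: relative_freqs(tokens) for name, tokens in references.items()}
    vocabulary = set().union(*freqs.values())
    scores = {}
    for word in vocabulary:
        f = {name: freqs[name].get(word, 0.0) for name in freqs}
        total = sum(f.values())
        scores[word] = sum(f[name] / total * positions[name] for name in freqs)
    return scores


def score_virgin(tokens: list[str], scores: dict[str, float]) -> float:
    """Position of a 'virgin' text: frequency-weighted mean of its scored words."""
    scored = [t for t in tokens if t in scores]  # words unseen in the references are ignored
    if not scored:
        raise ValueError("the virgin text shares no words with the reference texts")
    return sum(f * scores[w] for w, f in relative_freqs(scored).items())


# Toy example (hypothetical data): a left (-1) and a right (+1) reference text.
refs = {"left": ["a", "a", "b"], "right": ["b", "c", "c"]}
scores = word_scores(refs, {"left": -1.0, "right": 1.0})
print(score_virgin(["a", "a", "b"], scores))  # approximately -0.667
```

Because unseen words are simply dropped, the virgin text’s position depends entirely on the vocabulary of the chosen reference texts, which is the sensitivity noted above.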

3 Authorship Identification Techniques (AID)

As explained, the internal homogeneity of the manifestos has to be established before a valid corpus linguistic comparison is possible. We use a number of AID techniques to show that the within-corpus distances are smaller than those between the manifestos. First, a short overview and discussion of the AID methods used in the analysis of style, or stylometry, will be given; Oakes (1998) and Holmes (1998), for example, provide more comprehensive overviews. The methods we adopt are outlined in §3.2; §4 and §5 then detail our analysis.

¹ Every word which occurs at least twice as often in the right-wing reference text as in the left-wing one is classified as a right-wing keyword, and vice versa for left-wing keywords.

² See Van Gijsel (2002, pp. 82–88) for the implementation of a keyword dictionary for Dutch, as devised by de Vries (1999). The results show that for the analysis of right-wing party manifestos of Belgium and Flanders (the Dutch-speaking part of Belgium), which entails a cross-national and temporal extension, the keyword dictionary does not give valid results.
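The keyword rule of footnote 1 can be sketched as follows. This is a hypothetical illustration, not the dictionary of de Vries (1999) or the procedure of Laver and Garry (2000): it compares raw counts, as the footnote’s wording suggests, so with reference texts of unequal length one would probably want to compare relative frequencies instead.

```python
from collections import Counter


def classify_keywords(left_tokens: list[str], right_tokens: list[str]) -> dict[str, str]:
    """Label words by the 'at least twice as many occurrences' rule (raw counts).

    A word absent from one reference text counts as a keyword of the other.
    Words that meet neither threshold are left unlabeled.
    """
    left, right = Counter(left_tokens), Counter(right_tokens)
    keywords = {}
    for word in left.keys() | right.keys():
        if right[word] >= 2 * left[word]:
            keywords[word] = "right"
        elif left[word] >= 2 * right[word]:
            keywords[word] = "left"
    return keywords


# Hypothetical reference texts.
print(classify_keywords(["tax", "tax", "state"], ["nation", "nation", "tax", "state"]))
```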


3.1 Overview of the AID techniques

Stylometry as an AID technique dates back at least to 1851, when the logician de Morgan suggested that the authenticity of some letters of St Paul might be tested by comparing word length. Yule (1944) developed a measure of vocabulary richness, K, based on the probability that any randomly selected pair of words are identical. Over the years, a number of other vocabulary richness measures have been proposed as discriminators, such as the type-token ratio or the proportion of unique words to the total size of the vocabulary used (e.g. Morton, 1986), although more recent research (e.g. Holmes, 1998) shows that these techniques are not reliable, being highly dependent on the choice and length of the texts under analysis. Mosteller and Wallace (1964) famously attributed the purposely anonymous, disputed Federalist Papers to Madison instead of Hamilton, on the basis of a probabilistic analysis of the most frequent words. These were mainly function words, which are used rather unconsciously and are therefore effective markers of authorship. While most measures take the lexical item (or pre-terminal lexical categories such as parts of speech) as the unit of analysis, some recent methods focus on sublexical units, especially letter uni- and bigrams. Without requiring syntactic or lexical analysis, these elements are easily and objectively quantifiable, while being useful for texts of varying and limited length (e.g. Forsyth (1997), Khmelev and Tweedie (2001), Chaski (1998)).
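Yule’s K and the type-token ratio mentioned above can be computed as in the following Python sketch, which uses the standard form K = 10⁴ (Σᵢ i² Vᵢ − N) / N², where N is the number of tokens and Vᵢ the number of word types occurring exactly i times. The whitespace tokenization and the sample sentence are assumptions for illustration only; the length-sensitivity criticized by Holmes (1998) applies to both measures.

```python
from collections import Counter


def yules_k(tokens: list[str]) -> float:
    """Yule's characteristic K = 10^4 * (sum_i i^2 V_i - N) / N^2.

    (sum_i i^2 V_i - N) / N^2 approximates the probability that two tokens
    drawn at random are occurrences of the same word type.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("empty text")
    spectrum = Counter(Counter(tokens).values())  # maps i -> V_i
    s2 = sum(i * i * v for i, v in spectrum.items())
    return 10_000 * (s2 - n) / (n * n)


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


text = "the party and the nation and the people".split()
print(yules_k(text), type_token_ratio(text))  # 1250.0 0.625
```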

In literary stylistics, the Cusum technique (Farringdon, 1996) has been developed; it graphically plots the cumulative sum of the deviations of sentence length from the average in an author’s sample, superimposed with plots for the frequency of selected ‘linguistic habits’ of the author, such as the use of two- and three-letter words. The technique has been criticized for being labor-intensive and highly subjective, e.g. with regard to the choice of the (limited) number of sentences analyzed, the choice of linguistic habits and the interpretation of the plots (Canter, 1992; Chaski, 1998). Foster’s (2001) analysis of the ‘literary DNA’ of a writer is akin to the Cusum method and can similarly be criticized as subjective and unscientific. Foster claims to uncover authorship on the basis of ‘external’ evidence (e.g. the historical background of a writer) and ‘internal’ evidence (e.g. characteristics such as punctuation habits), but his recent incorrect attribution of ‘A Funeral Elegy’ to Shakespeare instead of to John Ford reveals the methodological unreliability of his approach.

3.2 The AID techniques implemented

We have explored AID methods based on letter unigrams and bigrams, since these can be applied cross-linguistically, and without subjective content-based judgements, to small, unequally-sized texts. Thus, letter unigrams and letter bigrams were counted. Further, word unigrams were counted, to test whether substrings give better results than word counts.
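The three feature sets can be counted as in this Python sketch. How McCombe’s software tokenizes is not described here, so the choices below are assumptions: text is lower-cased, every non-letter character acts as a boundary, and bigrams do not span word boundaries.

```python
from collections import Counter


def _words(text: str) -> list[str]:
    """Lower-case the text and split it on every non-letter character."""
    return "".join(ch.lower() if ch.isalpha() else " " for ch in text).split()


def letter_ngrams(text: str, n: int) -> Counter:
    """Count letter n-grams within words (n=1: unigrams, n=2: bigrams)."""
    counts = Counter()
    for word in _words(text):
        counts.update(word[i:i + n] for i in range(len(word) - n + 1))
    return counts


def word_unigrams(text: str) -> Counter:
    """Count word forms."""
    return Counter(_words(text))


sample = "Het VB, het LN"
print(letter_ngrams(sample, 2))
print(word_unigrams(sample))
```

Since `str.isalpha` accepts any Unicode letter, accented characters such as the Ö in FPÖ are kept as letters.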

McCombe (2002) sought to cross-validate a number of AID techniques and confirmed recent work (e.g. Chaski, 1998) in finding that letter uni- and bigrams perform markedly better than, in that order, word unigram frequency, syntactic tagging, higher n-grams or keywords as metric bases for predicting the authorship of disputed texts. We used McCombe’s software to test the validity of different AID methods in assigning arbitrarily selected subparts of the manifestos to the correct party. For detailed, user-oriented descriptions of its functionality, see McCombe (2002) or Van Gijsel (2002). The program takes an input file listing the names of plain-text files, each labeled either as belonging to one of one or more uncontested categories or as a file to be categorized. Given input parameters (e.g. letter vs. word n-gram analysis, the value of n, etc.), the texts are concordanced and frequency-analyzed. The program’s output is a pairwise ranking, giving the similarity of the various corpora in reverse magnitude, as calculated by χ²/d.f.³ Here, three rankings are given (letter unigrams, letter bigrams and word unigrams), together constituting a rank list.

³ The χ²/d.f. measure is used instead of simply χ², since it takes into account both the χ² value and the frequency information of the corpora. This is useful for natural language corpora, like the manifestos, which are inherently non-randomly distributed (Kilgarriff & Salkie, 1996).
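A minimal Python sketch of χ²/d.f., following our reading of Kilgarriff (2001): for the n most frequent features of the two corpora combined, the expected frequency of a feature in each corpus is its joint frequency scaled by that corpus’s share of the combined size; χ² is summed over both corpora and divided by the degrees of freedom, taken here as the number of features minus one. The default of 26 features (roughly one per letter, for letter unigrams) is an assumption, and McCombe’s software may differ in detail.

```python
from collections import Counter


def chi2_by_df(a: Counter, b: Counter, n_features: int = 26) -> float:
    """Chi-square by degrees of freedom between two frequency lists.

    Uses the n_features most frequent items of both corpora combined and
    divides chi-square by (number of features used - 1). Lower values mean
    the two corpora are more similar.
    """
    size_a, size_b = sum(a.values()), sum(b.values())
    if size_a == 0 or size_b == 0:
        raise ValueError("both corpora must be non-empty")
    features = [item for item, _ in (a + b).most_common(n_features)]
    if len(features) < 2:
        raise ValueError("need at least two features for a positive d.f.")
    chi2 = 0.0
    for item in features:
        joint = a[item] + b[item]
        for observed, size in ((a[item], size_a), (b[item], size_b)):
            expected = joint * size / (size_a + size_b)
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (len(features) - 1)


print(chi2_by_df(Counter("aaab"), Counter("abbb")))  # 2.0
```

With letter unigrams one would pass `Counter` objects built from the letters of each text (non-letters removed); identical frequency profiles give 0.0.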


This rank list is the input for two statistical tests, which compare the results as run with a range of parameter values. The first is the ratio between the average of the similarity scores for all pairs of corpora in the same uncontested category and the average of the similarity scores for all pairs of corpora in distinct uncontested categories.⁴ The larger the ratio, the more suggestive the measure is. McCombe (2002, p. 37) notes that the ranking of the assignment scores is often a more direct indication of the attributional result. The second is the Mann-Whitney test (also called the Wilcoxon rank sum test; see Oakes, 1998),⁵ which gives an overall significance measure for each of the three methods, while also outputting a more detailed list of significance measures for each method, showing the probability of the assignment of each of the anonymously coded texts to the different authors.

4 Analysis of the Manifestos Using AID Techniques

4.1 Data Collection

The manifestos were collected by downloading the texts from the respective party websites. To keep human intervention to a minimum, the (thematic) subparts of the websites were kept intact as separate files of comparable size, but the number of themes per party varied. In this paper, the analysis of the Dutch-language manifestos is addressed, both on a national and a cross-national level. For The Netherlands we analyzed the manifestos of the parties Lijst Pim Fortuyn (List Pim Fortuyn, LPF) and Leefbaar Nederland (Liveable Netherlands, LN). The LPF manifesto consists of a single text of a little under 4,000 words, while the LN text contains 10 subparts (just over 10,000 words in total). The manifesto of the Belgian party Vlaams Blok (Flemish Block, VB) was downloaded in 13 chunks, amounting to more than 20,000 words.⁶

4.2 Analysis of the Manifestos in One Nation

We analyze LPF and LN as Dutch-language parties within The Netherlands. Distinguishing the two parties is a potentially difficult task, since they originally formed one party: the populist party LN, founded in June 2001, with Pim Fortuyn as party leader. After being ousted for blatant anti-Muslim comments, Fortuyn launched his own national party, LPF. While it is often claimed that LN is a populist rather than an extreme-right party, LPF can be expected to be slightly more right-wing (Buyse, 2002). Yet Fortuyn was openly homosexual and advocated liberal social values, which are very different from traditional right-wing values. In order to check whether the AID methods could distinguish between the two manifestos, one subpart of the LN manifesto (ln4) was coded ‘anonymous’, while the other subparts (i.e. the other 9 LN subparts and the LPF text) were given a category code (l for LN and p for LPF). The task given to the program is to assign the anonymous subpart to LN rather than to LPF, using AID methods.
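The attribution task can be pictured as a nearest-category decision. The Python sketch below is a hypothetical simplification, not the decision rule of McCombe’s program: it assigns the anonymous text to the category whose labelled texts are, on average, closest to it under some distance such as χ²/d.f. The text names and distances are invented for illustration.

```python
from statistics import mean


def attribute(distances: dict[str, float], category: dict[str, str]) -> str:
    """Assign an anonymous text to the category with the lowest mean distance.

    distances maps each labelled text to its distance from the anonymous
    text; category maps each labelled text to its category code.
    """
    by_category: dict[str, list[float]] = {}
    for text, d in distances.items():
        by_category.setdefault(category[text], []).append(d)
    return min(by_category, key=lambda c: mean(by_category[c]))


# Hypothetical distances from an anonymous LN subpart to the labelled texts.
dist = {"l1": 1.2, "l2": 0.9, "l3": 1.1, "p1": 2.4}
labels = {"l1": "l", "l2": "l", "l3": "l", "p1": "p"}
print(attribute(dist, labels))  # l
```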

We concordanced the manifesto subparts using letter unigrams, letter bigrams and word unigrams. Then both the similarity ratio and the Mann-Whitney test were calculated.
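The two statistics can be sketched in Python as follows. Several details are assumptions rather than documented properties of McCombe’s program: each pair of texts is taken to have a single score, higher meaning more similar; the Mann-Whitney test uses the large-sample normal approximation without a tie correction and reports a one-sided p-value. For samples as small as those here, an exact test would be preferable.

```python
import math
from statistics import mean


def similarity_ratio(scores: dict[tuple[str, str], float],
                     category: dict[str, str]) -> float:
    """Mean score of same-category pairs divided by mean score of cross-category pairs."""
    same = [s for (x, y), s in scores.items() if category[x] == category[y]]
    cross = [s for (x, y), s in scores.items() if category[x] != category[y]]
    return mean(same) / mean(cross)


def midranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x: list[float], y: list[float]) -> tuple[float, float]:
    """U statistic for x and a one-sided p-value for 'x tends to be larger than y'.

    Normal approximation, no tie correction.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical scores: two LN subparts (l1, l2) and the LPF text (p1).
pair_scores = {("l1", "l2"): 0.9, ("l1", "p1"): 0.6, ("l2", "p1"): 0.6}
labels = {"l1": "l", "l2": "l", "p1": "p"}
print(similarity_ratio(pair_scores, labels))
print(mann_whitney([5, 6, 7], [1, 2, 3]))
```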

Method            Ratio   Ranking                   Mann-Whitney
Letter unigrams   1.299   ln4 fits in category l    ln4 fits in category p, p < 0.0005
Letter bigrams    1.131   ln4 fits in category l    ln4 fits in category p, p < 0.025
Word unigrams     1.028   ln4 fits in category l    ln4 fits in category p, p < 0.25

Table 1: Results of AID tests: classification of the ‘anonymous’ subtext ln4 to LN (l) vs. LPF (p)

⁴ In authorship attribution, the category corresponds to author identity.

⁵ This is inspired by the proposal of Kilgarriff (1996) for equally-sized subcorpora.

⁶ For the corpus analysis in one nation (§4.2) and in one language (§4.3), repeated tests for several subparts, for a communist party, and for manifestos of Germany, Austria and Great Britain gave similar results (Van Gijsel, 2002).

The ranking indicates that all three tests attribute the subpart to the correct manifesto, that of LN, with the highest ratio for letter unigrams, followed by letter bigrams, indicating that letter unigrams perform best. Similarly, the output of Mann-Whitney shows that the attribution is highly significant for letter unigrams (p < 0.0005),⁷ while letter bigrams are also significant (p < 0.025). By this test, a word unigram count is not significant (p < 0.25). These results cross-validate recent AID work, specifically McCombe’s (2002) results. More importantly, the consistently correct attribution points to the internal homogeneity of the manifestos, which can therefore be considered fully-fledged corpora.

4.3 Analysis of the Manifestos in One Language

Since we intended to compare right-wing parties cross-nationally, in a second step the manifestos in one language were analyzed similarly. Supplementing the manifestos of The Netherlands, the manifesto of the traditionally fascist Flemish party VB (Flemish Block) is analyzed. The attributional results for subparts of the VB manifesto are consistent, validating it as a corpus. To further verify whether the AID methods are robust enough to cope with the interference of inherently nation- and context-dependent elements in the manifestos, a communist manifesto, that of the Flemish party PvdA (Partij van de Arbeid/Labour Party), was included as a dummy. Although the attributional results for subparts of PvdA are not significant, repeated tests make clear that in the output ranking the communist party is not more closely related to the other Flemish party, VB, than to the Dutch parties, suggesting that a cross-national extension of the right-wing manifesto analysis is viable. Thus, the internal homogeneity of the manifestos on a cross-national level shows the within-corpus differences to be smaller than the between-corpora differences, legitimating the measurement of distances among the representative corpora. In a next step, the similarity among the corpora can therefore be measured as an indication of the parties’ political and ideological positions.

5 Placing Right-Wing Parties in a Left-Right Spectrum

In this section we discuss the distance between the parties as measured by treating the manifestos in their entirety as corpora. In general, statistical methods to reliably measure the distance between small, unequally sized corpora are scarce; Kilgarriff (2001) proposed χ² as a ‘single measure’ of distance between internally homogeneous corpora. The pairwise similarity ranking, based on χ²/d.f., is interpreted as indicating the level of similarity among the manifestos. Here, the manifestos in their entirety are compared by letter unigrams, which consistently emerged as the clearest method to distinguish between them. Although the analysis would clearly benefit from a better similarity measure, enabling the direct statistical comparison of a number of corpora cross-linguistically, this measure will be interpreted as indicating the distances among the texts.

Pair      χ²/d.f.   p-value
LN–VB     15.61     p < 0.0001
LPF–VB     8.18     p < 0.001
LN–LPF     2.97     p > 0.05

Table 2: Results of the inverse similarity ranking of the Dutch parties

⁷ Note that p is the probability of obtaining a result at least this extreme if the similarity judgement were due to chance alone.

The inverse similarity ranking shows that, on the basis of letter unigram frequencies, the difference between LPF and LN is not significant. The difference between LPF and VB is significant at the 0.001 level, and that between LN and VB even at the 0.0001 level. These figures tie in with background knowledge: LN and LPF are both populist, ‘new-style’ right-wing parties, combining strong anti-immigration views with liberal social values, while VB is a traditional fascist party. Further, as noted in §4.2, LPF is more right-wing than LN, which is also reflected in LPF’s higher similarity to VB (χ²/d.f. of 8.18 for LPF–VB vs. 15.61 for LN–VB).⁸
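The reading of Table 2 above, in the ∼/< notation of footnote 8, can be made mechanical with a small Python sketch. It hard-codes the values of Table 2; treating p > 0.05 as ‘not reliably distinguishable’ and using < only as a marker of strict difference, without direction, are our assumptions.

```python
# chi2/d.f. values from Table 2, and whether each difference is significant at 0.05.
TABLE_2 = {
    ("LN", "VB"): (15.61, True),    # p < 0.0001
    ("LPF", "VB"): (8.18, True),    # p < 0.001
    ("LN", "LPF"): (2.97, False),   # p > 0.05
}


def lookup(a: str, b: str) -> tuple[float, bool]:
    """Return (chi2/d.f., significant) for an unordered pair of parties."""
    return TABLE_2[(a, b)] if (a, b) in TABLE_2 else TABLE_2[(b, a)]


def relation(a: str, b: str) -> str:
    """'~' if the pair is not reliably distinguishable, '<' (strict difference) otherwise."""
    return "<" if lookup(a, b)[1] else "~"


def order_by_distance(anchor: str, parties: list[str]) -> list[str]:
    """Parties sorted from most to least similar to the anchor (ascending chi2/d.f.)."""
    return sorted(parties, key=lambda p: lookup(anchor, p)[0])


print(relation("LPF", "LN"), order_by_distance("VB", ["LN", "LPF"]))  # ~ ['LPF', 'LN']
```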

A similar analysis was carried out for the manifestos translated into English (Van Gijsel, 2002, pp. 93–95),⁹ enabling an extension of the cross-national analysis. Again, the inverse similarity ranking brings out the difference between ‘traditional’ right-wing parties, such as BNP, and the more populist, new-style right-wing parties, such as FPÖ, which eclectically combine strong anti-immigration views with liberal social values; this difference could not be captured by an a priori analysis trying to position the parties on pre-established policy dimensions.¹⁰

6 Conclusion

We have described our attempts to locate a number of European right-wing parties on a single cline by analyzing their manifestos with tools from corpus linguistics. To verify the applicability of corpus techniques, we applied AID methods to establish that intra-category differences are smaller than inter-category distances among the texts. This again confirmed the reliability of AID methods using letter frequencies, and verified the internal homogeneity of the manifestos as corpora. The results, which show that the manifesto analysis as measured by χ²/d.f. differentiates ‘traditional’ and ‘new-style’ right-wing parties, demonstrate that a fully computerized analysis (specifically, one without content analysis) can give insight into the relatively unresearched policy space of right-wing parties. However, the analysis could benefit from methodological improvements and from a cross-linguistic extension of the statistical measure. This work illustrates the uses and limits of automated corpus linguistic techniques for small, unequally-sized ‘real language’ data sets.

⁸ Setting aside the political content of ‘left’ or ‘right’, and taking ∼ to represent similarity and < strict difference, we can make the following inference from pairwise comparisons: LPF ∼ LN, LPF

The manifestos of the English parties NF (National Front) and BNP (British National Party), of the Austrian FPÖ (Freedom Party), the French FN (Front National/National Front) and of LPF are not reliably discriminable.

¹⁰ It can be remarked that, cross-linguistically, only impressionistic conclusions are possible: a non-English manifesto such as the LN text, which is known to be ‘populist’ and closely related to LPF, is linked more closely to FPÖ, which is closer to LPF, than to BNP.

References

Budge, I. (2001). Validating the MRG approach. In Laver, M. (Ed.), Estimating the Policy Position of Political Actors, pp. 3–9. London: Routledge/ECPR Studies in European Political Science.

Buyse, A. (Ed.). (2002). Nieuw Radicaal Rechts in Europa. Antwerpen/Amsterdam: Houtekiet.

Canter, D. (1992). An Evaluation of the “Cusum” Stylistic Analysis of Confessions. Expert Evidence, 1(3), 93–99.

Chaski, C. (1998). A Daubert-Inspired Assessment of Current Techniques for Language-Based Author Identification. In ILE Technical Report 1098, pp. 97–148.

de Vries, M. (1999). Governing with Your Closest Neighbour: An Assessment of Spatial Coalition Formation Theories. Ph.D. thesis, UB Nijmegen.

Farringdon, J. M. (1996). Analysing for Authorship. Cardiff: University of Wales Press. With contributions by Morton, A. Q., M. G. Farringdon and M. D. Baker.

Forsyth, R. S. (1997). Short Substrings as Document Discriminators: An Empirical Study. Paper presented at ACH-ALLC’97.

Foster, D. (2001). Author Unknown: On the Trail of Anonymous. London, Basingstoke and Oxford: Macmillan.

Holmes, D. I. (1998). The Evolution of Stylometry in Humanities Scholarship. Literary and Linguistic Computing, 13(3), 111–117.

Khmelev, D. & Tweedie, F. J. (2001). Using Markov Chains for Identification of Writers. Literary and Linguistic Computing, 16(3), 299–308.

Kilgarriff, A. & Salkie, R. (1996). Corpus Similarity and Homogeneity via Word Frequency. In Proceedings of Euralex 96.

Kilgarriff, A. (1996). Which words are particularly characteristic of a text? A survey of statistical approaches. In Language Engineering for Document Analysis and Recognition. Proceedings, AISB Workshop, Falmer, Sussex.

Kilgarriff, A. (2001). Comparing Corpora. International Journal of Corpus Linguistics, 6(1), 97–133.

Laver, M. (Ed.). (2001a). Estimating the Policy Position of Political Actors. Routledge.

Laver, M. (2001b). Position and Salience in the Policies of Political Actors. In Laver, M. (Ed.), Estimating the Policy Position of Political Actors, pp. 66–75. London: Routledge/ECPR Studies in European Political Science.

Laver, M., Benoit, K., & Garry, J. (2003). Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review, 97.

Laver, M. & Garry, J. (2000). Estimating Policy Positions from Political Texts. American Journal of Political Science, 44(3), 619–634.

Mair, P. (2001). Searching for Positions of Political Actors. In Laver, M. (Ed.), Estimating the Policy Position of Political Actors, pp. 33–49. London: Routledge/ECPR Studies in European Political Science.

McCombe, N. (2002). Methods of Author Identification. B.A. (Mod) CSLL Final Year Project, TCD.

Morton, A. Q. (1986). Once. A Test of Authorship Based on Words Which Are Not Repeated in the Sample. Literary and Linguistic Computing, 1(1), 1–8.

Mosteller, F. & Wallace, D. (1964). Applied Bayesian and Classical Inference: The Case of the Federalist Papers. Reading: Addison-Wesley.

Oakes, M. P. (1998). Statistics for Corpus Linguistics. Edinburgh Textbooks in Empirical Linguistics. Edinburgh: Edinburgh University Press.

Van Gijsel, S. (2002). A Corpus Linguistic Analysis of European Right-Wing Party Manifestos. Master’s thesis, Centre for Language and Communication Studies, Trinity College, University of Dublin.

Yule, G. (1944). The Statistical Study of Literary Vocabulary. Cambridge: Cambridge University Press.
